May 2011 - Posts

Scheduled Jobs on strike?

I had an interesting situation with agent jobs that stopped running without reporting any failure.
I don't want to attempt to describe our infrastructure, but suffice it to say we had a few issues.
None of my SQL Servers went down, but there were problems, and when the system picked up again I had some interesting situations on a number of SQL 2008 and SQL 2005 servers.
Essentially, jobs which ran at short time intervals just stopped running, whilst jobs with intervals of over an hour all carried on fine.
As you might guess, t-log backups tend to run at short intervals, but I also have a number of monitoring jobs which collect data every 5 minutes and store it locally on the servers involved.
If we consider a t-log backup job as an example:-

  • This had stopped some hours previously, but without an error.
  • Other jobs were still running, however.
  • The server had not failed over.
  • The server was still running.
  • Users were happily back in the application and accessing the server.
  • I have a query, to be examined further, which lists any scheduled job that has not run when it should; this listed nothing!
  • The jobs were not disabled.
  • The schedules were not disabled.
  • Running the job manually worked fine, but the schedule still did not pick up.
  • In the end I took the agent service offline and then brought it back online.
  • In every case, cluster or standalone, this resolved the problem.
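For what it's worth, the missed-schedule check I mention above can be sketched along the following lines; this is a hedged example, not the exact query I use. It relies on msdb's sysjobschedules, where the next run date and time are stored as integers, and on the agent_datetime helper function ( present in SQL 2005/2008, though undocumented ). Bear in mind the agent only refreshes sysjobschedules periodically, not instantly.

```sql
-- Sketch: list enabled jobs whose next scheduled run time has already passed.
-- next_run_date is yyyymmdd and next_run_time is hhmmss, both stored as int;
-- the five minute grace period is arbitrary, adjust to suit your schedules.
select  j.name as [Job],
        js.next_run_date,
        js.next_run_time
from    msdb.dbo.sysjobs j
join    msdb.dbo.sysjobschedules js on js.job_id = j.job_id
join    msdb.dbo.sysschedules s on s.schedule_id = js.schedule_id
where   j.enabled = 1
and     s.enabled = 1
and     js.next_run_date > 0
and     msdb.dbo.agent_datetime(js.next_run_date, js.next_run_time)
            < dateadd(minute, -5, getdate());
```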

I'm at a bit of a loss to fully understand why this happened; as I say, it seemed that those jobs scheduled in minutes all stopped.
Most other nightly jobs ran, but not all of them, and to no particular pattern that I can see.
It is possible that the time server put out some suspect times; I believe a few other servers showed some strange times for a while.

The only observation I have is that it is wise to check everything when there is a problem, and not just assume that because the Agent service is running without errors all is well!

Posted by GrumpyOldDBA with 2 comment(s)

Windows 2008 R2 Bare Metal Restore

I don't know how many DBAs work with the o/s, but I work at o/s level a lot and currently build our dev servers, a collection of IBM x346s, HP DL580s and DL585s. I'm currently rebuilding them with Windows 2008 R2 and have been making a system backup after each install so that if a box dies I can put it back as it was. However, it seems I was just one checkbox away from a bare metal restore backup ( no need to re-install the o/s before restoring ), so I'm grateful to the Core team for this post:

http://blogs.technet.com/b/askcore/archive/2011/05/12/bare-metal-restore.aspx

Windows 2008 R2 has some cool features; one of the most useful for me is the ability to resize a disk partition on-line, and yes, it works a dream as I've tried it out!

DMVs – What they can do for you

Users :

· How you deal with user activity on a server may depend upon your application(s); there’s always been sp_who and sp_who2, although I rarely ever use them.

· So why don’t I use these handy procedures, and do I have my very own sp_who3, for example? The short answer is no, because I have tended to work with web-facing applications. This doesn’t mean public applications, but corporate applications that use a typical n-tier architecture.

· For this reason I’ve rarely been interested in who is connected to my database but more a case of what is connected.

· I’m also usually concerned with performance, monitoring and trending, in an attempt to be pro-active rather than re-active; to this end I still have my eye on real-time data mining as a way to predict potential issues with the production platform(s).

· So if you have 1,000 users who all connect as “sa” or “AppUser”, and/or you have a number of databases on your instance, then sp_who and derivatives are probably not going to be of much use to you.

· I mentioned being pro-active: by this I mean I have processes to alert me if I think there’s going to be a problem, based upon thresholds of “normal” use, and a dashboard which updates every 10 seconds to give me a visual heads-up.

· Anyway, these are some of the queries concerned with user connections that I keep to hand. I store useful scripts in a common location on all servers and my workstations; mostly I work in rdp sessions on the servers themselves, for reasons that are out of scope here.

First up there’s a query which relies upon sys.sysprocesses; if you’re still working with SQL Server 2000 it should work if you change the table name from sys.sysprocesses to dbo.sysprocesses ( I don’t have a copy of SQL 2000 to test this on, so apologies if I am incorrect ).

--
select count(*)
from master.sys.sysprocesses
where status = 'runnable' and dbid = 7 and spid > 50;
go
select right(convert(varchar, last_batch), 7) as [Time],
    rtrim(loginame) as [LogonName],
    spid,
    open_tran as [Tran],
    kpid,
    blocked as [Blocker],
    waittype,
    waittime,
    lastwaittype,
    rtrim(waitresource) as [Resource],
    db_name(dbid) as [DB],
    cpu,
    physical_io,
    [memusage],
    rtrim(status) as [Status],
    hostname,
    program_name,
    cmd,
    left(nt_username, 20) as [NTuser]
from master.sys.sysprocesses with (nolock)
where --dbid=7 and
    spid > 50
order by --status asc,
    blocked desc, waittime desc, [Tran] desc
go

Why is a query better than a procedure? Because I can customise it to whatever situation I’m looking at; I can also change the column order and add or remove columns depending upon my requirements.

My first query ( above ) just returns the number of active processes. In certain circumstances spid 50 can actually be a system process, so you may wish to change this to spid 51; you may of course wish to include system processes anyway.

The following queries use dmvs only:

--
-- get what's running
--
select 
dess.last_request_start_time as 'Last Batch Start', 
dess.last_request_end_time as 'Last Batch End',
dess.login_name, 
dess.session_id as 'SPID',
der.open_transaction_count as 'Open Tran',
der.blocking_session_id as 'Blocker',
isnull(der.wait_type,'') as 'Current Wait',
der.wait_time as 'Wait Time',
isnull(der.last_wait_type,'') as 'Last Wait',
isnull(der.wait_resource,'') as 'Waiting ON',
ISNULL(db_name(der.database_id),'') as 'Database',
dess.cpu_time as 'CPU Time',
dess.logical_reads as 'Reads',
dess.writes as 'Writes',
dess.[status],
dess.[host_name],
dess.[program_name],
der.command as 'Command',
eqp.query_plan
from sys.dm_exec_sessions dess 
join sys.dm_exec_connections decc on decc.session_id = dess.session_id
join sys.dm_exec_requests der on dess.session_id = der.session_id
CROSS APPLY sys.dm_exec_query_plan(der.plan_handle) eqp
where dess.session_id>50
order by Blocker desc;

Like the previous example, you can filter or sort for specifics; again, I’m usually troubleshooting here. This query ( above ) eliminates the clutter of non-running connections and also presents the option of examining the query plan of what is running.

This query also works on SQL 2005, but you will need to save the xml from the query plan with a file extension of .sqlplan and then open it in SSMS to see the graphical plan; in SQL 2008 you see the graphical plan directly.

--
-- get what's connected
--
select 
dess.last_request_start_time as 'Last Batch Start', 
dess.last_request_end_time as 'Last Batch End',
dess.login_name, 
dess.session_id as 'SPID',
der.open_transaction_count as 'Open Tran',
der.blocking_session_id as 'Blocker',
isnull(der.wait_type,'') as 'Current Wait',
der.wait_time as 'Wait Time',
isnull(der.last_wait_type,'') as 'Last Wait',
isnull(der.wait_resource,'') as 'Waiting ON',
ISNULL(db_name(der.database_id),'') as 'Database',
dess.cpu_time as 'CPU Time',
dess.logical_reads as 'Reads',
dess.writes as 'Writes',
dess.[status],
dess.[host_name],
dess.[program_name],
der.command as 'Command',
eqp.query_plan
from sys.dm_exec_sessions dess 
join sys.dm_exec_connections decc on decc.session_id = dess.session_id
left outer join sys.dm_exec_requests der on dess.session_id = der.session_id
OUTER APPLY sys.dm_exec_query_plan(der.plan_handle) eqp
where dess.session_id>50
order by last_request_start_time desc;

This query ( above ) is similar to the sysprocesses query and lists all user connections.

Also works on SQL 2005 but with the same conditions for viewing the query plans.

--
-- to get the last command run you need to pass the handle from dm_exec_connections to sys.dm_exec_sql_text
-- including the sql text can make for some big result sets
--
select 
dess.last_request_start_time as 'Last Batch Start', 
dess.last_request_end_time as 'Last Batch End',
est.[text],
eqp.query_plan,
dess.login_name, 
dess.session_id as 'SPID',
der.open_transaction_count as 'Open Tran',
der.blocking_session_id as 'Blocker',
isnull(der.wait_type,'') as 'Current Wait',
der.wait_time as 'Wait Time',
isnull(der.last_wait_type,'') as 'Last Wait',
isnull(der.wait_resource,'') as 'Waiting ON',
ISNULL(db_name(der.database_id),'') as 'Database',
dess.cpu_time as 'CPU Time',
dess.logical_reads as 'Reads',
dess.writes as 'Writes',
dess.[status],
dess.[host_name],
dess.[program_name],
der.command as 'Command'
from sys.dm_exec_sessions dess 
join sys.dm_exec_connections decc on decc.session_id = dess.session_id
join sys.dm_exec_requests der on dess.session_id = der.session_id
CROSS APPLY sys.dm_exec_query_plan(der.plan_handle) eqp
CROSS APPLY sys.dm_exec_sql_text(decc.most_recent_sql_handle) est
where dess.session_id>50
order by Blocker desc;

This query ( above ) lists the last sql run along with the query plan. I use this mostly when there are blocking issues or I’m concerned something isn’t quite right, such as a serialised procedure plan.

Works with SQL 2005 as before.


--
-- connections by host
--
select  isnull(ses.[host_name],'TOTAL') as HostName,count(*) as Logins
from  master.sys.dm_exec_sessions ses with (nolock)
where ses.session_id > 50 
and ses.[host_name] not in('') -- you might want to add the actual server to eliminate system sessions
group by ses.[host_name]
with rollup;

A slightly different query ( above ): I use this within a dashboard and to monitor connections.

Typically you may have a number of n-tier servers, for instance several web servers, SSRS servers and application servers; this query groups by host and uses the rollup to give a grand total.

I’ve been known to run this every minute as a scheduled job, putting the output into a table; this allows me to plot usage by host server, e.g. if you have four load-balanced application servers, did they maintain a similar number of connections all day?
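As a sketch of that scheduled logging ( the table name ConnectionsByHost is mine, purely illustrative ), the job step would be little more than:

```sql
-- One-off: a table to hold the samples.
create table dbo.ConnectionsByHost
(   SampleTime  datetime        not null default getdate(),
    HostName    nvarchar(128)   not null,
    Logins      int             not null
);
go
-- Job step, scheduled every minute: snapshot the per-host connection counts.
insert dbo.ConnectionsByHost ( HostName, Logins )
select  isnull(ses.[host_name], 'TOTAL'),
        count(*)
from    master.sys.dm_exec_sessions ses with (nolock)
where   ses.session_id > 50
group by ses.[host_name]
with rollup;
```

Plotting SampleTime against Logins for each HostName then gives you the trend per application server.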

The “not in” allows you to filter out, say, the cluster node; on the other hand, you can quickly identify any workstations connecting to your server!

I’ve used this technique previously, linked to an alert, when I encountered issues with connection pooling not working correctly from a server; the number of connections kept increasing and essentially performed a denial of service attack on the server. By monitoring the count of connections I could alert when a threshold was reached.

If you saw my presentation on creating real time dashboards for production systems this was part of one of the dashboard reports.

--
-- open transactions
--
select count(*) as [Trans]
from sys.dm_tran_session_transactions tst
join sys.dm_tran_active_transactions tat on tat.transaction_id = tst.transaction_id
join sys.dm_tran_database_transactions tdt on tst.transaction_id = tdt.transaction_id
join sys.dm_exec_sessions dess on dess.session_id = tst.session_id
where tst.session_id>50;

The query above counts open transactions; too many open transactions can be an indication of an issue. I use this for monitoring against SSRS; these servers have users too, but perhaps not in the same way as more conventional application databases.

Another pro-active query: my experience has shown that for my SSRS servers a value of 10 or more means I have slow-running reports, while over 30 usually indicates timeouts and other user-unfriendly effects.
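Those thresholds are easy to wire into an alert. A minimal sketch ( the threshold values and messages here are illustrative, not my production code ):

```sql
-- Sketch: raise an error when the open transaction count passes a threshold.
-- raiserror with WITH LOG writes to the SQL error log, so an Agent alert or
-- a monitoring tool can pick it up; severity 16 is an error, 10 informational.
declare @trans int;

select  @trans = count(*)
from    sys.dm_tran_session_transactions tst
where   tst.session_id > 50;

if @trans >= 30
    raiserror ('Open transaction count %d over critical threshold.', 16, 1, @trans) with log;
else if @trans >= 10
    raiserror ('Open transaction count %d - reports may be running slowly.', 10, 1, @trans) with log;
```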

© www.grumpyolddba.co.uk May 2011

Posted by GrumpyOldDBA with no comments