November 2007 - Posts

SQL Server: resource waits and signal waits
14 November 07 04:24 PM | Christian | 2 comment(s)

You won't find any resource waits that specifically measure CPU wait time, instead you measure “signal waits” which have been conveniently worked out for you in SQL Server 2005 and exposed in the sys.dm_os_wait_stats dmv.  Specifically, a signal wait is the time after a resource wait when the task has the resource it wanted and is just waiting to run on the CPU again.

For example, spid 55 starts a task that is now “running” on the CPU.  The task now needs to get some data so to keep an efficiently running system it will yield the processor to another task and wait in a “suspended” queue until it gets the data.  Time spent in this queue is what we know as a SQL Server “wait” and in the example the data is on disk so spid55 shows as waiting on a PAGEIOLATCH wait.  This wait represents a latch on a memory address to support a physical IO operation to get data for the address and is a measure of physical IO performance.

Now the task has the data it needs and wants some time on the CPU again.  The only problem is that other tasks want CPU time now as well and he has to join a queue called “runnable” while he waits for his turn.  The time he spends here is known as a signal wait.

The most accurate dmv for tracking SQL Server waits is sys.dm_os_waiting_tasks because it’s based at a task level and not a session level.  sys.dm_exec_requests does have wait information but because its at the session level you won’t see separate waits for parallel tasks or system tasks that can run without a session and therefore you might miss some key data.

sys.dm_os_waiting_tasks is great but the data is transient so it doesn’t keep any history.  To save you the bother of periodically writing this information somewhere else to correlate there is a dmv that already does it for you: sys.dm_os_wait_stats.  This will show you the combined wait times against each wait type of all tasks that have run since the last service restart.  It also exposes the signal wait time for each wait type so totalling those figures will give you a measurement of pure CPU waits within the instance.

Tom Davidson wrote a nice little script which shows the percentage of signal waits vs. resource waits on an instance:

Select signalWaitTimeMs=sum(signal_wait_time_ms)
    ,'%signal waits' = cast(100.0 * sum(signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))
    ,resourceWaitTimeMs=sum(wait_time_ms - signal_wait_time_ms)
    ,'%resource waits'= cast(100.0 * sum(wait_time_ms - signal_wait_time_ms) / sum (wait_time_ms) as numeric(20,2))
from sys.dm_os_wait_stats

Its best run against data aggregated specifically within a busy period otherwise long periods of low activity will skew the results.  You could restart the SQL Server service to clear the historical data before the busy period but far more conveniently, you can manually flush the data by running DBCC sqlperf (‘sys.dm_os_wait_stats’, clear).  This is very handy  to get a good view of aggregated waits within a set period but if you’re doing it on someone else’s server make sure they don’t have monitoring tools that depend on the historical data before you flush it!



Christian Bolton
Database Architect - The SQL Server Experts