December 2007 - Posts

dm_exec_query_stats reports wrong CPU Utilization when a query executes in parallel(MAXDOP > 1)

Introduction:
This article helps you to understand the fact that sys.dm_exec_query_stats returns incorrect CPU utilization when a query executes in parallel.

The purpose of this article is to help you better understand SQL Server 2005 Performance Monitoring and Tuning. The purpose of this article is not to discuss whether this is a bug or not.

Description:
When a query executes serially(MAXDOP  = 1), the total_worker_time reported in sys.dm_exec_query_stats is accurate. But when a query executes in parallel(MAXDOP > 1), total_worker_time reported by sys.dm_exec_query_stats is inaccurate.

Usually CPU intensive query execute in parallel. Most customer use default configuration where 'max degree of parallelism' is set to '0' where it is more common for CPU intensive queries to execute in parallel.

When a customer tell you they has high CPU utilization on their server and ask you to identify the issue. Without knowing sys.dm_exec_query_stats reports incorrect CPU utilization when a query executes in parallel, you might query sys.dm_exec_query_stats and tell your customer that there is no query that is CPU intensive. Sooner or later the customer might find out the query that you missed to point out.

Example:
This example needs tables that you have to create. The script to create the table is located at the end of this article(www.sqlworkshops.com/dm_exec_query_stats.htm). You also need a server that has minimum 2 CPUs to reproduce this example.

When you execute the below query limiting MAXDOP to 1(seriel execution), the CPU_Utilized_in_Seconds reported by sys.dm_exec_query_stats is accurate - will match your wall clock execution time.

>> in theory sys.dm_exec_qyery_stats always works:
--example provided by www.sqlworkshops.com
--reset cache to collect fresh set of statistics
dbcc freeproccache
go
--execute a sample workload serially that takes x amount of seconds
select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2 option (maxdop 1)
go
--now query sys.dm_exec_query_stats to find CPU Utilized by the above query
select (total_worker_time * 1.0) / 1000000 as CPU_Utilized_in_Seconds, * from sys.dm_exec_query_stats
cross apply sys.dm_exec_sql_text(sql_handle)
where text like '%select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2%' and
text not like '%sys.dm_exec_query_stats%' --to eliminate our probe
go
>> CPU_Utilized_in_Seconds will be around 6 to 18 seconds based on your CPU speed - which is what you expect

But when you execute the query without limiting MAXDOP to 1, say 0(parallel execution), the CPU_Utilized_in_Seconds reported by sys.dm_exec_query_stats is inaccurate - will not match your wall clock execution time.

>> in practice sys.dm_exec_qyery_stats does not always works:
--example provided by www.sqlworkshops.com
--reset cache to collect fresh set of statistics
dbcc freeproccache
go
--execute a sample workload parallely that takes x amount of seconds
select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2
go
--now query sys.dm_exec_query_stats to find CPU Utilized by the above query
select (total_worker_time * 1.0) / 1000000 as CPU_Utilized_in_Seconds, * from sys.dm_exec_query_stats
cross apply sys.dm_exec_sql_text(sql_handle)
where text like '%select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2%' and
text not like '%sys.dm_exec_query_stats%' --to eliminate our probe
go
>> CPU_Utilized_in_Seconds will be around 0.00xxxx seconds  - which you do not expect!

 

The full article is located at www.sqlworkshops.com/dm_exec_query_stats.htm

 

Now you see the theoretical explanation and practical usage!!


 

sqlworkshops / www.sqlworkshops.com