November 2007 - Posts

Finding optimal number of CPUs for a given long running CPU intensive DSS/OLAP like queries/workload

Introduction:
This article is applicable for finding optimal number of CPUs for long running CPU intensive queries/workload that doesn’t frequently wait for other resources (typical DSS/OLAP workload).

This article is not applicable if your queries/workload is often waiting for resources(like I/Os, Locks, Latches etc.) without consuming CPU in a stretch(typical OLTP workload).

This article can also provide information on uneven CPU load across NUMA nodes and uneven CPU load within same NUMA node (load_factor effect).

It is recommended to analyze Windows Performance Monitor Counters for monitoring CPU pressure. Processor utilization greater then 75% to 80% indicates CPU pressure. Using Windows Performance Monitor should be the 1st step, the procedure suggested in this article should be considered as an additional step.

It is very important to find ways to optimize the queries/workload by tuning the database schema before attempting to add additional CPUs.

Description:
When a customer asks you: I am running a DSS/OLAP like SQL job and it takes x amount of time, how can I reduce the time so the SQL job completes sooner, can I add more CPUs ? if yes, how many ? – this article will help you answer their question.

When you see CPU pressure, there are 2 options: you can either upgrade to faster CPUs or add additional CPUs. Upgrading to faster CPU will always help. Adding additional CPUs may not always help the SQL job to run faster unless that SQL job can take advantage of additional CPUs. If the customer already has the fastest CPUs available in the market then they have to wait for the next release of faster CPUs. More practical way might be to add additional CPUs if it helps, the below procedure will help you identify if this is the case.

This method calculates total user waits for CPU during the SQL workload and suggests additional CPUs if necessary. If CPU usage is at 100%, but no one waited for CPU during the workload, then adding additional CPU will not help; this is the basics of this calculation.

Current recommendations that are available on this topic calculates ‘signal wait time’ to ‘wait time’ ratio to suggest CPU pressure – but this cannot help one easily identify number of additional CPUs necessary.

Procedure:
When concurrent users apply simultaneous DSS/OLAP like CPU intensive workload, there could be CPU pressure. We can conclude presence of CPU pressure when at any given moment during this time period at least one or more user tasks waited for CPU resource.

In this case one can run the below query to find out how many CPU on an average will help to scale the workload better. It might be more informative to collect the below information in short time intervals (many samples) than just once to understand during which time of the workload application the CPU pressure was the most. Single sample will lead to average additional CPUs necessary for the entire workload duration.

1. Reset Wait Stats
dbcc sqlperf('sys.dm_os_wait_stats', clear)

2. Apply workload (you can find sample workload query at the end of this article, you need to execute the sample workload query simultaneously in many sessions to simulate concurrent user tasks).

3. Run the below query to find Additional CPUs Necessary – it is important to run the query right after the workload completes to get reliable information.
select round(((convert(float, ws.wait_time_ms) / ws.waiting_tasks_count) / (convert(float, si.os_quantum) / si.cpu_ticks_in_ms) * cpu_count), 2) as Additional_CPUs_Necessary, round((((convert(float, ws.wait_time_ms) / ws.waiting_tasks_count) / (convert(float, si.os_quantum) / si.cpu_ticks_in_ms) * cpu_count) / hyperthread_ratio), 2) as Additional_Sockets_Necessary from sys.dm_os_wait_stats ws cross apply sys.dm_os_sys_info si where ws.wait_type = 'SOS_SCHEDULER_YIELD'

About the article:
This is an excerpt from the 3 Day Level 400 Microsoft SQL Server 2005 Performance Monitoring and Tuning Workshop; I explain the concepts in the workshop with examples. I try to do the best while writing this article, but it is not the same!!

This articles discusses a way to find optimal number of CPUs for a given long running CPU intensive DSS/OLAP like queries/workload by measuring CPU pressure. The articles applies to a specific kind of workload, apply the concepts carefully.

If you find this article helpful, give me feedback, this motivates me to write some more articles. I also encourage you attending the 3 Day Level 400 Microsoft SQL Server 2005 Performance Monitoring and Tuning Workshop, the content is very unique & authentic, in addition it is useful, deep and I provide very accurate information with extraordinary attention to detail and explain in depth the internals.

 The complete article is located at:

 CPU Pressure

www.sqlworkshops.com