Download Level 400 SQL Server Webcast 3, FREE, for limited time, download ASAP from webcasts.sqlworkshops.com


You can NOW download the FULL Level 400 SQL Server Webcasts 3 from http://www.sqlworkshops.com/webcasts, FREE, no registration necessary.


Webcast 3 Part 1: http://sqlworkshops.box.net/shared/static/gu6ysh15cc.wmv


Webcast 3 Part 2: http://sqlworkshops.box.net/shared/static/690b473f4j.wmv


Webcast 3 Part 3: http://sqlworkshops.box.net/shared/static/g4t0o05nu5.wmv


Pass on the link to your SQL Server friends and colleagues.


R Meyyappan

ww.sqlworkshops.com

Posted by sqlworkshops | with no comments

FREE Level 400 Webcast 3 on Tuning Parallelism (CXPACKET, SLEEP_TASK), Hash Match (SLEEP_TASK), Sort (IO_COMPLETION), Prefetch (PAGEIOLATCH_SH)

Download (FREE!) Truly Level 400 SQL Server Performance Monitoring and Tuning Webcast 3 for developers and database administrators from http://webcasts.sqlworkshops.com. Preview, full webcast will be available for download coming Wednesday around noon PST. Please forward the link to your friends and local user community.

Webcast 3: Recommends not using stored procedure or other plan caching mechanism like using sp_executesql and Prepared Statement using ADO.NET or OLEDB based executions for memory allocating queries. Common memory allocating queries are that perform Sort and do Hash Match operations like Hash Join or Hash Aggregation or Hash Union.

With examples provides ways to identify queries performing Hash match operations that spill to tempdb. Using SQL Profiler: Hash Warnings (Hash Recursion and Hash Bailout). Using sys.dm_exec_query_memory_grants: Granted Memory, Used Memory and Maximum Used memory. Explains how SLEEP_TASK wait type is associated with Sort Warnings.

Explains with examples how sp_recompile can block and bring an application to a standstill and recommends using DBCC FREEPROCCACHE (plan_handle) instead.

Explains with examples how stored procedure or other plan caching mechanism affects queries that sort (perform order by). Explains how IO_COMPLETION wait type is associated with Sort Warnings.

Demonstrates why rollbacks and database restore could wait on IO_COMPLETION wait type and if needed how this wait can be reduced.

Explains how parallel query performance is significantly affected by a CPU intensive query executing on one of the CPU cores. Explains the reason for the performance issue and how to identify the issue. With example explains the reason for the observed query execution time when the child thread executes on the CPU core where the CPU intensive query executes and also when the coordinator executes on the CPU core where the CPU intensive query executes. Recommends ways to avoid this parallel query performance issue and also demonstrates that only certain queries will be affected by this and not all parallel queries.
Explains sys.dm_os_waits_stats.signal_wait_time_ms does not indicate system wide CPU pressure, just at a CPU core level.

Explains the prefetch mechanism and how it can affect query performance. Provides ways to force prefecth. Demonstrates cases where Avg. Disk Sec / read, the disk latency, PAGEIOLATCH_SH, Avg. disk queue length is very high and the query executes fast.

Explains why Plan Guides (Plan Freezing!) cannot help with prefetch or in few other cases.

Webcast 2: Explains high CXPACKET waits are NOT a direct result of delays associated with inefficiencies of parallel processing. Provides example to scale queries over many CPU cores without reducing MAXDOP settings.

Webcast 1: Explains Memory allocation issues with sort. Demonstrates ways to identify sort spills to tempdb. Provides query rewrite procedure to avoid sort spills to tempdb. Demonstrates cases where 1 tempdb date file per core might not be ideal for all implementations.

Regards

R Meyyappan
http://www.sqlworkshops.com

Posted by sqlworkshops | with no comments

Download Now Free Truly Level 400 SQL Server Performance Tuning Webcast 2 at SQLWorkshops.com

Thanks for everyone who attended my level 400 SQLBits session and viewed Webcast 1, due to great feedback here is Webcast 2. This level 400 SQL Server Tuning webcast covers optimizing queries to scale across many CPU cores (by addressing inefficiencies in parallel query execution without reducing MAXDOP due to high CXPACKET waits). Liked Webcast 1 ? you will like Webcast 2 even more!!

http://webcasts.sqlworkshops.com

Previous webcast covered optimizing queries to avoid sort spills over tempdb (by addressing memory grant issues). Next webcast will cover optimizing queries to efficiently utilize Disk subsystem (planned release date February 10, 2010).


Feedback from a Microsoft Employee on Webcast 1: "Your webcast rocked. Your free webcast was very good and informative".
 

In case you made a resolution this New Year to gain deep SQL Server knowledge, here is an opportunity knocking at your door, don’t miss it, it is all FREE, it costs you nothing. 

These webcasts are equally designed for developers and database administrators. We believe with deep SQL Server knowledge customers can reduce Total Cost of Ownership (TCO) without investing in additional hardware or software, by simply optimizing their existing investment.


Like before, these webcasts are full of real world examples and not a power point slide bullet point based best practice guideline lectures. These webcasts are a result of practical performance tuning for real world customer issues. We are sure you will find them useful in your day to day work.


Hope you enjoy our freely downloadable webcasts; I am looking forward to your feedback
 

Regards

-ramesh-
http://www.sqlworkshops.com/
http://www.sqlbits.com/Speakers/R_Meyyappan/Default.aspx
Posted by sqlworkshops | with no comments
Filed under: ,

For those who couldn't attend my session - Let’s make SQL fly @ SQLBits V

Here is the link to the content in the webcast format

http://webcasts.sqlworkshops.com

Regards

-ramesh-

Posted by sqlworkshops | with no comments
Filed under:

dm_exec_query_stats reports wrong CPU Utilization when a query executes in parallel(MAXDOP > 1)

Introduction:
This article helps you to understand the fact that sys.dm_exec_query_stats returns incorrect CPU utilization when a query executes in parallel.

The purpose of this article is to help you better understand SQL Server 2005 Performance Monitoring and Tuning. The purpose of this article is not to discuss whether this is a bug or not.

Description:
When a query executes serially(MAXDOP  = 1), the total_worker_time reported in sys.dm_exec_query_stats is accurate. But when a query executes in parallel(MAXDOP > 1), total_worker_time reported by sys.dm_exec_query_stats is inaccurate.

Usually CPU intensive query execute in parallel. Most customer use default configuration where 'max degree of parallelism' is set to '0' where it is more common for CPU intensive queries to execute in parallel.

When a customer tell you they has high CPU utilization on their server and ask you to identify the issue. Without knowing sys.dm_exec_query_stats reports incorrect CPU utilization when a query executes in parallel, you might query sys.dm_exec_query_stats and tell your customer that there is no query that is CPU intensive. Sooner or later the customer might find out the query that you missed to point out.

Example:
This example needs tables that you have to create. The script to create the table is located at the end of this article(www.sqlworkshops.com/dm_exec_query_stats.htm). You also need a server that has minimum 2 CPUs to reproduce this example.

When you execute the below query limiting MAXDOP to 1(seriel execution), the CPU_Utilized_in_Seconds reported by sys.dm_exec_query_stats is accurate - will match your wall clock execution time.

>> in theory sys.dm_exec_qyery_stats always works:
--example provided by www.sqlworkshops.com
--reset cache to collect fresh set of statistics
dbcc freeproccache
go
--execute a sample workload serially that takes x amount of seconds
select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2 option (maxdop 1)
go
--now query sys.dm_exec_query_stats to find CPU Utilized by the above query
select (total_worker_time * 1.0) / 1000000 as CPU_Utilized_in_Seconds, * from sys.dm_exec_query_stats
cross apply sys.dm_exec_sql_text(sql_handle)
where text like '%select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2%' and
text not like '%sys.dm_exec_query_stats%' --to eliminate our probe
go
>> CPU_Utilized_in_Seconds will be around 6 to 18 seconds based on your CPU speed - which is what you expect

But when you execute the query without limiting MAXDOP to 1, say 0(parallel execution), the CPU_Utilized_in_Seconds reported by sys.dm_exec_query_stats is inaccurate - will not match your wall clock execution time.

>> in practice sys.dm_exec_qyery_stats does not always works:
--example provided by www.sqlworkshops.com
--reset cache to collect fresh set of statistics
dbcc freeproccache
go
--execute a sample workload parallely that takes x amount of seconds
select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2
go
--now query sys.dm_exec_query_stats to find CPU Utilized by the above query
select (total_worker_time * 1.0) / 1000000 as CPU_Utilized_in_Seconds, * from sys.dm_exec_query_stats
cross apply sys.dm_exec_sql_text(sql_handle)
where text like '%select max(t1.c2 + t2.c2) from tab7 t1 cross join tab7 t2%' and
text not like '%sys.dm_exec_query_stats%' --to eliminate our probe
go
>> CPU_Utilized_in_Seconds will be around 0.00xxxx seconds  - which you do not expect!

 

The full article is located at www.sqlworkshops.com/dm_exec_query_stats.htm

 

Now you see the theoretical explanation and practical usage!!


 

sqlworkshops / www.sqlworkshops.com

 

Finding optimal number of CPUs for a given long running CPU intensive DSS/OLAP like queries/workload

Introduction:
This article is applicable for finding optimal number of CPUs for long running CPU intensive queries/workload that doesn’t frequently wait for other resources (typical DSS/OLAP workload).

This article is not applicable if your queries/workload is often waiting for resources(like I/Os, Locks, Latches etc.) without consuming CPU in a stretch(typical OLTP workload).

This article can also provide information on uneven CPU load across NUMA nodes and uneven CPU load within same NUMA node (load_factor effect).

It is recommended to analyze Windows Performance Monitor Counters for monitoring CPU pressure. Processor utilization greater then 75% to 80% indicates CPU pressure. Using Windows Performance Monitor should be the 1st step, the procedure suggested in this article should be considered as an additional step.

It is very important to find ways to optimize the queries/workload by tuning the database schema before attempting to add additional CPUs.

Description:
When a customer asks you: I am running a DSS/OLAP like SQL job and it takes x amount of time, how can I reduce the time so the SQL job completes sooner, can I add more CPUs ? if yes, how many ? – this article will help you answer their question.

When you see CPU pressure, there are 2 options: you can either upgrade to faster CPUs or add additional CPUs. Upgrading to faster CPU will always help. Adding additional CPUs may not always help the SQL job to run faster unless that SQL job can take advantage of additional CPUs. If the customer already has the fastest CPUs available in the market then they have to wait for the next release of faster CPUs. More practical way might be to add additional CPUs if it helps, the below procedure will help you identify if this is the case.

This method calculates total user waits for CPU during the SQL workload and suggests additional CPUs if necessary. If CPU usage is at 100%, but no one waited for CPU during the workload, then adding additional CPU will not help; this is the basics of this calculation.

Current recommendations that are available on this topic calculates ‘signal wait time’ to ‘wait time’ ratio to suggest CPU pressure – but this cannot help one easily identify number of additional CPUs necessary.

Procedure:
When concurrent users apply simultaneous DSS/OLAP like CPU intensive workload, there could be CPU pressure. We can conclude presence of CPU pressure when at any given moment during this time period at least one or more user tasks waited for CPU resource.

In this case one can run the below query to find out how many CPU on an average will help to scale the workload better. It might be more informative to collect the below information in short time intervals (many samples) than just once to understand during which time of the workload application the CPU pressure was the most. Single sample will lead to average additional CPUs necessary for the entire workload duration.

1. Reset Wait Stats
dbcc sqlperf('sys.dm_os_wait_stats', clear)

2. Apply workload (you can find sample workload query at the end of this article, you need to execute the sample workload query simultaneously in many sessions to simulate concurrent user tasks).

3. Run the below query to find Additional CPUs Necessary – it is important to run the query right after the workload completes to get reliable information.
select round(((convert(float, ws.wait_time_ms) / ws.waiting_tasks_count) / (convert(float, si.os_quantum) / si.cpu_ticks_in_ms) * cpu_count), 2) as Additional_CPUs_Necessary, round((((convert(float, ws.wait_time_ms) / ws.waiting_tasks_count) / (convert(float, si.os_quantum) / si.cpu_ticks_in_ms) * cpu_count) / hyperthread_ratio), 2) as Additional_Sockets_Necessary from sys.dm_os_wait_stats ws cross apply sys.dm_os_sys_info si where ws.wait_type = 'SOS_SCHEDULER_YIELD'

About the article:
This is an excerpt from the 3 Day Level 400 Microsoft SQL Server 2005 Performance Monitoring and Tuning Workshop; I explain the concepts in the workshop with examples. I try to do the best while writing this article, but it is not the same!!

This articles discusses a way to find optimal number of CPUs for a given long running CPU intensive DSS/OLAP like queries/workload by measuring CPU pressure. The articles applies to a specific kind of workload, apply the concepts carefully.

If you find this article helpful, give me feedback, this motivates me to write some more articles. I also encourage you attending the 3 Day Level 400 Microsoft SQL Server 2005 Performance Monitoring and Tuning Workshop, the content is very unique & authentic, in addition it is useful, deep and I provide very accurate information with extraordinary attention to detail and explain in depth the internals.

 The complete article is located at:

 CPU Pressure

www.sqlworkshops.com