08 May 2008 21:48
leo.pasta
Which queries are missing indexes?
One of the things I really enjoy when doing performance tuning on 2005 (I still work on a mix of several SQL 2000 and some SQL 2005) is the sys.dm_db_missing_index* DMVs. As the query processor evaluates queries, it detects if that specific query could benefit from an index and how much it expect that index would reduce the cost (in terms of IO), exposing these information as views that we can query. I won't delve in the structure of it, but you can use my procedure sp_dba_missingindex as an example (based on Bart Duncan's work). It will return the suggested CREATE INDEX statements along with the expected improvement information, like this:
USE AdventureWorks
exec sp_DBA_MissingIndexes
GO
create_index_statement
improvement_measure unique_compiles user_seeks user_scans avg_user_impact avg_system_impact last_user_seek last_user_scan
CREATE INDEX missing_index_4_3 ON [AdventureWorks].[Sales].[SalesOrderDetail] ([LineTotal]) INCLUDE ([SalesOrderID], [SalesOrderDetailID], [OrderQty])
564.7 1 5 0 99.66 0 2008-05-03 21:55:21.573 NULL
CREATE INDEX missing_index_2_1 ON [AdventureWorks].[Sales].[SalesOrderHeader]([PurchaseOrderNumber])
283.1 1 5 0 99.32 0 2008-05-03 21:55:21.557 NULL
Nevertheless, I don't like to put an index in my schema without knowing exactly which query it is supposed to improve. How would I know if it actually helped? How can I test it? Am I sure it won't make matters worse?
So, inspired by Greg Linwood procedure to find which queries are doing large index scans, I thought that I could use tweak that idea to answer those questions. The trick is that SQL Server query processor will include the missing index information in the showplan output as well, so I can parse the showplan XML for occurrences of missing indexes entries on the table I want. This way I an see the actual query and judge if that is indeed the correct index or if adjustments are necessary (the code is available to download here):
USE AdventureWorks
EXEC sp_DBA_QueriesMissingIndexes 'SalesOrderHeader',1
EXEC sp_DBA_QueriesMissingIndexes 'SalesOrderDetail',1
GO
Statement
Stored Procedure
query_plan
execution_count min_logical_reads max_logical_reads avg_logical_reads min_logical_writes max_logical_writes avg_logical_writes min_worker_time max_worker_time avg_worker_time min_elapsed_time max_elapsed_time avg_elapsed_time
)select CustomerID , SalesOrderNumber , SubTotal from [Sales] . [SalesOrderHeader] where ShipMethodID > @0 ......
NULL
<ShowPlanXML xmlns="
http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.0".........
1 703 703 703 0 0 0 26669 26669 26669 176983 176983 176983
(1 row(s) affected)
Statement
Stored Procedure
query_plan
execution_count min_logical_reads max_logical_reads avg_logical_reads min_logical_writes max_logical_writes avg_logical_writes min_worker_time max_worker_time avg_worker_time min_elapsed_time max_elapsed_time avg_elapsed_time
)select SalesOrderId , SalesOrderDetailId , OrderQty from Sales . SalesOrderDetail where LineTotal = @0 .......
NULL
<ShowPlanXML xmlns="
http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.0"........
1 1241 1241 1241 0 0 0 240383 240383 240383 573199 573199 573199
(1 row(s) affected)
The obvious limitation of that procedure is that it can only pinpoint queries that still have its plan stored in SQL Server cache. I found that, specially for low-cost queries, SQL tends to throw away the plan as soon as it finishes executing it. Setting "PARAMETERIZATION FORCED" for the database helps to keep query plans in the cache for more time. In "real life" scenarios it seems to find the "villains" pretty often.
I am not finished with it yet, as it is a nice excuse to learn XQuery in SQL Server and there are some quirks in the logic to extract the statement text (the open bracket at the beginning) that I want to fix. But I think it can be useful as it is, so I am making it public.
I also wouldn't mind getting some feedback on them. :-)