Analysing Indexes Part 1

Actually what started as a small document has continued to grow.  I'm working on performance tuning a well known application and as I document as I work so others can follow what I'm doing I thought I'd add it to my blog. There's nothing particularly startling in here but if you want to drill into index performance on your database this may help. As I remarked on Tony or maybe Simon's blog it's fine looking at small tables but when row counts are in hundreds of millions and secondary indexes hit 10Gb ( each ) then performance and storage become more critical - and difficult to test - it's not a quick task to drop and create a handful of indexes on a 250 million row table.

  • SQL Server 2005 maintains system views which contain information concerning indexes.
  • As these are views the information is not carried forward from a server restart
  • The information returned is cumulative so great care must be taken in evaluating the results.
  • To obtain consistent data and results we must store the information from the system views in permanent tables for later analysis.
  • The presented queries assume no partitioning

 

1.0  Operational cost on indexes ( assuming no RFI )

 

Operation

Read

Write

Select

Always

No

Insert

No

Always – every index

Update

Always

Only if row qualifies

Delete

Always

Only if row qualifies

 

 

2.0  Index Information

 

Object

Description

Notes

dbo.sysindexes

This is the SQL 2000 system table taken forward as a view.

This table may not be supported going forward.

 

sys.dm_db_index_physical_stats

This is actually a function and replaces dbcc showcontig

 

sys.indexes

2005 system table contains property information only for each index.

 

sys.dm_db_index_usage_stats

System view which records the access usage of every index in the database

We can view the number of times this index has been used, the type of access and the last access date

sys.dm_db_index_operational_stats

System function which records the operational cost of access to the index

We can view information such as the number of pages, locks, latches and waits.

sys.objects

Contains information on database objects

 

 

sys.dm_db_missing_index_details

System view which stores information on indexes the optimiser considers are missing.

 

sys.dm_db_missing_index_group_stats

System view which stores usage and access details for the missing indexes similar to sys.dm_db_index_usage_stats

We can also view the improvement that the index is computed to make, this is essentially the same type of output as shown by the Tuning Advisor.

sys.dm_db_missing_index_groups

Presumably this view will be of more use in SQL2008

 

sys.dm_db_missing_index_columns

A system function which returns the columns for a missing index

Requires the index handle as a parameter

3.0  sys.dm_db_index_usage_stats information

http://msdn2.microsoft.com/en-us/library/ms188755.aspx

 

  • There are groups of data columns which interest us
    • click on the link ( above ) for full table details
  • User access counts
  • User access dates
  • System access counts ( e.g. update statistics, index rebuilds )
  • System access dates

 

4.0  List unused tables/indexes

  • This will largely indicate empty tables as maintenance tasks will create entries within sys.dm_db_index_usage_stats where there is data in the table/index
    • Update statistics tablename
    • Dbcc dbreindex(tablename)
  • I’d normally full qualify queries with three part naming; this is omitted for this document.

 

--

-- run in database to analyse

--

Select object_name(i.object_id) as TableName,isnull(i.name,'HEAP') as IndexName, i.index_id

from sys.indexes i join sys.objects o on o.object_id = i.object_id

where objectproperty(o.object_id,'IsUserTable') = 1

and i.index_id NOT IN

(select s.index_id from sys.dm_db_index_usage_stats s

where s.object_id=i.object_id and i.index_id=s.index_id

and database_id = db_id() )

order by object_name(i.object_id),i.index_id;

 

 

 

5.0  Query to filter by user access

  • This sorts by lack of user access
  • This is a count of the times there was a seek, scan or lookup on the index/table
  • I’ve discounted updates from my criteria as an update will occur regardless of the index being used for selects
  • Essentially any index with zero as a value for seek, scan and lookup will be unused.

 

 

--

-- run in database to be analysed

--

select object_name(s.object_id) as TableName,isnull(i.name,'HEAP') as IndexName,

case i.index_id

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end,

user_seeks as Seeks, user_scans as Scans, user_lookups as Lookups, user_updates as Updates

from sys.dm_db_index_usage_stats s join sys.indexes i on i.object_id = s.object_id

and i.index_id = s.index_id

where database_id = db_id() and objectproperty(s.object_id,'IsUserTable') = 1

order by (user_seeks + user_scans + user_lookups ) asc;

 

  • Remember that these values are cumulative since start up so we must snapshot the data on a regular basis to enable proper analysis
  • We must also collect data over a suitable time period based upon the usages of your database.
  • In an environment which runs, say, month end processes, it is vital that we capture the full work life cycle.
  • This is especially important if the CEO or MD has a favourite report they run once a month and you remove the index(es) which make this run quickly.

 

6.0  Capture snapshot of index activity

 

--

-- Run this is the database to be analysed

--

Insert into dbo.Unused_indexes

SELECT getdate(),* FROM sys.dm_db_index_usage_stats

WHERE database_id = DB_ID()

and (last_user_seek < convert(char(11),getdate()-1) or last_user_seek is null)

and (last_user_scan < convert(char(11),getdate()-1) or last_user_scan is null);

 

 

  • I would schedule this to run every night just after midnight
  • I’m collecting information where an index has not been used in the preceding 24 hours for a user seek or user scan. ( see  Working with snapshots  for explanation of this )
  • The collection date is added for clarity.
  • A primary key could be defined on the columns TheDate,database_id,object_id,index_id
  • The table is a Heap

 

6.1  Table to capture index activity

 

CREATE TABLE dbo.Unused_Indexes

(

            TheDate datetime NOT NULL,

            database_id smallint NOT NULL,

            [object_id] int NOT NULL,

            index_id int NOT NULL,

            user_seeks bigint NOT NULL,

            user_scans bigint NOT NULL,

            user_lookups bigint NOT NULL,

            user_updates bigint NOT NULL,

            last_user_seek datetime NULL,

            last_user_scan datetime NULL,

            last_user_lookup datetime NULL,

            last_user_update datetime NULL,

            system_seeks bigint NOT NULL,

            system_scans bigint NOT NULL,

            system_lookups bigint NOT NULL,

            system_updates bigint NOT NULL,

            last_system_seek datetime NULL,

            last_system_scan datetime NULL,

            last_system_lookup datetime NULL,

            last_system_update datetime NULL

)

 

 

 

 

7.0  sys.dm_db_index_operational_stats

http://msdn2.microsoft.com/en-us/library/ms174281.aspx

 

  • There are groups of data columns which interest us
    • click on the link ( above ) for full table details
  • This is a very important table which contains a wealth of information
  • We can view waits, page splits, and volume of access at leaf and non-leaf levels

 

8.0  View index usage by work - Selects

  • This query shows all indexes, reads to top , writes to bottom

 

--

-- Run this is the database to be analysed

--

select object_name(s.object_id) as TableName, isnull(i.name,'HEAP') as IndexName,

case i.index_id

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end as IndexType

, reads=range_scan_count + singleton_lookup_count

, 'leaf_writes'=leaf_insert_count+leaf_update_count+ leaf_delete_count

, 'leaf_page_splits' = leaf_allocation_count

, 'nonleaf_writes'=nonleaf_insert_count + nonleaf_update_count + nonleaf_delete_count

, 'nonleaf_page_splits' = nonleaf_allocation_count

from sys.dm_db_index_operational_stats (db_id(),NULL,NULL,NULL) s join sys.indexes i

on i.object_id = s.object_id and i.index_id = s.index_id

where objectproperty(s.object_id,'IsUserTable') = 1

order by reads desc, leaf_writes, nonleaf_writes;

 

 

8.1  This query only shows those indexes which have reads

 

--

-- Run this is the database to be analysed

--

select object_name(s.object_id) as TableName, isnull(i.name,'HEAP') as IndexName,

case i.index_id

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end as IndexType

, reads=range_scan_count + singleton_lookup_count

, 'leaf_writes'=leaf_insert_count+leaf_update_count+ leaf_delete_count

, 'leaf_page_splits' = leaf_allocation_count

, 'nonleaf_writes'=nonleaf_insert_count + nonleaf_update_count + nonleaf_delete_count

, 'nonleaf_page_splits' = nonleaf_allocation_count

from sys.dm_db_index_operational_stats (db_id(),NULL,NULL,NULL) s join sys.indexes i

on i.object_id = s.object_id and i.index_id = s.index_id

where objectproperty(s.object_id,'IsUserTable') = 1

and (range_scan_count + singleton_lookup_count)>0

order by reads desc, leaf_writes, nonleaf_writes;

 

 

 

 

8.2  Indexes without reads but with high writes may be being maintained unnecessarily

 

--

-- Run this is the database to be analysed

--

select object_name(s.object_id) as TableName, isnull(i.name,'HEAP') as IndexName,

case i.index_id

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end as IndexType

, reads=range_scan_count + singleton_lookup_count

, 'leaf_writes'=leaf_insert_count+leaf_update_count+ leaf_delete_count

, 'leaf_page_splits' = leaf_allocation_count

, 'nonleaf_writes'=nonleaf_insert_count + nonleaf_update_count + nonleaf_delete_count

, 'nonleaf_page_splits' = nonleaf_allocation_count

from sys.dm_db_index_operational_stats (db_id(),NULL,NULL,NULL) s join sys.indexes i

on i.object_id = s.object_id and i.index_id = s.index_id

where objectproperty(s.object_id,'IsUserTable') = 1

and (range_scan_count + singleton_lookup_count)=0

and (leaf_update_count+ leaf_delete_count+ nonleaf_update_count + nonleaf_delete_count)>0

order by leaf_writes desc, nonleaf_writes desc;

 

9.0  Order by writes to find your most heavily updated index

  • The inserts have been separated as we cannot avoid this io
  • Indexes without writes are excluded
  • Secondary indexes appearing high on the list may benefit from being placed on a separate filegroup
  • High levels of leaf writes on a HEAP may indicate the possibility of fragmentation which cannot be removed

 

--

-- Run this is the database to be analysed

--

select object_name(s.object_id) as TableName, isnull(i.name,'HEAP') as IndexName,

case i.index_id

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end as IndexType

,'total_writes'=leaf_update_count+ leaf_delete_count+nonleaf_update_count + nonleaf_delete_count

,'total_insert_writes'=leaf_insert_count+nonleaf_insert_count

, 'leaf_writes'=leaf_update_count+ leaf_delete_count

, 'nonleaf_writes'=nonleaf_update_count + nonleaf_delete_count

, 'insert_leaf_writes'=leaf_insert_count

, 'insert_nonleaf_writes'=nonleaf_insert_count

, 'leaf_page_splits' = leaf_allocation_count

, 'nonleaf_page_splits' = nonleaf_allocation_count

, reads=range_scan_count + singleton_lookup_count

from sys.dm_db_index_operational_stats (db_id(),NULL,NULL,NULL) s join sys.indexes i

on i.object_id = s.object_id and i.index_id = s.index_id

where objectproperty(s.object_id,'IsUserTable') = 1

and (leaf_update_count+ leaf_delete_count+nonleaf_update_count + nonleaf_delete_count+leaf_insert_count+nonleaf_insert_count)>0

order by total_writes desc,total_insert_writes desc;

 

10.0  Simple query for user access reads and writes

 

--

-- Run this is the database to be analysed

--

select object_name(s.object_id) as TableName, isnull(i.name,'HEAP') as IndexName,

case i.index_id

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end as IndexType

,reads=user_seeks + user_scans + user_lookups

,writes =  user_updates

from sys.dm_db_index_usage_stats s join sys.indexes i

on s.object_id = i.object_id and i.index_id = s.index_id

where objectproperty(s.object_id,'IsUserTable') = 1

and s.database_id = db_id()

order by reads desc;

go

 

 

 

In much the same way as ::fn_virtualfilestats  or sys_dm_io_virtual_file_stats  return information concerning file activity   sys.dm_db_index_operational_stats  allows us to view similar information for our indexes

http://msdn2.microsoft.com/en-us/library/ms190326.aspx

http://msdn2.microsoft.com/en-us/library/ms187309.aspx

 

11.0  Query to display waits and blocks

  • this query is cumulative so is of questionable value

 

--

-- Run this is the database to be analysed

--

Select object_name(s.object_id) as TableName

,isnull(i.name,'HEAP') as IndexName

,case i.index_id  

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end as IndexType  

,row_lock_count, row_lock_wait_count

,[block %]=cast (100.0 * row_lock_wait_count / (1 + row_lock_count) as numeric(15,2))

, row_lock_wait_in_ms

, [avg row lock waits in ms]=cast (1.0 * row_lock_wait_in_ms / (1 + row_lock_wait_count) as numeric(15,2))

from sys.dm_db_index_operational_stats (db_id(), NULL, NULL, NULL) s join sys.indexes i

on i.object_id = s.object_id and i.index_id = s.index_id

where objectproperty(s.object_id,'IsUserTable') = 1

order by row_lock_wait_count desc

 

 

 

 

 

 

 

11.1  Typical sample output

 

Table

Index

Index Type

Row lock count

Row lock wait count

block %

Row lock wait in ms

avg row lock waits in ms

Sales

$21

NC

32679

34

0.10

104076

2973.60

Employees

PK_Employees

CLUS

958510

32

0.00

1055219

31976.33

Invoices

PK_Invoices

CLUS

100332

27

0.03

19049

680.32

Returns

Clx_Returns_Date

CLUS

1771122

21

0.00

168156

7643.45

Areas

PK_Areas

CLUS

50762

12

0.02

937

72.08

 

  • In general terms as this is from start up the values are likely diluted, however we can see that on the Employees table ( none of these tables are real, although the data is ) we have some serious waits when a wait does occur.
  • You would want to snapshot and compare to extract operational data values.

 

12.0  Find your worst scans

  • An index scan may not be an issue so we’ll concentrate on table scans
  • A clustered index scan is a table scan so we’ll select on index id 0 and 1

 

--

-- run in database to be analysed

--

select object_name(s.object_id) as TableName,isnull(i.name,'HEAP') as IndexName,

case i.index_id

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end as IndexType,

user_seeks as Seeks, user_scans as Scans, user_lookups as Lookups

from sys.dm_db_index_usage_stats s join sys.indexes i on i.object_id = s.object_id

and i.index_id = s.index_id

where database_id = db_id() and objectproperty(s.object_id,'IsUserTable') = 1

and user_scans>0 and i.index_id<2

order by user_scans desc;  

 

 

12.1  Add a calculation and sort by proportion of scans to seeks

 

--

-- run in database to be analysed

--

select object_name(s.object_id) as TableName,isnull(i.name,'HEAP') as IndexName,

case i.index_id

when 0 then 'HEAP'

when 1 then 'CLUS'

else 'NC'

end as IndexType,

user_seeks as Seeks, user_scans as Scans, user_lookups as Lookups

,cast((user_scans*1.0/(user_seeks+user_scans))*100 as numeric(5,2)) as '%age'

from sys.dm_db_index_usage_stats s join sys.indexes i on i.object_id = s.object_id

and i.index_id = s.index_id

where database_id = db_id() and objectproperty(s.object_id,'IsUserTable') = 1

and user_scans>0 and i.index_id<2

order by '%age' desc;

 

13.0  Working with snapshots

 

In section 6.0 ( above ) we devised a simple query to collect index data

This method allows us to analyse usage far better.

Each index will have a row in the table if it were not used in the previous 24 hours

The snapshot taken just after midnight

After collecting data for, say, 32 days if an index has 32 entries in the table then it was not used in this time at all

 

13.1  This query shows counts for index entries

  • It assumes that the table Unused_Indexes was created in the user database

 

--

-- run in database to be analysed

--

select object_name(ui.[object_id]),si.name,ui.index_id,count(*) as Days

from dbo.Unused_Indexes ui join sys.indexes si

on ui.[object_id] = si.[object_id] and ui.index_id = si.index_id

where objectproperty(ui.[object_id],'IsUserTable') = 1

group by ui.[object_id],ui.index_id,si.name

order by count(*) desc,ui.[object_id],ui.index_id asc;

 

 

 

13.2      How large are indexes in my database

·          Great care must be exercised when running a query which includes sys.dm_db_index_physical_stats as for large tables this may well run for some time.

·          An alternative would be to gather the information as a separate process .. see 13.4

 

--

-- use with great care !!!

--

SELECT object_name(a.[object_id]) as TableName,a.index_id, isnull(b.name,'HEAP') as IndexName, sum(a.page_count) as pages,sum(a.page_count)*1.0/1024 as Mb

FROM sys.dm_db_index_physical_stats (DB_ID(), NULL,NULL, NULL, 'DETAILED') AS a

    JOIN sys.indexes AS b ON a.object_id = b.object_id AND a.index_id = b.index_id

group by a.[object_id],a.index_id, b.name

order by pages desc;

GO 

 

 

 

TableName

index_id

IndexName

pages

Mb

Employees

1

PK_Employees

86897

84.860351

Sales

1

PK_Sales

72222

70.529296

TableInfo

0

HEAP

13909

13.583007

Areas

1

PK_Areas

5753

5.618164

 

 

13.3  How large are my unused indexes

  • Great care must be exercised when running a query which includes sys.dm_db_index_physical_stats as for large tables this may well run for some time.
  • An alternative would be to gather the information as a separate process .. see 13.4
  • This expands upon the query in 13.1
  • One of the joys of SQL 2005 is the ability to join using a table valued function, sadly some of microsoft’s functions don’t work, sys.dm_db_index_physical_stats for example, however this is not a problem as all we need to do is create our own table valued function then we can happily join sys.dm_db_index_physical_stats to any table of our choice.
  • IMPORTANT queries using this may be long running, especially on a large database – be very careful.

 

13.4  Table Valued Function for sys.dm_db_index_physical_stats

  • I personally would place this in the master database and might be tempted to make this part of the sys schema, for the examples that follow I’ve placed the function in master.

 

Create function dbo.fn_db_index_physical_stats

--

-- sys.dm_db_index_physical_stats is actually a function not a view

-- sadly you can't use CROSS APPLY with it, however, drop it in a table

-- valued function and you can!

--

(

@db_id int

,@object_id int

,@index_id int

,@partition_number int

,@mode nvarchar(16)

)

Returns @table TABLE

(

[database_id] [smallint] NULL,

[object_id] [int] NULL,

[index_id] [int] NULL,

[partition_number] [int] NULL,

[index_type_desc] [nvarchar](60) NULL,

[alloc_unit_type_desc] [nvarchar](60) NULL,

[index_depth] [tinyint] NULL,

[index_level] [tinyint] NULL,

[avg_fragmentation_in_percent] [float] NULL,

[fragment_count] [bigint] NULL,

[avg_fragment_size_in_pages] [float] NULL,

[page_count] [bigint] NULL,

[avg_page_space_used_in_percent] [float] NULL,

[record_count] [bigint] NULL,

[ghost_record_count] [bigint] NULL,

[version_ghost_record_count] [bigint] NULL,

[min_record_size_in_bytes] [int] NULL,

[max_record_size_in_bytes] [int] NULL,

[avg_record_size_in_bytes] [float] NULL,

[forwarded_record_count] [bigint] NULL

)

BEGIN

            insert into @table

            select *

            from sys.dm_db_index_physical_stats (@db_id, @object_id, @index_id ,@partition_number,@mode )

            return

END;

--end function

go

 

 

13.5  Sample Query to show size of 25 unused indexes

  • This query is selecting indexes which have 30 entries in the table Unused_Indexes

 

declare @table table(object_id int,ind_name sysname,index_id int)

insert into @table

select top 25 ui.[object_id],si.name,ui.index_id

from dbo.Unused_Indexes ui join sys.indexes si

on ui.[object_id] = si.[object_id] and ui.index_id = si.index_id

where objectproperty(ui.[object_id],'IsUserTable') = 1

group by ui.[object_id],ui.index_id,si.name

having count(*) >30

order by count(*) desc,ui.[object_id],ui.index_id asc

--

select  object_name(ui.[object_id]) as TableName,ui.ind_name as IndexName,      

max(fps.index_level)as IndexDepth,sum(fps.page_count) as TotalPages,max(fps.record_count) as LeafRows, sum(fps.page_count)/1024 as Mb

from @table ui

CROSS APPLY master.dbo.fn_db_index_physical_stats (DB_ID(),ui.[object_id],ui.index_id,NULL, 'Detailed') AS fps

group by ui.[object_id],ui.ind_name

order by sum(fps.page_count) desc ;

 

Abridged results

 

TableName

IndexName

IndexDepth

TotalPages

LeafRows

Mb

Employees

Idx_Employees_123

4

54811

11159545

53.526367

Invoices

Idx_Invoices_paper

4

44532

5162428

43.488281

Invoices

Idx_Invoices_ink

4

39539

5162428

38.612304

DepartmentSales

Idx_DS_Promotions

3

16791

3199181

16.397460

Sales2006

Idx_sales2007_opened

2

116

31291

0.113281

Sales2006

Idx_sales2007_area

2

103

31291

0.100585

Sales2006

Idx_sales2007_size

2

87

31291

0.084960

 

  • The value for index depth designates the depth of the b-tree, a depth of 4 means that there will be 4 io to return a value from that index.
  • Total pages is the sum of used pages reported fro all levels of the index, not just the leaf level
  • Leaf Rows should indicate the number of rows in the index, normally the number of rows in the table.
  • Mb is the calculation of the physical storage space the index is using

 

 

 

 

References

 

How can SQL Server 2005 help me evaluate and manage indexes?

            http://blogs.msdn.com/sqlcat/archive/2005/12/12/502735.aspx

 

 

 

 

Published 12 July 2007 16:17 by GrumpyOldDBA
Filed under: ,

Comments

# Analysing Indexes - A Must Read article from Colin Leversuch Roberts.

Colin Leversuch Roberts has posted up a really useful (and big) list of useful queries, advice and notes

# re: Analysing Indexes Part 1

16 July 2007 09:24 by GrumpyOldDBA

I actually see I made an error in the divisor in the space calcs, 13.2 & 13.5  it should be 128 not 1024 as the count is in pages ( 8kb ) not kb. I will edit this and revise the calulations.

# Missing Index DMVs

22 January 2008 16:54 by Troubleshooting and Tips - Cindy Gross

SQL Server 2005 has some DMVs that will help you tune the system. The missing index DMVs track recent queries that could have benefited from an index that didn't exist. ...

# Three steps to avoid blocking. | Michael J. Swart

Pingback from  Three steps to avoid blocking. | Michael J. Swart