Covering Clustered Indexes

 

I’ve noticed that of late I’ve become a bit more critical of a well known publication that I suspect many DBA’s read. I have subscriptions to a number of publications and for the ones that I pay for I’m generally quite content.

I like paper / hard copy because I can make use of time where it’s just not practical to have a laptop or PC – and I spend my working day in front of a screen so sometimes it’s nice to escape to the garden or the pub or on the train, say travelling to a SSUG meeting without a PC.


So what’s set GrumpyOldDba off now you may ask? Well there’s an article been published about removing duplicate indexes – the Microsoft guys blogged this, with code, ages ago and I spent some time looking at this type of analysis as part of my index analysis series.

However it’s not quite so easy as it sounds and leaving aside duplicated single column non clustered indexes, which are obviously bad, the actual matter of where indexes have the same columns but in different orders or contained within is quite a complex matter and worth a whole lot of time on its own.

( Here’s the original blog from June 2007  http://blogs.msdn.com/mssqlisv/archive/2007/06/29/detecting-overlapping-indexes-in-sql-server-2005.aspx)


Idx_anytable_index1

 SalesID

PropertyID

SaleStartDate

SaleEndDate

UserID

IsCurrent

Idx_anytable_index2

 SalesID

UserID

PropertyID

SaleStartDate

SaleEndDate

IsCurrent

Idx_anytable_index3

 SalesID

PropertyID

SaleEndDate

IsCurrent

 

 

Idx_anytable_index4

 PropertyID

SalesID

SaleStartDate

SaleEndDate

UserID

IsCurrent


Taking the above indexes to be all standard non clustered indexes we can say that there’s a fair bit of duplication and in fact index 3 is already contained within index 1

Sadly with anything like this it just depends, and generally I’d say that index 3 is probably unwanted – however it may cover a query exactly, in which case index 1 might be ignored, and in certain joins one index may actually be preferred over another.

So what I’m saying is that other than basic duplicates it’s actually quite tricky, obviously you can check the dmv’s ( sql 2005 onwards ) to actually see if the index is being used, if you’re using 2000 or 7.0 then you can’t.


So what really got me worked up was the statement that creating a secondary index over a clustered index was a bad idea, now to me that such an article appears in this magazine is very disappointing and I’m also disappointed with the author – to me it shows a lack of editorial knowledge and like the review of the backup utilities last month isn’t what I’d expect from such a respected magazine. I’m also frankly becoming less and less impressed by the fact that many articles are only partly published and you have to go to the website for the full article. Maybe it’s a sign of the times that we don’t actually want paper copy any more.


Anyway here’s a very simple example of secondary indexes over the top of clustered indexes. The table I used has 2 million rows and is just under 16gb in size ( Let’s do real world examples here <grin> ) It’s a variation of the table in used in  “Analysing Index Part 4 – Size does matter”


http://www.grumpyolddba.co.uk/sql2005/working%20with%20indexes%204.mht


This is the test table and the code used to populate it. You might want to find other ways of populating the table if you’re not running on a server as I figure this will run for a few hours on a laptop! There were distinct reasons why I wanted to generate data this way for some previous tests, the only important factor for this tests is that each row should exceed half a page so as to force one row per page.


create table dbo.TestTable

(

NumKey int identity(1,1)not null,

cGuid uniqueidentifier default newid()null,

cSguid uniqueidentifier default NEWSEQUENTIALID() not null,

cBig bigint null,

cInt int null,

cDate1 datetime not null,

cDate2 datetime null,

cVchar1 varchar(50) null,

cChar1 char(20) null,

cVchar2 varchar(500) null,

cChar2 char(100),

Pad1 char(4000) not null

);

go

--

--

set nocount on

--

declare @num bigint,@count int,@count2 int,@count3 int

set @count = 1

set @count2 = 12

set @count3 = 1

while @count<2000001

begin

IF (@count%2)=0

            begin

                        insert into dbo.TestTable(cBig,cInt,cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2,pad1)

                        values( 36854775807+(36854775807/@count),@count2*3,dateadd(dd,@count3,'25 MAY 2007'),dateadd(mi,@count,getdate()),

                        reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))),'0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),

                        replicate(convert(char(36),newid()),10),convert(varchar(20),datename(dw,getdate()+@count)+datename(month,getdate()+@count)) +'blahblahblahblahblahblahblahblahblahblahblahblah',

                        replicate(convert(char(36),newid()),110))

            end

else

            begin

                        insert into dbo.TestTable(cBig,cInt, cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2,pad1)

                        values( 854775807-(36854775807/@count),@count2*2,dateadd(dd,@count3,'5 dec 2006'),dateadd(mi,@count,getdate()),

                        reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))),'0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),

                        replicate(convert(char(36),newid()),10),convert(varchar(20),datename(dw,getdate())+datename(month,getdate())) +'blahblahblahblahblahblahblahblahblahblahblahblah',

                        replicate(convert(char(36),newid()),110))

            end

set @count = @count+1;

set @count2 = @count2+2;

IF @count2>50

            set @count2 = 12;

IF (@count%3000)=0

            set @count3 = @count3+1;

 

end  

 


Ø  Here’s the output from sp_spaceused



Ø  and from sys.dm_db_index_physical_stats



Ø  Here’s my table after adding a clustered primary key on the integer identity column, NumKey



Ø  Extracting index information from sys.dm_db_index_physical_stats


 select index_id,index_type_desc,index_depth,index_level,page_count,record_count,min_record_size_in_bytes, max_record_size_in_bytes,avg_record_size_in_bytes

            from sys.dm_db_index_physical_stats (db_id(), object_id('testtable'), null ,null ,'detailed' )

 


And the results


index_id

index_type_desc

index_depth

index_level

page_count

record_count

min_record_size_in_bytes

max_record_size_in_bytes

avg_record_size_in_bytes

1

CLUSTERED

4

0

2000000

2000000

4577

4577

4577

1

CLUSTERED

4

1

3218

2000000

11

11

11

1

CLUSTERED

4

2

8

3218

11

11

11

1

CLUSTERED

4

3

1

8

11

11

11


Select 50 rows from the table by PK


select * from dbo.TestTable where NumKey between 50 and 100


(51 row(s) affected)

Table 'TestTable'. Scan count 1, logical reads 56, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


This is fair enough because at 4,577 bytes per row it requires 51 page requests to return the data rows


 

 

The entire row

 


Just the Numkey column

 


The in statement

 


Ø  If we just select the integer PK column it still requires the same io to return the result set


select numkey from dbo.TestTable where NumKey between 50 and 100

 

(51 row(s) affected)

Table 'TestTable'. Scan count 1, logical reads 56, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Ø  Just out of pure interest and as part of another test I was running I dropped the 51 values into an IN statement


select numkey from dbo.testtable where numkey

in (50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,

71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,

92,93,94,95,96,97,98,99,100)


(51 row(s) affected)


Table 'TestTable'. Scan count 51, logical reads 204, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Ø  The same for two values?


select numkey from dbo.testtable where numkey in (50,51)

 

(2 row(s) affected)

Table 'TestTable'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


So that’s making use of the clustered PK on the integer column, returning just the integer value.


Now create a non clustered secondary index over the PK


create index idx_TestTable_Numkey on dbo.TestTable(Numkey);

go


index_id

index_type_desc

index_depth

index_level

page_count

record_count

min_record_size_in_bytes

max_record_size_in_bytes

avg_record_size_in_bytes

1

CLUSTERED INDEX

4

0

2000000

2000000

4577

4577

4577

1

CLUSTERED INDEX

4

1

3218

2000000

11

11

11

1

CLUSTERED INDEX

4

2

8

3218

11

11

11

1

CLUSTERED INDEX

4

3

1

8

11

11

11

2

NONCLUSTERED INDEX

3

0

1732

2000000

5

5

5

2

NONCLUSTERED INDEX

3

1

4

1732

11

11

11

2

NONCLUSTERED INDEX

3

2

1

4

11

11

11


Now the original query


select numkey from dbo.TestTable where NumKey between 50 and 100


(51 row(s) affected)

Table 'TestTable'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Now in the overall scheme of things 56 to 3 io may not seem very much, but if this query runs maybe several times a second then there could be some big savings.

 

select numkey from dbo.testtable where numkey

in (50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,

71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,

92,93,94,95,96,97,98,99,100)

 

(51 row(s) affected)Table 'TestTable'. Scan count 51, logical reads 153, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


And for the IN Down 25%.


 

 

 

 

Conclusion:



To assume that the Primary Key will only be used for singleton selects is “strange” –  and to assume that the PK will always be the clustered index is also unhelpful. 

In real world applications it is not uncommon to find wide Primary Keys and wide clustered indexes, all on wide tables.

Which ever way you cut it the clustered index is always the width of the entire row and io will reflect that.

Where the Clustered Primary Key is several columns wide it’s more than likely you’ll find range selects against it and here a covering non clustered index can save you considerable io.


Not all databases are the same and what holds true for one may certainly not hold true for another, but sweeping generalisations are a problem, especially when they’re in a respected publication.


It’s also difficult for the DBA who has to try to explain to a Manager why what’s in such an article is wrong, I remember unsuccessfully spending over a year trying to explain what the lock waits counters actually represented to senior management because someone had read they actually meant something else and they had convinced themselves this was true – it wasn’t !


© www.grumpyolddba.co.uk

Published Monday, May 26, 2008 3:07 PM by GrumpyOldDBA

Comments

# re: Covering Clustered Indexes

Tuesday, May 27, 2008 8:45 PM by sql_noob

i've never done real scientific testing, but i never really likes composite indexes. used them sometimes, but i prefer single column indexes.

i know there may be an IO penalty and the correct thing to do is to write efficient code, but hardware is so cheap and powerful these days that in a lot of cases I think it's OK to break this rule.

I don't like composites because a lot of times a query will change or a new query will be written and almost always it will not take the composite index and revert to a scan and there is no way to write it to take advantage of the composite index.

# Indexes, Basically - from SQL Bits

Monday, September 15, 2008 4:19 PM by Grumpy Old DBA

I&#39;ll put the entire content including as much as I can remember of what I said, the demo code and

# Indexes - Clusteredand Non-clustered | keyongtech

Sunday, January 18, 2009 5:32 PM by Indexes - Clusteredand Non-clustered | keyongtech

Pingback from  Indexes - Clusteredand Non-clustered | keyongtech

# Clustered and non clustered index on one field | keyongtech

Wednesday, February 11, 2009 9:07 AM by Clustered and non clustered index on one field | keyongtech

Pingback from  Clustered and non clustered index on one field | keyongtech

# Analysing Indexes - count *

Wednesday, June 2, 2010 4:04 PM by Grumpy Old DBA

In my presentations on indexing I have always said that you should explore the advantages of covering