Analysing Indexes Part 4 - Size does matter - 3

11. dropping the clustered index shows all secondary indexes at + 9 bytes

( integer index replaced with guid index )

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

0

HEAP

1

0

153847

2000000

579

datetime 2

NONCLUSTERED INDEX

3

0

4695

2000000

17

2

NONCLUSTERED INDEX

3

1

15

4695

23

2

NONCLUSTERED INDEX

3

2

1

15

23

Guid  3

NONCLUSTERED INDEX

3

0

6689

2000000

25

3

NONCLUSTERED INDEX

3

1

28

6689

31

3

NONCLUSTERED INDEX

3

2

1

28

31

char(20) 4

NONCLUSTERED INDEX

3

0

7663

2000000

29

4

NONCLUSTERED INDEX

3

1

36

7663

35

4

NONCLUSTERED INDEX

3

2

1

36

35

 

 

12. Compound clustered key; Integer+guid

secondary keys take int + guid + 1byte =  + 21 bytes per entry

As expected from deduction in step 10

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

1

CLUSTERED INDEX

4

0

153847

2000000

577

1

CLUSTERED INDEX

4

1

552

153847

27

1

CLUSTERED INDEX

4

2

2

552

27

1

CLUSTERED INDEX

4

3

1

2

27

2

NONCLUSTERED INDEX

3

0

7663

2000000

29

2

NONCLUSTERED INDEX

3

1

36

7663

35

2

NONCLUSTERED INDEX

3

2

1

36

35

3

NONCLUSTERED INDEX

3

0

9662

2000000

37

3

NONCLUSTERED INDEX

3

1

54

9662

43

3

NONCLUSTERED INDEX

3

2

1

54

43

4

NONCLUSTERED INDEX

3

0

10639

2000000

41

4

NONCLUSTERED INDEX

3

1

65

10639

47

4

NONCLUSTERED INDEX

3

2

1

65

47

 

Index 2

datetime

Index 3

guid

Index 4

char(20)

 

 

 

13. really wide clustered indexes

It’s not unusual to find wide clustered indexes, the application being tuned which started this series of posts has some tables with more than 7 columns making the key. The tables contain, generally over 100 or more columns.

This isn’t actually a really wide index, it’s only 60 bytes wide, and my table is quite narrow at 577 bytes, however it’s useful when comparing to step 15 where I’ve widened the table a bit.

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

1

CLUSTERED INDEX

4

0

153847

2000000

577

1

CLUSTERED INDEX

4

1

1315

153847

67

1

CLUSTERED INDEX

4

2

12

1315

67

1

CLUSTERED INDEX

4

3

1

12

67

 

 

14. Now lets create a secondary index which covers the PK + one additional datetime column

 

 

alter table dbo.testtable2 add constraint PK_TestTable2 primary key clustered  (NumKey,cSguid,cBig,cInt,cDate2,cChar1);

go  --index 1

 

create index testtable2_idx_test1 on dbo.testtable2(NumKey,cSguid,cBig,cInt,cDate2,cChar1,cDate1); --index 2

 

create index testtable2_idx_test2 on dbo.testtable2(cDate1); -- index 3

 

create index testtable2_idx_test3 on dbo.testtable2(cDate1,cDate2); -- index 4

 

 

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

1

CLUSTERED INDEX

4

0

153847

2000000

577

1

CLUSTERED INDEX

4

1

1315

153847

67

1

CLUSTERED INDEX

4

2

12

1315

67

1

CLUSTERED INDEX

4

3

1

12

67

2

NONCLUSTERED INDEX

4

0

17544

2000000

69

2

NONCLUSTERED INDEX

4

1

168

17544

75

2

NONCLUSTERED INDEX

4

2

2

168

75

2

NONCLUSTERED INDEX

4

3

1

2

75

3

NONCLUSTERED INDEX

4

0

17544

2000000

69

3

NONCLUSTERED INDEX

4

1

168

17544

75

3

NONCLUSTERED INDEX

4

2

2

168

75

3

NONCLUSTERED INDEX

4

3

1

2

75

4

NONCLUSTERED INDEX

4

0

17544

2000000

69

4

NONCLUSTERED INDEX

4

1

168

17544

75

4

NONCLUSTERED INDEX

4

2

2

168

75

4

NONCLUSTERED INDEX

4

3

1

2

75

 

This raises interesting aspects of index design when secondary indexes may include columns from the PK

All three secondary indexes are identical in size despite the different definitions.

 

15. here the width of the table has been increased, the number of pages for the clustered index is always the table size. Compare that to the size of the secondary index 2 which covers the clustered index.

 

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

1

CLUSTERED INDEX

5

0

1384615

2000000

3527

1

CLUSTERED INDEX

5

1

21038

1384615

67

1

CLUSTERED INDEX

5

2

337

21038

67

1

CLUSTERED INDEX

5

3

5

337

67

1

CLUSTERED INDEX

5

4

1

5

67

2

NONCLUSTERED INDEX

4

0

15625

2000000

61

2

NONCLUSTERED INDEX

4

1

134

15625

67

2

NONCLUSTERED INDEX

4

2

2

134

67

2

NONCLUSTERED INDEX

4

3

1

2

67

3

NONCLUSTERED INDEX

4

0

17544

2000000

69

3

NONCLUSTERED INDEX

4

1

168

17544

75

3

NONCLUSTERED INDEX

4

2

2

168

75

3

NONCLUSTERED INDEX

4

3

1

2

75

4

NONCLUSTERED INDEX

4

0

17544

2000000

69

4

NONCLUSTERED INDEX

4

1

168

17544

75

4

NONCLUSTERED INDEX

4

2

2

168

75

4

NONCLUSTERED INDEX

4

3

1

2

75

 

Indexes defined on table

 

Index ID

Index Type

Index Name

Index Columns

1

CLUSTERED

PK_TestTable2

NumKey,cSguid,cBig,cInt,cDate2,cChar1

2

NONCLUSTERED

testtable2_idx_test1

NumKey,cSguid,cBig,cInt,cDate2,cChar1

3

NONCLUSTERED

testtable2_idx_test2

cDate1

4

NONCLUSTERED

testtable2_idx_test3

cDate1,cDate2

 

 

Script to list index column names ( results as above )

 

--

-- list index columns for table

--

select ic.index_id,ic.type_desc,ic.name,dbo.fn_IndexColumns(ic.object_id,ic.index_id)

from sys.indexes ic where object_id=object_id('testtable2');

 

 

Function used in query; The function should be placed within the database to be queried.

 

 

/****** Object:  UserDefinedFunction [dbo].[fn_IndexColumns]    Script Date: 09/26/2007 21:16:13 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

create function [dbo].[fn_IndexColumns] (@object_id int,@index_id int)

-- ============================================================= 

--  Function:                 dbo.fn_IndexColumns                         

--  Written by:            (c)Colin Leversuch-Roberts

--                                www.kelemconsulting.co.uk                               

--                                                                

--  Purpose:                  produce csv list of columns making up an index

--                                                               

--  System:                   Database to be queried ( sql 2005 only )

--

--  Input Paramters:     @object_id int

--                                                                      @index_id int

--                                 

--  Returns :                 csv within varchar(2000)                                   

--                                                               

--  Usage:                   produces a list of index columns for a given object id and index id

--                                              select dbo.fn_IndexColumns(645577338,2)

--                                              select ic.name,dbo.fn_IndexColumns(ic.object_id,ic.index_id) from sys.indexes ic

--                                              ( don't do this on a large database )

--

-- Calls             sys.index_columns, sys.columns                                             

--                                                                

--  Notes:                     inline functions are bad and I suppose I could have made this a tvf

--                                              however it can be used standalone or embedded and it's not like it is

--                                              a function to used often.

--                                                                

--  VERSION HISTORY

--  Version No               Date                             Description

--  1                              23 August 2007           Initial Release                             

-- ============================================================= 

returns varchar(2000)

as

BEGIN

declare @columns varchar(2000)

select @columns = coalesce(@columns+',','') + sc.name from

sys.index_columns ic join sys.columns sc on sc.column_id=ic.column_id and sc.[object_id]=ic.[object_id]

where ic.[object_id] = @object_id and ic.index_id = @index_id

order by ic.key_ordinal

return(@columns)

END;

 

 

I’m not really out to show index tuning here, more to just show the importance of size when it comes to io performance, a crucial factor in tuning an oltp database.

 

 

--

-- default query uses secondary index ( index 2 )

--

select NumKey,cSguid,cBig,cInt,cDate2,cChar1 from dbo.testtable2

where numkey between 100 and 2000;

 

 

(1901 row(s) affected)

Table 'Testtable2'. Scan count 1, logical reads 20, physical reads 0

 

 

--

-- now force PK

--

select NumKey,cSguid,cBig,cInt,cDate2,cChar1 from dbo.testtable2 with(index=1)

where numkey between 100 and 2000;

 

 

(1901 row(s) affected)

Table 'Testtable2'. Scan count 1, logical reads 1343, physical reads 0

 

Note: As the clustered index is always the entire table the number of io to return the same data is always going to be much higher on a wide table; If you’ve specifically chosen the clsutered key to cover a query you may actually find a sceondary index to be much more efficient. I knew that for the query I defined the clustered index would not be used, which is why I had to use a hint to force the query.

 

The extra 1,323 page reads technically represent an extra 10.3Mb of data placed into the data cache. Whilst this might seem small in the overall scheme of things should this query be called several times a second then the difference becomes significant.

 

16. From the figures above, step 15, we can quickly calculate that for my target table the secondary indexes would add up to some 80Gb

In the main we also know that quite often the secondary index which covers the clustered index would actually be far more efficient.

So if we make the clustered primary key, non clustered e.g. the secondary index, index 2, and cluster on Numkey only we will have a similar physical sort order for the data ; obviously not fully but this is an example.

 

 

 

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

1

CLUSTERED INDEX

4

0

1000000

2000000

3527

1

CLUSTERED INDEX

4

1

1608

1000000

11

1

CLUSTERED INDEX

4

2

3

1608

11

1

CLUSTERED INDEX

4

3

1

3

11

2

NONCLUSTERED INDEX

4

0

15625

2000000

61

2

NONCLUSTERED INDEX

4

1

134

15625

67

2

NONCLUSTERED INDEX

4

2

2

134

67

2

NONCLUSTERED INDEX

4

3

1

2

67

3

NONCLUSTERED INDEX

3

0

3711

2000000

13

3

NONCLUSTERED INDEX

3

1

10

3711

19

3

NONCLUSTERED INDEX

3

2

1

10

19

4

NONCLUSTERED INDEX

3

0

5682

2000000

21

4

NONCLUSTERED INDEX

3

1

21

5682

27

4

NONCLUSTERED INDEX

3

2

1

21

27

 

The covered index ( non clustered Primary Key index 2 ) remains the same size as before but the two secondary indexes, indexes 3 and 4 have reduced in size. For my target table the saving in secondary index space is some 42Gb, just over 50%. The smaller page count makes the indexes more efficient.

The smaller clustered key also appears to have reduced the storage size of the table, however it is possible that in generating the wider table I introduced some fragmentation.

 

Conclusions

 

Indexes on nullable columns have a 3 byte overhead per row/index row

Indexes on a HEAP gain a 9 byte rid per row/index row.

As expected constraints make no difference.

In a choice between an integer and a guid as a clustered key for a table unless there’s really a real reason to use a guid – don’t.

The width of the data objects making the clustered index or key are added to every row of a secondary index , + 1 byte; In other words the clustered index is physically appended to every secondary index.

Make clustered indexes narrow ; wide Primary Keys should be non clustered.

 

NOTE:  One of my standard DBA interview questions is to ask if you can have a non clustered Primary Key, the answer is of course YES. I’d say at least 60% of the DBA’s I’ve asked this question got it wrong.

 

 

Appendix: Scripts

 

Create table Testable

 

create table dbo.Testtable

(

NumKey int identity(1,1),

cGuid uniqueidentifier default newid()null,

cSguid uniqueidentifier default NEWSEQUENTIALID() not null,

cBig bigint null,

cInt int null,

cDate1 datetime not null,

cDate2 datetime null,

cVchar1 varchar(50) null,

cChar1 char(20) not null,

cVchar2 varchar(500) null,

cChar2 char(100) null

);

go

 

 

Create Testtable2

 

create table dbo.Testtable2

(

NumKey int identity(1,1),

cGuid uniqueidentifier default newid()not null,

cSguid uniqueidentifier default NEWSEQUENTIALID() not null,

cBig bigint not null,

cInt int not null,

cDate1 datetime not null,

cDate2 datetime not null,

cVchar1 varchar(50) not null,

cChar1 char(20) not null,

cVchar2 varchar(500) not null,

cChar2 char(100) not null

);

go

 

 

Script used to populate tables

 

--

-- populate table

--

set nocount on

--

declare @num bigint,@count int,@count2 int,@count3 int

set @count = 1

set @count2 = 12

set @count3 = 1

while @count<2000001

begin

IF (@count%2)=0

            begin

                        insert into dbo.Testtable2(cBig,cInt,cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2)

                        values( 36854775807+(36854775807/@count),@count2*3,dateadd(dd,@count3,'25 MAY 2007'), dateadd(mi,@count,getdate()),

                        reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))), '0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),

                        replicate(convert(char(36),newid()),10),convert(varchar(20), datename(dw,getdate()+@count)+datename(month,getdate()+@count))+'blahblahblahblahblahblahblahblahblahblahblahblah')

            end

else

            begin

                        insert into dbo.Testtable2(cBig,cInt, cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2)

                        values( 854775807-(36854775807/@count),@count2*2,dateadd(dd,@count3,'5 dec 2006'), dateadd(mi,@count,getdate()),

                        reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))), '0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),

                        replicate(convert(char(36),newid()),10),convert(varchar(20), datename(dw,getdate())+datename(month,getdate()))+'blahblahblahblahblahblahblahblahblahblahblahblah')

            end

set @count = @count+1;

set @count2 = @count2+2;

IF @count2>50

            set @count2 = 12;

IF (@count%3000)=0

            set @count3 = @count3+1;

end;

 

 

Query to view index data ( change table name to suit )

 

 

select index_id,index_type_desc,index_depth,index_level,page_count,record_count, min_record_size_in_bytes,max_record_size_in_bytes,avg_record_size_in_bytes

            from sys.dm_db_index_physical_stats (db_id(), object_id('testtable2'), null , null ,'detailed' );

 

 

 

Published Thursday, September 27, 2007 8:19 AM by GrumpyOldDBA
Filed under: ,

Comments

# Analysing Indexes - count *

Wednesday, June 2, 2010 4:04 PM by Grumpy Old DBA

In my presentations on indexing I have always said that you should explore the advantages of covering