11. dropping the clustered index shows all secondary indexes at + 9 bytes
( integer index replaced with guid index )
|
Index
id |
index_typec |
Index
depth |
Index
level |
Page
count |
Record
count |
avg_record
size_in
bytes |
|
0 |
HEAP |
1 |
0 |
153847 |
2000000 |
579 |
|
datetime 2 |
NONCLUSTERED INDEX |
3 |
0 |
4695 |
2000000 |
17 |
|
2 |
NONCLUSTERED INDEX |
3 |
1 |
15 |
4695 |
23 |
|
2 |
NONCLUSTERED INDEX |
3 |
2 |
1 |
15 |
23 |
|
Guid 3 |
NONCLUSTERED INDEX |
3 |
0 |
6689 |
2000000 |
25 |
|
3 |
NONCLUSTERED INDEX |
3 |
1 |
28 |
6689 |
31 |
|
3 |
NONCLUSTERED INDEX |
3 |
2 |
1 |
28 |
31 |
|
char(20) 4 |
NONCLUSTERED INDEX |
3 |
0 |
7663 |
2000000 |
29 |
|
4 |
NONCLUSTERED INDEX |
3 |
1 |
36 |
7663 |
35 |
|
4 |
NONCLUSTERED INDEX |
3 |
2 |
1 |
36 |
35 |
12. Compound clustered key; Integer+guid
secondary keys take int + guid + 1byte = + 21 bytes per entry
As expected from deduction in step 10
|
Index
id |
index_typec |
Index
depth |
Index
level |
Page
count |
Record
count |
avg_record
size_in
bytes |
|
1 |
CLUSTERED INDEX |
4 |
0 |
153847 |
2000000 |
577 |
|
1 |
CLUSTERED INDEX |
4 |
1 |
552 |
153847 |
27 |
|
1 |
CLUSTERED INDEX |
4 |
2 |
2 |
552 |
27 |
|
1 |
CLUSTERED INDEX |
4 |
3 |
1 |
2 |
27 |
|
2 |
NONCLUSTERED INDEX |
3 |
0 |
7663 |
2000000 |
29 |
|
2 |
NONCLUSTERED INDEX |
3 |
1 |
36 |
7663 |
35 |
|
2 |
NONCLUSTERED INDEX |
3 |
2 |
1 |
36 |
35 |
|
3 |
NONCLUSTERED INDEX |
3 |
0 |
9662 |
2000000 |
37 |
|
3 |
NONCLUSTERED INDEX |
3 |
1 |
54 |
9662 |
43 |
|
3 |
NONCLUSTERED INDEX |
3 |
2 |
1 |
54 |
43 |
|
4 |
NONCLUSTERED INDEX |
3 |
0 |
10639 |
2000000 |
41 |
|
4 |
NONCLUSTERED INDEX |
3 |
1 |
65 |
10639 |
47 |
|
4 |
NONCLUSTERED INDEX |
3 |
2 |
1 |
65 |
47 |
|
Index 2 |
datetime |
|
Index 3 |
guid |
|
Index 4 |
char(20) |
13. really wide clustered indexes
It’s not unusual to find wide clustered indexes, the application being tuned which started this series of posts has some tables with more than 7 columns making the key. The tables contain, generally over 100 or more columns.
This isn’t actually a really wide index, it’s only 60 bytes wide, and my table is quite narrow at 577 bytes, however it’s useful when comparing to step 15 where I’ve widened the table a bit.
|
Index
id |
index_typec |
Index
depth |
Index
level |
Page
count |
Record
count |
avg_record
size_in
bytes |
|
1 |
CLUSTERED INDEX |
4 |
0 |
153847 |
2000000 |
577 |
|
1 |
CLUSTERED INDEX |
4 |
1 |
1315 |
153847 |
67 |
|
1 |
CLUSTERED INDEX |
4 |
2 |
12 |
1315 |
67 |
|
1 |
CLUSTERED INDEX |
4 |
3 |
1 |
12 |
67 |
14. Now lets create a secondary index which covers the PK + one additional datetime column
|
alter table dbo.testtable2 add constraint PK_TestTable2 primary key clustered (NumKey,cSguid,cBig,cInt,cDate2,cChar1);
go --index 1
create index testtable2_idx_test1 on dbo.testtable2(NumKey,cSguid,cBig,cInt,cDate2,cChar1,cDate1); --index 2
create index testtable2_idx_test2 on dbo.testtable2(cDate1); -- index 3
create index testtable2_idx_test3 on dbo.testtable2(cDate1,cDate2); -- index 4
|
|
Index
id |
index_typec |
Index
depth |
Index
level |
Page
count |
Record
count |
avg_record
size_in
bytes |
|
1 |
CLUSTERED INDEX |
4 |
0 |
153847 |
2000000 |
577 |
|
1 |
CLUSTERED INDEX |
4 |
1 |
1315 |
153847 |
67 |
|
1 |
CLUSTERED INDEX |
4 |
2 |
12 |
1315 |
67 |
|
1 |
CLUSTERED INDEX |
4 |
3 |
1 |
12 |
67 |
|
2 |
NONCLUSTERED INDEX |
4 |
0 |
17544 |
2000000 |
69 |
|
2 |
NONCLUSTERED INDEX |
4 |
1 |
168 |
17544 |
75 |
|
2 |
NONCLUSTERED INDEX |
4 |
2 |
2 |
168 |
75 |
|
2 |
NONCLUSTERED INDEX |
4 |
3 |
1 |
2 |
75 |
|
3 |
NONCLUSTERED INDEX |
4 |
0 |
17544 |
2000000 |
69 |
|
3 |
NONCLUSTERED INDEX |
4 |
1 |
168 |
17544 |
75 |
|
3 |
NONCLUSTERED INDEX |
4 |
2 |
2 |
168 |
75 |
|
3 |
NONCLUSTERED INDEX |
4 |
3 |
1 |
2 |
75 |
|
4 |
NONCLUSTERED INDEX |
4 |
0 |
17544 |
2000000 |
69 |
|
4 |
NONCLUSTERED INDEX |
4 |
1 |
168 |
17544 |
75 |
|
4 |
NONCLUSTERED INDEX |
4 |
2 |
2 |
168 |
75 |
|
4 |
NONCLUSTERED INDEX |
4 |
3 |
1 |
2 |
75 |
This raises interesting aspects of index design when secondary indexes may include columns from the PK
All three secondary indexes are identical in size despite the different definitions.
15. here the width of the table has been increased, the number of pages for the clustered index is always the table size. Compare that to the size of the secondary index 2 which covers the clustered index.
|
Index
id |
index_typec |
Index
depth |
Index
level |
Page
count |
Record
count |
avg_record
size_in
bytes |
|
1 |
CLUSTERED INDEX |
5 |
0 |
1384615 |
2000000 |
3527 |
|
1 |
CLUSTERED INDEX |
5 |
1 |
21038 |
1384615 |
67 |
|
1 |
CLUSTERED INDEX |
5 |
2 |
337 |
21038 |
67 |
|
1 |
CLUSTERED INDEX |
5 |
3 |
5 |
337 |
67 |
|
1 |
CLUSTERED INDEX |
5 |
4 |
1 |
5 |
67 |
|
2 |
NONCLUSTERED INDEX |
4 |
0 |
15625 |
2000000 |
61 |
|
2 |
NONCLUSTERED INDEX |
4 |
1 |
134 |
15625 |
67 |
|
2 |
NONCLUSTERED INDEX |
4 |
2 |
2 |
134 |
67 |
|
2 |
NONCLUSTERED INDEX |
4 |
3 |
1 |
2 |
67 |
|
3 |
NONCLUSTERED INDEX |
4 |
0 |
17544 |
2000000 |
69 |
|
3 |
NONCLUSTERED INDEX |
4 |
1 |
168 |
17544 |
75 |
|
3 |
NONCLUSTERED INDEX |
4 |
2 |
2 |
168 |
75 |
|
3 |
NONCLUSTERED INDEX |
4 |
3 |
1 |
2 |
75 |
|
4 |
NONCLUSTERED INDEX |
4 |
0 |
17544 |
2000000 |
69 |
|
4 |
NONCLUSTERED INDEX |
4 |
1 |
168 |
17544 |
75 |
|
4 |
NONCLUSTERED INDEX |
4 |
2 |
2 |
168 |
75 |
|
4 |
NONCLUSTERED INDEX |
4 |
3 |
1 |
2 |
75 |
Indexes defined on table
|
Index ID |
Index Type |
Index Name |
Index Columns |
|
1 |
CLUSTERED |
PK_TestTable2 |
NumKey,cSguid,cBig,cInt,cDate2,cChar1 |
|
2 |
NONCLUSTERED |
testtable2_idx_test1 |
NumKey,cSguid,cBig,cInt,cDate2,cChar1 |
|
3 |
NONCLUSTERED |
testtable2_idx_test2 |
cDate1 |
|
4 |
NONCLUSTERED |
testtable2_idx_test3 |
cDate1,cDate2 |
Script to list index column names ( results as above )
|
--
-- list index columns for table
--
select ic.index_id,ic.type_desc,ic.name,dbo.fn_IndexColumns(ic.object_id,ic.index_id)
from sys.indexes ic where object_id=object_id('testtable2');
|
Function used in query; The function should be placed within the database to be queried.
|
/****** Object: UserDefinedFunction [dbo].[fn_IndexColumns] Script Date: 09/26/2007 21:16:13 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create function [dbo].[fn_IndexColumns] (@object_id int,@index_id int)
-- =============================================================
-- Function: dbo.fn_IndexColumns
-- Written by: (c)Colin Leversuch-Roberts
-- www.kelemconsulting.co.uk
--
-- Purpose: produce csv list of columns making up an index
--
-- System: Database to be queried ( sql 2005 only )
--
-- Input Paramters: @object_id int
-- @index_id int
--
-- Returns : csv within varchar(2000)
--
-- Usage: produces a list of index columns for a given object id and index id
-- select dbo.fn_IndexColumns(645577338,2)
-- select ic.name,dbo.fn_IndexColumns(ic.object_id,ic.index_id) from sys.indexes ic
-- ( don't do this on a large database )
--
-- Calls sys.index_columns, sys.columns
--
-- Notes: inline functions are bad and I suppose I could have made this a tvf
-- however it can be used standalone or embedded and it's not like it is
-- a function to used often.
--
-- VERSION HISTORY
-- Version No Date Description
-- 1 23 August 2007 Initial Release
-- =============================================================
returns varchar(2000)
as
BEGIN
declare @columns varchar(2000)
select @columns = coalesce(@columns+',','') + sc.name from
sys.index_columns ic join sys.columns sc on sc.column_id=ic.column_id and sc.[object_id]=ic.[object_id]
where ic.[object_id] = @object_id and ic.index_id = @index_id
order by ic.key_ordinal
return(@columns)
END;
|
I’m not really out to show index tuning here, more to just show the importance of size when it comes to io performance, a crucial factor in tuning an oltp database.
|
--
-- default query uses secondary index ( index 2 )
--
select NumKey,cSguid,cBig,cInt,cDate2,cChar1 from dbo.testtable2
where numkey between 100 and 2000;
|
(1901 row(s) affected)
Table 'Testtable2'. Scan count 1, logical reads 20, physical reads 0
|
--
-- now force PK
--
select NumKey,cSguid,cBig,cInt,cDate2,cChar1 from dbo.testtable2 with(index=1)
where numkey between 100 and 2000;
|
(1901 row(s) affected)
Table 'Testtable2'. Scan count 1, logical reads 1343, physical reads 0
Note: As the clustered index is always the entire table the number of io to return the same data is always going to be much higher on a wide table; If you’ve specifically chosen the clsutered key to cover a query you may actually find a sceondary index to be much more efficient. I knew that for the query I defined the clustered index would not be used, which is why I had to use a hint to force the query.
The extra 1,323 page reads technically represent an extra 10.3Mb of data placed into the data cache. Whilst this might seem small in the overall scheme of things should this query be called several times a second then the difference becomes significant.
16. From the figures above, step 15, we can quickly calculate that for my target table the secondary indexes would add up to some 80Gb
In the main we also know that quite often the secondary index which covers the clustered index would actually be far more efficient.
So if we make the clustered primary key, non clustered e.g. the secondary index, index 2, and cluster on Numkey only we will have a similar physical sort order for the data ; obviously not fully but this is an example.
|
Index
id |
index_typec |
Index
depth |
Index
level |
Page
count |
Record
count |
avg_record
size_in
bytes |
|
1 |
CLUSTERED INDEX |
4 |
0 |
1000000 |
2000000 |
3527 |
|
1 |
CLUSTERED INDEX |
4 |
1 |
1608 |
1000000 |
11 |
|
1 |
CLUSTERED INDEX |
4 |
2 |
3 |
1608 |
11 |
|
1 |
CLUSTERED INDEX |
4 |
3 |
1 |
3 |
11 |
|
2 |
NONCLUSTERED INDEX |
4 |
0 |
15625 |
2000000 |
61 |
|
2 |
NONCLUSTERED INDEX |
4 |
1 |
134 |
15625 |
67 |
|
2 |
NONCLUSTERED INDEX |
4 |
2 |
2 |
134 |
67 |
|
2 |
NONCLUSTERED INDEX |
4 |
3 |
1 |
2 |
67 |
|
3 |
NONCLUSTERED INDEX |
3 |
0 |
3711 |
2000000 |
13 |
|
3 |
NONCLUSTERED INDEX |
3 |
1 |
10 |
3711 |
19 |
|
3 |
NONCLUSTERED INDEX |
3 |
2 |
1 |
10 |
19 |
|
4 |
NONCLUSTERED INDEX |
3 |
0 |
5682 |
2000000 |
21 |
|
4 |
NONCLUSTERED INDEX |
3 |
1 |
21 |
5682 |
27 |
|
4 |
NONCLUSTERED INDEX |
3 |
2 |
1 |
21 |
27 |
The covered index ( non clustered Primary Key index 2 ) remains the same size as before but the two secondary indexes, indexes 3 and 4 have reduced in size. For my target table the saving in secondary index space is some 42Gb, just over 50%. The smaller page count makes the indexes more efficient.
The smaller clustered key also appears to have reduced the storage size of the table, however it is possible that in generating the wider table I introduced some fragmentation.
Conclusions
Indexes on nullable columns have a 3 byte overhead per row/index row
Indexes on a HEAP gain a 9 byte rid per row/index row.
As expected constraints make no difference.
In a choice between an integer and a guid as a clustered key for a table unless there’s really a real reason to use a guid – don’t.
The width of the data objects making the clustered index or key are added to every row of a secondary index , + 1 byte; In other words the clustered index is physically appended to every secondary index.
Make clustered indexes narrow ; wide Primary Keys should be non clustered.
NOTE: One of my standard DBA interview questions is to ask if you can have a non clustered Primary Key, the answer is of course YES. I’d say at least 60% of the DBA’s I’ve asked this question got it wrong.
Appendix: Scripts
Create table Testable
|
create table dbo.Testtable
(
NumKey int identity(1,1),
cGuid uniqueidentifier default newid()null,
cSguid uniqueidentifier default NEWSEQUENTIALID() not null,
cBig bigint null,
cInt int null,
cDate1 datetime not null,
cDate2 datetime null,
cVchar1 varchar(50) null,
cChar1 char(20) not null,
cVchar2 varchar(500) null,
cChar2 char(100) null
);
go
|
Create Testtable2
|
create table dbo.Testtable2
(
NumKey int identity(1,1),
cGuid uniqueidentifier default newid()not null,
cSguid uniqueidentifier default NEWSEQUENTIALID() not null,
cBig bigint not null,
cInt int not null,
cDate1 datetime not null,
cDate2 datetime not null,
cVchar1 varchar(50) not null,
cChar1 char(20) not null,
cVchar2 varchar(500) not null,
cChar2 char(100) not null
);
go
|
Script used to populate tables
|
--
-- populate table
--
set nocount on
--
declare @num bigint,@count int,@count2 int,@count3 int
set @count = 1
set @count2 = 12
set @count3 = 1
while @count<2000001
begin
IF (@count%2)=0
begin
insert into dbo.Testtable2(cBig,cInt,cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2)
values( 36854775807+(36854775807/@count),@count2*3,dateadd(dd,@count3,'25 MAY 2007'), dateadd(mi,@count,getdate()),
reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))), '0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),
replicate(convert(char(36),newid()),10),convert(varchar(20), datename(dw,getdate()+@count)+datename(month,getdate()+@count))+'blahblahblahblahblahblahblahblahblahblahblahblah')
end
else
begin
insert into dbo.Testtable2(cBig,cInt, cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2)
values( 854775807-(36854775807/@count),@count2*2,dateadd(dd,@count3,'5 dec 2006'), dateadd(mi,@count,getdate()),
reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))), '0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),
replicate(convert(char(36),newid()),10),convert(varchar(20), datename(dw,getdate())+datename(month,getdate()))+'blahblahblahblahblahblahblahblahblahblahblahblah')
end
set @count = @count+1;
set @count2 = @count2+2;
IF @count2>50
set @count2 = 12;
IF (@count%3000)=0
set @count3 = @count3+1;
end;
|
Query to view index data ( change table name to suit )
|
select index_id,index_type_desc,index_depth,index_level,page_count,record_count, min_record_size_in_bytes,max_record_size_in_bytes,avg_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id(), object_id('testtable2'), null , null ,'detailed' );
|