September 2007 - Posts

Analysing Indexes Part 4 - Size does matter - 3

11. Dropping the clustered index shows all secondary indexes at +9 bytes per entry
( integer clustered index replaced with a guid index )

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 0 | HEAP | 1 | 0 | 153847 | 2000000 | 579 |
| 2 (datetime) | NONCLUSTERED INDEX | 3 | 0 | 4695 | 2000000 | 17 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 15 | 4695 | 23 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 15 | 23 |
| 3 (guid) | NONCLUSTERED INDEX | 3 | 0 | 6689 | 2000000 | 25 |
| 3 | NONCLUSTERED INDEX | 3 | 1 | 28 | 6689 | 31 |
| 3 | NONCLUSTERED INDEX | 3 | 2 | 1 | 28 | 31 |
| 4 (char(20)) | NONCLUSTERED INDEX | 3 | 0 | 7663 | 2000000 | 29 |
| 4 | NONCLUSTERED INDEX | 3 | 1 | 36 | 7663 | 35 |
| 4 | NONCLUSTERED INDEX | 3 | 2 | 1 | 36 | 35 |

 

 

12. Compound clustered key: integer + guid
Secondary keys take int + guid + 1 byte = +21 bytes per entry, as expected from the deduction in step 10.

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 4 | 0 | 153847 | 2000000 | 577 |
| 1 | CLUSTERED INDEX | 4 | 1 | 552 | 153847 | 27 |
| 1 | CLUSTERED INDEX | 4 | 2 | 2 | 552 | 27 |
| 1 | CLUSTERED INDEX | 4 | 3 | 1 | 2 | 27 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 7663 | 2000000 | 29 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 36 | 7663 | 35 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 36 | 35 |
| 3 | NONCLUSTERED INDEX | 3 | 0 | 9662 | 2000000 | 37 |
| 3 | NONCLUSTERED INDEX | 3 | 1 | 54 | 9662 | 43 |
| 3 | NONCLUSTERED INDEX | 3 | 2 | 1 | 54 | 43 |
| 4 | NONCLUSTERED INDEX | 3 | 0 | 10639 | 2000000 | 41 |
| 4 | NONCLUSTERED INDEX | 3 | 1 | 65 | 10639 | 47 |
| 4 | NONCLUSTERED INDEX | 3 | 2 | 1 | 65 | 47 |

 

Index 2 = datetime, Index 3 = guid, Index 4 = char(20)
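The +21 bytes per entry can be checked against the leaf-level averages in the table above. The following sketch reproduces the post's deduction (the extra 1 byte of per-row overhead is the figure deduced in step 10, not an official documented constant):

```python
# Each secondary index leaf row = its own key column + the compound
# clustered key (int + guid) + 1 byte of overhead (the post's deduction).
CLUSTERED_KEY_BYTES = 4 + 16   # integer + uniqueidentifier
OVERHEAD = 1

for name, key_bytes, observed in [("datetime", 8, 29),
                                  ("guid", 16, 37),
                                  ("char(20)", 20, 41)]:
    predicted = key_bytes + CLUSTERED_KEY_BYTES + OVERHEAD
    print(f"{name}: predicted {predicted}, observed {observed}")
```

All three predictions match the leaf-level avg_record_size_in_bytes values shown for indexes 2, 3 and 4.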

 

 

 

13. Really wide clustered indexes
It's not unusual to find wide clustered indexes; the application being tuned, which started this series of posts, has some tables with more than 7 columns making up the key. The tables generally contain 100 or more columns.
This isn't actually a really wide index, it's only 60 bytes wide, and my table is quite narrow at 577 bytes; however it's useful when compared to step 15, where I've widened the table a bit.

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 4 | 0 | 153847 | 2000000 | 577 |
| 1 | CLUSTERED INDEX | 4 | 1 | 1315 | 153847 | 67 |
| 1 | CLUSTERED INDEX | 4 | 2 | 12 | 1315 | 67 |
| 1 | CLUSTERED INDEX | 4 | 3 | 1 | 12 | 67 |
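The 153,847 leaf pages are simply the table itself. A rough page-count sketch, assuming the standard 8 KB page with 8,096 bytes usable for row data and a 2-byte slot-array entry per row:

```python
import math

# Leaf page-count estimate for 2M rows of 577 bytes each.
rows, avg_row = 2_000_000, 577
rows_per_page = 8096 // (avg_row + 2)        # usable page bytes / (row + slot entry)
leaf_pages = math.ceil(rows / rows_per_page)
print(rows_per_page, leaf_pages)             # 13 rows per page, 153847 pages
```

This matches the page_count reported by sys.dm_db_index_physical_stats in every 577-byte table in this series.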

 

 

14. Now let's create a secondary index which covers the PK plus one additional datetime column

 

 

alter table dbo.testtable2 add constraint PK_TestTable2 primary key clustered  (NumKey,cSguid,cBig,cInt,cDate2,cChar1);

go  --index 1

 

create index testtable2_idx_test1 on dbo.testtable2(NumKey,cSguid,cBig,cInt,cDate2,cChar1,cDate1); --index 2

 

create index testtable2_idx_test2 on dbo.testtable2(cDate1); -- index 3

 

create index testtable2_idx_test3 on dbo.testtable2(cDate1,cDate2); -- index 4

 

 

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 4 | 0 | 153847 | 2000000 | 577 |
| 1 | CLUSTERED INDEX | 4 | 1 | 1315 | 153847 | 67 |
| 1 | CLUSTERED INDEX | 4 | 2 | 12 | 1315 | 67 |
| 1 | CLUSTERED INDEX | 4 | 3 | 1 | 12 | 67 |
| 2 | NONCLUSTERED INDEX | 4 | 0 | 17544 | 2000000 | 69 |
| 2 | NONCLUSTERED INDEX | 4 | 1 | 168 | 17544 | 75 |
| 2 | NONCLUSTERED INDEX | 4 | 2 | 2 | 168 | 75 |
| 2 | NONCLUSTERED INDEX | 4 | 3 | 1 | 2 | 75 |
| 3 | NONCLUSTERED INDEX | 4 | 0 | 17544 | 2000000 | 69 |
| 3 | NONCLUSTERED INDEX | 4 | 1 | 168 | 17544 | 75 |
| 3 | NONCLUSTERED INDEX | 4 | 2 | 2 | 168 | 75 |
| 3 | NONCLUSTERED INDEX | 4 | 3 | 1 | 2 | 75 |
| 4 | NONCLUSTERED INDEX | 4 | 0 | 17544 | 2000000 | 69 |
| 4 | NONCLUSTERED INDEX | 4 | 1 | 168 | 17544 | 75 |
| 4 | NONCLUSTERED INDEX | 4 | 2 | 2 | 168 | 75 |
| 4 | NONCLUSTERED INDEX | 4 | 3 | 1 | 2 | 75 |

 

This raises interesting aspects of index design when secondary indexes include columns from the PK.
All three secondary indexes are identical in size despite their different definitions.
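The identical sizes follow from the fact that a nonclustered leaf row holds the index key columns plus whichever clustered-key columns are not already present: duplicates are not stored twice. A sketch of that arithmetic, using the fixed datatype widths of the test table's columns and the post's deduced 1 byte of overhead:

```python
# Effective leaf row = union of index key columns and clustered key columns.
widths = {"NumKey": 4, "cSguid": 16, "cBig": 8, "cInt": 4,
          "cDate2": 8, "cChar1": 20, "cDate1": 8}
clustered = ["NumKey", "cSguid", "cBig", "cInt", "cDate2", "cChar1"]

def leaf_row_bytes(index_cols):
    cols = dict.fromkeys(list(index_cols) + clustered)  # union, order kept
    return sum(widths[c] for c in cols) + 1             # +1 overhead byte

for cols in (clustered + ["cDate1"],   # testtable2_idx_test1
             ["cDate1"],               # testtable2_idx_test2
             ["cDate1", "cDate2"]):    # testtable2_idx_test3
    print(leaf_row_bytes(cols))        # 69 every time
```

Whatever subset of the clustered key the index declares, the union is the same seven columns, hence the identical 69-byte leaf rows in the table above.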

 

15. Here the width of the table has been increased; the page count of the clustered index is always the table size. Compare that to the size of secondary index 2, which covers the clustered index.

 

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 5 | 0 | 1384615 | 2000000 | 3527 |
| 1 | CLUSTERED INDEX | 5 | 1 | 21038 | 1384615 | 67 |
| 1 | CLUSTERED INDEX | 5 | 2 | 337 | 21038 | 67 |
| 1 | CLUSTERED INDEX | 5 | 3 | 5 | 337 | 67 |
| 1 | CLUSTERED INDEX | 5 | 4 | 1 | 5 | 67 |
| 2 | NONCLUSTERED INDEX | 4 | 0 | 15625 | 2000000 | 61 |
| 2 | NONCLUSTERED INDEX | 4 | 1 | 134 | 15625 | 67 |
| 2 | NONCLUSTERED INDEX | 4 | 2 | 2 | 134 | 67 |
| 2 | NONCLUSTERED INDEX | 4 | 3 | 1 | 2 | 67 |
| 3 | NONCLUSTERED INDEX | 4 | 0 | 17544 | 2000000 | 69 |
| 3 | NONCLUSTERED INDEX | 4 | 1 | 168 | 17544 | 75 |
| 3 | NONCLUSTERED INDEX | 4 | 2 | 2 | 168 | 75 |
| 3 | NONCLUSTERED INDEX | 4 | 3 | 1 | 2 | 75 |
| 4 | NONCLUSTERED INDEX | 4 | 0 | 17544 | 2000000 | 69 |
| 4 | NONCLUSTERED INDEX | 4 | 1 | 168 | 17544 | 75 |
| 4 | NONCLUSTERED INDEX | 4 | 2 | 2 | 168 | 75 |
| 4 | NONCLUSTERED INDEX | 4 | 3 | 1 | 2 | 75 |

 

Indexes defined on table

 

| Index ID | Index Type | Index Name | Index Columns |
|---|---|---|---|
| 1 | CLUSTERED | PK_TestTable2 | NumKey,cSguid,cBig,cInt,cDate2,cChar1 |
| 2 | NONCLUSTERED | testtable2_idx_test1 | NumKey,cSguid,cBig,cInt,cDate2,cChar1 |
| 3 | NONCLUSTERED | testtable2_idx_test2 | cDate1 |
| 4 | NONCLUSTERED | testtable2_idx_test3 | cDate1,cDate2 |

 

 

Script to list index column names ( results as above )

 

--

-- list index columns for table

--

select ic.index_id,ic.type_desc,ic.name,dbo.fn_IndexColumns(ic.object_id,ic.index_id)

from sys.indexes ic where object_id=object_id('testtable2');

 

 

Function used in query; The function should be placed within the database to be queried.

 

 

/****** Object:  UserDefinedFunction [dbo].[fn_IndexColumns]    Script Date: 09/26/2007 21:16:13 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

create function [dbo].[fn_IndexColumns] (@object_id int,@index_id int)

-- =============================================================
--  Function:         dbo.fn_IndexColumns
--  Written by:       (c) Colin Leversuch-Roberts
--                    www.kelemconsulting.co.uk
--
--  Purpose:          produce a csv list of the columns making up an index
--
--  System:           database to be queried ( sql 2005 only )
--
--  Input Parameters: @object_id int
--                    @index_id int
--
--  Returns:          csv within varchar(2000)
--
--  Usage:            produces a list of index columns for a given object id and index id
--                    select dbo.fn_IndexColumns(645577338,2)
--                    select ic.name,dbo.fn_IndexColumns(ic.object_id,ic.index_id) from sys.indexes ic
--                    ( don't do this on a large database )
--
--  Calls:            sys.index_columns, sys.columns
--
--  Notes:            inline functions are bad and I suppose I could have made this a tvf,
--                    however it can be used standalone or embedded and it's not
--                    a function to be used often.
--
--  VERSION HISTORY
--  Version No    Date              Description
--  1             23 August 2007    Initial Release
-- =============================================================

returns varchar(2000)

as

BEGIN

declare @columns varchar(2000)

select @columns = coalesce(@columns+',','') + sc.name from

sys.index_columns ic join sys.columns sc on sc.column_id=ic.column_id and sc.[object_id]=ic.[object_id]

where ic.[object_id] = @object_id and ic.index_id = @index_id

order by ic.key_ordinal

return(@columns)

END;

 

 

I’m not really out to show index tuning here, more to show the importance of size when it comes to IO performance, a crucial factor in tuning an OLTP database.

 

 

--

-- default query uses secondary index ( index 2 )

--

select NumKey,cSguid,cBig,cInt,cDate2,cChar1 from dbo.testtable2

where numkey between 100 and 2000;

 

 

(1901 row(s) affected)

Table 'Testtable2'. Scan count 1, logical reads 20, physical reads 0

 

 

--

-- now force PK

--

select NumKey,cSguid,cBig,cInt,cDate2,cChar1 from dbo.testtable2 with(index=1)

where numkey between 100 and 2000;

 

 

(1901 row(s) affected)

Table 'Testtable2'. Scan count 1, logical reads 1343, physical reads 0

 

Note: As the clustered index is always the entire table, the number of IOs required to return the same data is always going to be much higher on a wide table. Even if you’ve specifically chosen the clustered key to cover a query, you may actually find a secondary index to be much more efficient. I knew that for the query I defined the clustered index would not be used, which is why I had to use a hint to force it.

 

The extra 1,323 page reads represent an extra 10.3 MB of data placed into the data cache. Whilst this might seem small in the overall scheme of things, should this query be called several times a second the difference becomes significant.
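The cache-cost arithmetic is just pages times page size; each logical read is one 8 KB page:

```python
# Extra buffer-pool cost of forcing the clustered index for this query.
index_reads, clustered_reads = 20, 1343   # logical reads from the two STATISTICS IO outputs
extra_pages = clustered_reads - index_reads
extra_mb = extra_pages * 8 / 1024         # 8 KB per page
print(extra_pages, round(extra_mb, 1))    # 1323 pages, 10.3 MB
```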

 

16. From the figures above, step 15, we can quickly calculate that for my target table the secondary indexes would add up to some 80 GB.
We also know that quite often the secondary index which covers the clustered index would actually be far more efficient.
So if we make the clustered primary key non clustered ( i.e. the secondary index, index 2 ) and cluster on NumKey only, we keep a similar physical sort order for the data; obviously not fully, but this is an example.

 

 

 

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 4 | 0 | 1000000 | 2000000 | 3527 |
| 1 | CLUSTERED INDEX | 4 | 1 | 1608 | 1000000 | 11 |
| 1 | CLUSTERED INDEX | 4 | 2 | 3 | 1608 | 11 |
| 1 | CLUSTERED INDEX | 4 | 3 | 1 | 3 | 11 |
| 2 | NONCLUSTERED INDEX | 4 | 0 | 15625 | 2000000 | 61 |
| 2 | NONCLUSTERED INDEX | 4 | 1 | 134 | 15625 | 67 |
| 2 | NONCLUSTERED INDEX | 4 | 2 | 2 | 134 | 67 |
| 2 | NONCLUSTERED INDEX | 4 | 3 | 1 | 2 | 67 |
| 3 | NONCLUSTERED INDEX | 3 | 0 | 3711 | 2000000 | 13 |
| 3 | NONCLUSTERED INDEX | 3 | 1 | 10 | 3711 | 19 |
| 3 | NONCLUSTERED INDEX | 3 | 2 | 1 | 10 | 19 |
| 4 | NONCLUSTERED INDEX | 3 | 0 | 5682 | 2000000 | 21 |
| 4 | NONCLUSTERED INDEX | 3 | 1 | 21 | 5682 | 27 |
| 4 | NONCLUSTERED INDEX | 3 | 2 | 1 | 21 | 27 |

 

The covered index ( the non clustered Primary Key, index 2 ) remains the same size as before, but the two secondary indexes, indexes 3 and 4, have reduced in size. For my target table the saving in secondary index space is some 42 GB, just over 50%. The smaller page count makes the indexes more efficient.
The smaller clustered key also appears to have reduced the storage size of the table; however it is possible that in generating the wider table I introduced some fragmentation.
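The scale-up from the 2-million-row test to the 400-million-row target table is a factor of 200. A sketch of that calculation using the total page counts for indexes 3 and 4 from the step 15 and step 16 tables (it lands near, though not exactly on, the quoted figure; the original may include factors I can't see, such as fragmentation or reserved pages):

```python
# Pages saved on indexes 3 and 4, scaled x200 and converted to GB.
SCALE, PAGE_KB = 200, 8
before = (17544 + 168 + 2 + 1) + (17544 + 168 + 2 + 1)   # step 15 totals
after  = (3711 + 10 + 1) + (5682 + 21 + 1)               # step 16 totals
saved_gb = (before - after) * SCALE * PAGE_KB / 1024 / 1024
print(round(saved_gb, 1))   # roughly 40 GB, in the region of the ~42 GB quoted
```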

 

Conclusions

 

Indexes on nullable columns have a 3 byte overhead per row / index row.
Indexes on a HEAP gain a 9 byte RID per row / index row.
As expected, constraints make no difference.
In a choice between an integer and a guid as a clustered key for a table, unless there's a genuinely compelling reason to use a guid – don't.
The columns making up the clustered index or key are added to every row of a secondary index, plus 1 byte; in other words the clustered key is physically appended to every secondary index row.
Make clustered indexes narrow; wide Primary Keys should be non clustered.

 

NOTE: One of my standard DBA interview questions is to ask whether you can have a non clustered Primary Key; the answer is of course YES. I'd say at least 60% of the DBAs I've asked this question got it wrong.

 

 

Appendix: Scripts

 

Create table Testtable

 

create table dbo.Testtable

(

NumKey int identity(1,1),

cGuid uniqueidentifier default newid() null,

cSguid uniqueidentifier default NEWSEQUENTIALID() not null,

cBig bigint null,

cInt int null,

cDate1 datetime not null,

cDate2 datetime null,

cVchar1 varchar(50) null,

cChar1 char(20) not null,

cVchar2 varchar(500) null,

cChar2 char(100) null

);

go

 

 

Create Testtable2

 

create table dbo.Testtable2

(

NumKey int identity(1,1),

cGuid uniqueidentifier default newid() not null,

cSguid uniqueidentifier default NEWSEQUENTIALID() not null,

cBig bigint not null,

cInt int not null,

cDate1 datetime not null,

cDate2 datetime not null,

cVchar1 varchar(50) not null,

cChar1 char(20) not null,

cVchar2 varchar(500) not null,

cChar2 char(100) not null

);

go

 

 

Script used to populate tables

 

--

-- populate table

--

set nocount on

--

declare @num bigint,@count int,@count2 int,@count3 int

set @count = 1

set @count2 = 12

set @count3 = 1

while @count<2000001

begin

IF (@count%2)=0

            begin

                        insert into dbo.Testtable2(cBig,cInt,cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2)

                        values( 36854775807+(36854775807/@count),@count2*3,dateadd(dd,@count3,'25 MAY 2007'), dateadd(mi,@count,getdate()),

                        reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))), '0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),

                        replicate(convert(char(36),newid()),10),convert(varchar(20), datename(dw,getdate()+@count)+datename(month,getdate()+@count))+'blahblahblahblahblahblahblahblahblahblahblahblah')

            end

else

            begin

                        insert into dbo.Testtable2(cBig,cInt, cDate1,cDate2,cVchar1,cChar1,cVchar2,cChar2)

                        values( 854775807-(36854775807/@count),@count2*2,dateadd(dd,@count3,'5 dec 2006'), dateadd(mi,@count,getdate()),

                        reverse(convert(varchar(50),dateadd(dd,@count3,'25 MAY 2007'))), '0000'+convert(varchar(7),@count)+convert(varchar(4),@count3),

                        replicate(convert(char(36),newid()),10),convert(varchar(20), datename(dw,getdate())+datename(month,getdate()))+'blahblahblahblahblahblahblahblahblahblahblahblah')

            end

set @count = @count+1;

set @count2 = @count2+2;

IF @count2>50

            set @count2 = 12;

IF (@count%3000)=0

            set @count3 = @count3+1;

end;

 

 

Query to view index data ( change table name to suit )

 

 

select index_id,index_type_desc,index_depth,index_level,page_count,record_count, min_record_size_in_bytes,max_record_size_in_bytes,avg_record_size_in_bytes

            from sys.dm_db_index_physical_stats (db_id(), object_id('testtable2'), null , null ,'detailed' );

 

 

 

Posted by GrumpyOldDBA with 1 comment(s)

Analysing Indexes Part 4 - Size does matter -2

4. Replacing the clustered Primary Key with a non clustered Primary Key on the integer column gives us two secondary indexes, each entry being +9 bytes.
The primary key is index 3.

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 0 | HEAP | 1 | 0 | 153847 | 2000000 | 577 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 4695 | 2000000 | 17 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 15 | 4695 | 23 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 15 | 23 |
| 3 | NONCLUSTERED INDEX | 3 | 0 | 3711 | 2000000 | 13 |
| 3 | NONCLUSTERED INDEX | 3 | 1 | 6 | 3711 | 11 |
| 3 | NONCLUSTERED INDEX | 3 | 2 | 1 | 6 | 11 |

 

5. A non clustered unique constraint is the same ( index 3 again )

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 0 | HEAP | 1 | 0 | 153847 | 2000000 | 577 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 4695 | 2000000 | 17 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 15 | 4695 | 23 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 15 | 23 |
| 3 | NONCLUSTERED INDEX | 3 | 0 | 3711 | 2000000 | 13 |
| 3 | NONCLUSTERED INDEX | 3 | 1 | 6 | 3711 | 11 |
| 3 | NONCLUSTERED INDEX | 3 | 2 | 1 | 6 | 11 |

 

6. A clustered unique constraint is the same as a clustered Primary Key

The results are the same as step 3

( Index 2 is the same index as created in step 2 )

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 3 | 0 | 153847 | 2000000 | 577 |
| 1 | CLUSTERED INDEX | 3 | 1 | 248 | 153847 | 11 |
| 1 | CLUSTERED INDEX | 3 | 2 | 1 | 248 | 11 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 3711 | 2000000 | 13 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 10 | 3711 | 19 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 10 | 19 |


7. Change the clustered Primary Key from an integer to a guid. The secondary index on datetime remains. ( Note this is a sequential guid )

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 4 | 0 | 153847 | 2000000 | 577 |
| 1 | CLUSTERED INDEX | 4 | 1 | 477 | 153847 | 23 |
| 1 | CLUSTERED INDEX | 4 | 2 | 2 | 477 | 23 |
| 1 | CLUSTERED INDEX | 4 | 3 | 1 | 2 | 23 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 6689 | 2000000 | 25 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 28 | 6689 | 31 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 28 | 31 |

 

  • Our key depth is now 23 bytes, and the secondary index is now 8 + 17 bytes per entry; for my target table, changing the clustered key from an integer to a guid increases the size of my secondary index by 4.6 GB.
  • Again the increase in pages used by the secondary index is significant: 599,200.
  • Personally I dislike unique identifiers; in most cases there is no reason for their use over an integer column – here's another reason to dislike them as your choice of surrogate key or row identifier.
  • Index 2 is still my original datetime index.
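The guid-versus-integer cost can be sketched from the leaf page counts alone (x200 scale factor for the 400-million-row target table; the result is close to, though slightly under, the quoted 4.6 GB):

```python
# Datetime index leaf rows: int key 4+8+1 = 13 bytes, guid key 16+8+1 = 25 bytes.
int_pages, guid_pages = 3711, 6689   # leaf page counts from steps 3 and 7
extra_gb = (guid_pages - int_pages) * 200 * 8 / 1024 / 1024
print(round(extra_gb, 1))            # ~4.5 GB extra for the target table
```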

 

 

8. As for step 7, but the datetime column is now nullable; note that the width per key entry for the 8 byte datetime is now 28 bytes in the leaf and 34 bytes in the tree.
For our target table the index over 3 GB of datetime data ( all values populated ) is actually 11.4 GB, nearly 4 times the size of the data.

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 4 | 0 | 153847 | 2000000 | 577 |
| 1 | CLUSTERED INDEX | 4 | 1 | 477 | 153847 | 23 |
| 1 | CLUSTERED INDEX | 4 | 2 | 2 | 477 | 23 |
| 1 | CLUSTERED INDEX | 4 | 3 | 1 | 2 | 23 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 7435 | 2000000 | 28 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 34 | 7435 | 34 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 34 | 34 |

 

 

9. This just shows a clustered index with a non clustered integer PK and the datetime index.
Both secondary indexes gain +17 bytes per entry; the only point of interest is that the integer tree levels only gain 7 bytes.

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 4 | 0 | 153847 | 2000000 | 579 |
| 1 | CLUSTERED INDEX | 4 | 1 | 477 | 153847 | 23 |
| 1 | CLUSTERED INDEX | 4 | 2 | 2 | 477 | 23 |
| 1 | CLUSTERED INDEX | 4 | 3 | 1 | 2 | 23 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 6689 | 2000000 | 25 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 28 | 6689 | 31 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 28 | 31 |
| 3 | NONCLUSTERED INDEX | 3 | 0 | 5682 | 2000000 | 21 |
| 3 | NONCLUSTERED INDEX | 3 | 1 | 10 | 5682 | 11 |
| 3 | NONCLUSTERED INDEX | 3 | 2 | 1 | 10 | 11 |

 

Index 2 is the original datetime, Index 3 the non clustered integer Primary Key

 

10. Adding an index on a char(20) gains the same 17 bytes.
This establishes that for a non null column the addition to each row in the index is the width of the clustered key plus 1 byte. If the column for the secondary index is nullable, each row has an additional 3 bytes added.
Sort of scary to think an index on a nullable tinyint ( 1 byte ) would be 21 bytes wide for each row.

 

Posted by GrumpyOldDBA with no comments

Analysing Indexes Part 4 - Size does matter

( can't post this in one go so it's going to be a few parts )

  • I’ll say now that this is a whole lot of figures and values; I’ve attempted to summarize what I can at the end of the article, for those who want to cut to the chase.
  • ( The naming used for these examples is not representative of any conventions or standards )

 

  • As part of my analysis of indexes I was particularly interested in their size; the database I am working with has many tables of several hundred million rows, and these tables are often wide, with clustered primary keys of up to 6 columns and twenty or more secondary indexes.

 

  • I’m not interested in the theory of indexing here or making any ground-breaking revelations; I’m just observing that an awareness of how an index is formed may help us understand why the optimiser may sometimes seem to ignore what we think is a perfectly good index.

 

  • My initial interest in this subject was aroused when I noticed I had an 8 GB index on a single datetime column of a 10 GB table which had a considerable number of columns.

 

1. For my tests I created two tables and populated them with two million rows each.
Why two tables? Well, I wanted to see what difference a nullable column might make.
Whilst I would have liked to create much larger tables, the reality is that I needed to run repeated calls to sys.dm_db_index_physical_stats, and I know from experience that these calls can take a long time on a four hundred million row table.

 

Results for the initial table creation – the nullable table returns exactly the same size results as my non nullable table, using sp_spaceused and sys.dm_db_index_physical_stats.

 

Testtable2 ( no nulls ):

| name | rows | reserved | data | index_size | unused |
|---|---|---|---|---|---|
| Testtable2 | 2000000 | 1333576 KB | 1333392 KB | 8 KB | 176 KB |

| index_id | index_type_desc | index_depth | index_level | page_count | record_count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 0 | HEAP | 1 | 0 | 166674 | 2000000 | 577 |

TestTable ( with nulls ):

| name | rows | reserved | data | index_size | unused |
|---|---|---|---|---|---|
| TestTable | 2000000 | 1333576 KB | 1333392 KB | 8 KB | 176 KB |

| index_id | index_type_desc | index_depth | index_level | page_count | record_count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 0 | HEAP | 1 | 0 | 166674 | 2000000 | 577 |

 

For all my results I will scale up to four hundred million rows; I’ll refer to this as my “target table”.

 

2. Leaving the tables as HEAPs, I created a non clustered secondary index on a datetime column. What was noticeable was that the index on the nullable column showed an average record size 3 bytes larger; this might not sound much, but on my target table of four hundred million rows it equates to 1.1 GB.

 

Testtable2 ( no nulls )

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 0 | HEAP | 1 | 0 | 166674 | 2000000 | 577 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 4695 | 2000000 | 17 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 15 | 4695 | 23 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 15 | 23 |

Testtable ( nulls )

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 0 | HEAP | 1 | 0 | 166674 | 2000000 | 577 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 5435 | 2000000 | 20 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 19 | 5435 | 26 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 19 | 26 |

 

So what is all the fuss about with index size? Well, indexes make pages, and as far as the optimiser is concerned pages make prizes – ok, not actually, but the number of pages used is an integral part of the cost of a query, so the fewer pages the better.
In my two million row test table there is a difference of 744 pages: 15%, 5.8 MB, or 1.1 GB for my target table.

 

We know that when there is no clustered index a RID is added to the index, so how big is a RID? It seems to be 9 bytes, so our 8 byte datetime actually widens to 17 bytes, and 20 bytes where the column is nullable.
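Those widths can be checked directly against the leaf-level averages in the two tables above:

```python
# Heap-based nonclustered row: index key + 9-byte RID, +3 bytes if nullable.
datetime_bytes, rid, null_overhead = 8, 9, 3
print(datetime_bytes + rid)                  # 17 -- Testtable2 (not null) leaf row
print(datetime_bytes + rid + null_overhead)  # 20 -- Testtable (nullable) leaf row
```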

 

3.  Next step was to create a clustered primary key on an integer column.

 

Testtable2 ( no nulls )

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 3 | 0 | 153847 | 2000000 | 577 |
| 1 | CLUSTERED INDEX | 3 | 1 | 248 | 153847 | 11 |
| 1 | CLUSTERED INDEX | 3 | 2 | 1 | 248 | 11 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 3711 | 2000000 | 13 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 10 | 3711 | 19 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 10 | 19 |

 

Testtable ( nulls )

 

| Index id | index_type_desc | Index depth | Index level | Page count | Record count | avg_record_size_in_bytes |
|---|---|---|---|---|---|---|
| 1 | CLUSTERED INDEX | 3 | 0 | 153847 | 2000000 | 577 |
| 1 | CLUSTERED INDEX | 3 | 1 | 248 | 153847 | 11 |
| 1 | CLUSTERED INDEX | 3 | 2 | 1 | 248 | 11 |
| 2 | NONCLUSTERED INDEX | 3 | 0 | 4455 | 2000000 | 16 |
| 2 | NONCLUSTERED INDEX | 3 | 1 | 14 | 4455 | 22 |
| 2 | NONCLUSTERED INDEX | 3 | 2 | 1 | 14 | 22 |

 

  • The secondary index reduces in size by 4 bytes per entry; that will save 1.5 GB on my target table.
  • The addition now is presumably 4 bytes for the int plus an extra byte; maybe.
  • The extra storage required for the clustered index is 1.6 MB for the test table, 320 MB for my target table.
  • This means adding a clustered index to the heap has reduced overall storage by approx 1.5 GB for the secondary index on my target table.
  • The conclusion here is that indexed HEAPs are less efficient and take more storage than the same table with a clustered index.
  • Perhaps also note that the size of the actual table has reduced by some 12,578 pages ( 98 MB ), which equates to 19.2 GB for my target table.
  • If we add the saving for the secondary index, this means we have saved 20.7 GB on this table.

 

NOTE1: It is very important to consider the number of pages an index consumes when looking at efficiency; for my target table the secondary index requires 197,800 FEWER pages with the clustered index ( 744,400 vs 942,200 – the index on the HEAP is 27% larger ).

 

NOTE2: For the remainder of the analysis I am concentrating on the table without nullable columns.

 

NOTE3: DBA Question: “How can you remove fragmentation from a HEAP?”
Answer: “Add a clustered index and then remove the clustered index.”
Should I ever interview you for a job as a DBA, this is one of the questions I may ask – there are at least two other answers.

 

NOTE4: There may have been some fragmentation created during the population of the table; however, technically populating a heap shouldn’t cause fragmentation, or maybe it does?

 

Posted by GrumpyOldDBA with no comments

An update on the TokenAndPermUserStore “problem”

http://sqlblogcasts.com/blogs/grumpyolddba/archive/2007/07/19/more-tokenandpermuserstore.aspx

 

As the original incident had been raised on the back of an application service pack upgrade I made use of one of my msdn incidents.

I have patch 3179 applied and I can see that the cache size fluctuates over the day, however I am actually clearing it out twice a day and the entire procedure cache once a day.

 

I expect a number of eyebrows will be raised at carrying out this type of command on a production server, however, it’s a long story and we were clearing the cache every day as part of the process of applying plan guides to resolve performance problems.

 

The application doesn’t seem to work as well on SQL 2005 as it did on SQL 2000; the application users ( via their forums and blogs ) say there’s a bug in SQL 2005 which messes up the application. My view is that the application does not have effective indexing, which makes diagnosis of inappropriate use of cached plans tricky. The application doesn’t use stored procedures, makes very heavy use of client side cursors, and appears to give users open season on query writing.

 

Amongst the solutions mooted under our incident was the use of plan guides to force recompiles to make sure there wasn’t a mismatch of cached plans to queries.

The application also generates a vast number of ad-hoc queries which look as if they should be parameterised but in actual fact are not – I’m advised that turning on forced parameterisation is probably a bad idea.

 

Well, it seems that recompiles and non parameterised ad-hoc queries are a cause of the TokenAndPermUserStore problem, as well as multi-page allocations ( “mem to leave”, if I remember correctly, in SQL 2000 ). This is something I’ve seen before with similar applications.

 

So it seems the appl