Analysing Indexes Part 4 - Size does matter

( can't post this in one go so it's going to be a few parts )

v      I’ll say now that this is a whole lot of figures and values, I’ve attempted to summarize what I can at the end of the article; for those who want to cut to the chase.

o        ( The naming used for these examples is not representative of any conventions or standards )

 

v      As part of my analysis of indexes I was particularly interested in the size of indexes, the database I am working with has many tables of several hundred million rows, these tables are often wide with clustered primary keys of up to 6 columns and twenty or more secondary indexes.

 

v      I’m not interested in the theory of indexing here or making any ground breaking revelations, I’m just observing that an awareness of how the index is formed may help us understand why the optimiser may sometimes seem to ignore what we may think is a perfectly good index

 

v      My initial interest in this subject was aroused when I noticed I had an index of 8gb for a single datetime column on a 10gb table which had a considerable number of columns.

 

1.   For my tests I created two tables and populated them with two million rows each.

Why two tables? Well I wanted to see what difference a nullable column might have.

Whilst I would have liked to create much larger tables the reality is that I needed to run repeated calls to sys.dm_db_index_physical_stats and I know from experience that these calls can take a long time to run on a four hundred million row table.

 

Results for the initial table creation – the nullable table returns the exact same size results as my non nullable table., using sp_spaceused and sys.dm_db_index_physical_stats

 

name

rows

reserved

data

index_size

unused

 

Testtable2

2000000

1333576 KB

1333392 KB

8 KB

176 KB

 

with nulls

 

 

 

 

 

 

index_id

index_type_desc

index_depth

index_level

page_count

record_count

avg_record

size_in_bytes

0

HEAP

1

0

166674

2000000

577

 

 

 

 

 

 

 

name

rows

reserved

data

index_size

unused

 

TestTable

2000000

1333576 KB

1333392 KB

8 KB

176 KB

 

 no nulls

 

 

 

 

 

 

index_id

index_type_desc

index_depth

index_level

page_count

record_count

avg_record

size_in_bytes

0

HEAP

1

0

166674

2000000

577

 

For all my results I will scale up to four hundred million rows, I’ll refer to this as my “target table”

 

2.   Leaving the tables as HEAPs I created a non clustered secondary index on a datetime column, what was noticeable was that the index on the nullable column showed an average record size 3 bytes larger, this might not sound much but on my target table of four hundred million rows that equates to 1.1gb.

 

Testtable2 ( no nulls )

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

0

HEAP

1

0

166674

2000000

577

2

NONCLUSTERED INDEX

3

0

4695

2000000

17

2

NONCLUSTERED INDEX

3

1

15

4695

23

2

NONCLUSTERED INDEX

3

2

1

15

23

Testtable ( nulls )

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

0

HEAP

1

0

166674

2000000

577

2

NONCLUSTERED INDEX

3

0

5435

2000000

20

2

NONCLUSTERED INDEX

3

1

19

5435

26

2

NONCLUSTERED INDEX

3

2

1

19

26

 

So what is all the fuss about with index size? Well indexes make pages and as far as the optimiser is concerned pages make prizes , ok not actually, but the number of pages used is an integral part of the cost of a query so the fewer the pages the better.

In my two million row test table there is a difference of 744 pages, 15%, 5.8Mb or 1.1Gb for my target table.

 

We know that when there is no clustered index a rid is added to the index, so how big is a rid ? Seems like 9 bytes, so our 8 byte datetime actually widens to 17 bytes and 20 bytes where the column is nullable.

 

3.  Next step was to create a clustered primary key on an integer column.

 

Testtable2 ( no nulls )

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

1

CLUSTERED INDEX

3

0

153847

2000000

577

1

CLUSTERED INDEX

3

1

248

153847

11

1

CLUSTERED INDEX

3

2

1

248

11

2

NONCLUSTERED INDEX

3

0

3711

2000000

13

2

NONCLUSTERED INDEX

3

1

10

3711

19

2

NONCLUSTERED INDEX

3

2

1

10

19

 

Testtable ( nulls )

 

Index

id

index_typec

Index

depth

Index

level

Page

count

Record

count

avg_record

size_in

bytes

1

CLUSTERED INDEX

3

0

153847

2000000

577

1

CLUSTERED INDEX

3

1

248

153847

11

1

CLUSTERED INDEX

3

2

1

248

11

2

NONCLUSTERED INDEX

3

0

4455

2000000

16

2

NONCLUSTERED INDEX

3

1

14

4455

22

2

NONCLUSTERED INDEX

3

2

1

14

22

 

  • The secondary index reduces in size by 4 bytes each entry, that will save 1.5gb on my target table
  • The addition now is presumably 4 bytes for the int plus an extra byte; maybe.
  • The extra storage required for the clustered index is 1.6mb for the test table, 320mb for my target table.
  • This means adding a clustered index to the heap has reduced overall storage by approx 1.5gb for the secondary index on my target table.
  • The conclusion here is that indexed HEAPs are less efficient and take more storage than the same table with a clustered index.
  • Perhaps also note that the size of the actual table has also reduced by some 12,578 pages ( 98Mb ), that equates to 19.2GB for my target table
  • If we add the saving for the secondary index this means we have saved 20.7Gb on this table.

 

NOTE1: It is very important to consider the number of pages that an index consumes when looking at efficiency,  for my target table the secondary index requires 197,800 LESS pages with the clustered index. ( 744,400 vs 942,200 -  the index on the HEAP is 27% larger )

 

NOTE2: For the remainder of the analysis I am concentrating on the table without nullable columns.

 

NOTE3: DBA Question: “ How can you remove fragmentation from a HEAP?”

            Answer:  “ Add a clustered index and then remove the clustered index “

            Should I ever interview you for a job as a DBA this is one of the questions I may ask – there are at least two other answers.

 

NOTE4: There may have been some fragmentation created during the population of the table; however, technically populating a heap shouldn’t cause fragmentation, or maybe it does?

 

Published 27 September 2007 08:12 by GrumpyOldDBA
Filed under: ,

Comments

No Comments