Analysing Indexes - count *

In my presentations on indexing I have always said that you should explore the advantages of covering your clustered index with a secondary index. In circumstances where you might want to just return values form the PK ( assuming it's your clustered index ) a secondary index will be more efficient especially when the row size is wide. Any operation on a clustered index will always return the entire row, so select ID from dbo.mytable where ID is the clustered PK integer will return not just the ID value to cache but the entire row. Use a secondary index and it's just the column.

So what happens when you perform a count(*) on a table?  Well if you don't have any secondary indexes the result is a table scan, however cover your secondary index and the count(*) will use the secondary index. On big tables this is a massive saving in io.

In fact generally the optimiser will search for the most effective index for not only a select(*) but also a select ( columnname ) . Each secondary index has the size of the clustered index added to it, but a secondary index on a column within a clustered index does not get this overhead.



so even if you issue a select count(mycolumn2) from dbo.mytable where you've indexed an int column the optimiser will use a covered secondary index on an int clustered PK - very strange!

I suggest you create a suitable table and maybe populate it with a million rows, then create some indexes and run some count(*) to see what happens. 




Published Wednesday, June 2, 2010 3:51 PM by GrumpyOldDBA
Filed under:


No Comments