Do you need to index very small table? - Simon Sabin UK SQL Consultant's Blog

Do you need to index very small table?

I’ve often heard people say that there is no point indexing small tables because all the data is on a small number of pages. I thought I would look into whether this was actually the case.

This came to my mind as I am writing a white paper on query plans, and one of the areas that things go wrong is where a nested loop results in lots of lookups being made to a table. I’ve been told that the engine doesn’t have to process the whole 8KB of a page when it does a lookup if it can read a small portion. So the question for me was how much actually gets read.

So I devised a test.  A heap and a clustered table populated with unique integers to fill the page.  If I then try and retrieve the last row inserted, for the heap it needs to scan the page but for the clustered index it can use the slot array to find the position of the row.

So the question is. What will the difference be? Note we are only using one page and so the clustered index doesn’t have any levels. This is purely testing how expensive it is to lookup rows in a page.

The script below creates the single page tables we are going to test against.

Note the number  592. This is the maximum number of rows I could get on a heap with a single integer column. This does leave a lot of space on the page, but not sure why I can’t have more

use tempdb

go

drop table heap

drop table indexedTable

go

create table heap (col1 int not null)

create table indexedTable (col1 int not null primary key)

go

declare @max int

set @max = 592 --max for a heap //621 --for clustered index

set nocount on

declare @i int

set @i = 0

while @i < @max

begin

insert into heap values (@i)

insert into indexedTable values (@i)

set @i = @i + 1

end

declare @d int

dbcc ind(tempdb,heap,1)

dbcc ind(tempdb,indexedTable,1)

/* You can use this to look at a specific page, you will need to change 2420 with a value from the previous results.

dbcc traceon (3604)

dbcc page (tempdb,1,2420,3)

*/

The following is the test, First I test the overhead of doing the loop and then I test against the heap and then the clustered table.

declare @max int , @i int, @dummy int, @CPUAtStart int

set @max = 592

set @CPUAtStart = (select cpu_time from sys.dm_exec_requests where session_id = @@spid)

set @i = 0

while @i < 100000

  begin

  set @dummy = 1

  set @i = @i +  1

  end

select 'baseline', cpu_time-@CPUAtStart from sys.dm_exec_requests where session_id = @@spid

set @CPUAtStart = (select cpu_time from sys.dm_exec_requests where session_id = @@spid)

set @i = 0

while @i < 100000

  begin

  set @dummy = (select top 1 col1

              from heap where col1 = @max)

  set @i = @i +   1

  end

select 'heap', cpu_time-@CPUAtStart from sys.dm_exec_requests where session_id = @@spid

set @CPUAtStart = (select cpu_time from sys.dm_exec_requests where session_id = @@spid)

set @i = 0

while @i < 100000

  begin

  set @dummy = (select top 1 col1

              from indexedTable where col1 = @max)

  set @i = @i +  1

  end

select 'indexed Table', cpu_time-@CPUAtStart from sys.dm_exec_requests where session_id = @@spid

For me the results were not what I imagined.

Test          CPU Usage

--------      -----------

baseline      32

heap          4945

indexed Table 920

This indicates the heap is 5 times slower gosh. Although understandable, it has to dissect each row on that page to find the relevant row.

So I thought I will test this for the first row. This surely can’t be as bad. All it has to do is read the first slot on the page

declare @max int , @i int, @dummy int, @CPUAtStart int

set @max = 592

set @CPUAtStart = (select cpu_time from sys.dm_exec_requests where session_id = @@spid)

set @i = 0

while @i < 100000

begin

  set @dummy = (select top 1col1

              from heap where col1 = 1)

  set @i = @i +   1

  end

select 'heap', cpu_time-@CPUAtStart from sys.dm_exec_requests where session_id = @@spid

set @CPUAtStart = (select cpu_time from sys.dm_exec_requests where session_id = @@spid)

set @i = 0

while @i < 100000

begin

 

  set @dummy = (select top 1 col1

              from indexedTable where col1 = 1)

  set @i = @i +  1

  end

select 'indexed Table', cpu_time-@CPUAtStart from sys.dm_exec_requests where session_id = @@spid

And looking at the results

Test          CPU time

----          -----------

heap          1404

indexed Table 967

We can see that the heap is still slower but not as bad. I had originally done this test without the TOP 1 and this performed as it did in the first test. This can be explained because the engine has to read EVERY row to see if it is the correct value. Unlike the on the clustered index table where the slot array points the engine direct at the right rows.

So in summary, having an index on small tables is useful. I’ve not looked into how this impacts the nested loop scenario yet and whether the engine has an intermediary cache of results, I can imagine it does but probably only for hash joins.

So get indexing your small tables.

 


-
Published 06 August 2009 00:43 by simonsabin

Comments

06 August 2009 22:08 by SqlServerKudos

# Do you need to index very small table?

Kudos for a great Sql Server article - Trackback from SqlServerKudos

# Weekly Link Recap for August 7 | Brent Ozar - SQL Server DBA

Pingback from  Weekly Link Recap for August 7 | Brent Ozar - SQL Server DBA

# Find Out Tables With No Primary Key/Index ??? SQL Server 2005 &laquo; Tech Scribbles

Pingback from  Find Out Tables With No Primary Key/Index ??? SQL Server 2005 &laquo;  Tech Scribbles