Can I Index a bit ? ( Part 1 )

The final part wil follow and a fully illustrated pdf will be on my website - I'll post the location when it's complete.

 

Can I index a bit?

VBUG  Presentation 27th January 2009 Bracknell.

 

Here's a brief summary of subject matter and the points raised.

 

Q.   Is indexing a black art: will zen help: will the queries work better with a full moon?

A.   Yes!

 

The presentation intends to show practical examples of how things are rather than how things should be.

The reality is that few databases match classroom best practices, this presentation is  about what actually works ( sometimes ) based upon real world  experience,  as to will these examples always be true, well,  to coin an oft used phrase  " It just depends "

 

NB. I struggle to sometimes make large blog posts which include pictures and code samples, I will illustrate and code a pdf version of this on my website www.grumpyolddba.co.uk

 

Bad Table:

  • This was the introduction where I showed a table with a wide clustered primary key and all other columns allowing nulls. I asked what, as a DBA, I might not like about this table, now read on.

 

Wide clustered indexes:

  • Bad because the width of the clustered index is added to every row of every secondary index.
  • If you need a wide Primary Key then make it non clustered and cluster on an integer if possible that makes up your PK.
  • For that reason GUIDs make bad clustered keys as they are 16 bytes wide compared to an int at 4 bytes.
  • If you must use a GUID then choose sequential guids as this will reduce table fragmentation.
  • e.g. You may calculate that a secondary index defined on an integer column would fit approximately 8,000/4 rows per page, 2,000 rows per page.
  • o However, if your clustered key consists a guid then each entry for each row for that index is actually 16+4+1 bytes, this actually gives approx 381 rows per page
  • o For an integer clustered key the capacity would be approx 890 rows per page. In real terms this is 11.8Mb difference for an integer index on a 1 million row table, and that's for each secondary index.
  • o This also equates to buffer storage for secondary indexes and page reads required for seeks and scans.

 

The Truth about clustered indexes:

  • Traditionally the clustered index has been marked for range scans, typically dates and so on.
  • What is often overlooked is that the clustered index is the entire table.
  • Showing a select of a range of integer values from an integer clustered primary key showed that despite the result set only being an integer the entire row for each value was placed in buffer cache.
  • o ( shown by the number of page reads )
  • The solution was to create a non clustered index which duplicated the clustered index, the same query now used the secondary index and in the example shown reduced page reads from 175 to 5.
  • The point being made here is that if the clustered primary key is being used in a query, say a six table join, every clustered index seek or scan will have returned the entire row regardless of how many columns were required to satisfy the query, this can be a big hit in terms of io and usage of the buffer cache.

 

A clustered index seek in a query plan is good:

  • Well actually "it just depends".
  • Depending upon the where clause a seek may actually be a scan.
  • I returned 1,999,999 integer values from the clustered key from a 2 million row table, this is shown quite clearly as a seek.
  • o ( using a where clause of < 2000000 )
  • I then returned all 2 million rows
  • o ( using a where clause of <= 2000000 )
  • The point here was that when looking at query plans don't just assume a seek is automatically good, you must also consider how many rows the seek is returning.

 

What should the Primary/Clustered key be:

  • I said that I was leaning towards the use of a sequential numeric key, a surrogate key if needed.
  • I tended to only see the clustered key as a means to defragging the table and of having limited use in data retrieval queries.
  • o Well how many single column( key) queries do you do on a clustered PK?
  • o How often do you actually want the entire row of every table in say a 5 table join.
  • Did I think the Primary key should be part of the real data?
  • Yes this is valid, but not always the clustered key.

 

The curse of the null:

  • This one is easy.
  • I ran a query in the background against a 400 million row table.
  • A nullable column makes every row in an index 3 bytes bigger than a non null column.
  • o This is 3 bytes per column, index 4 nullable columns you get 12 bytes per row extra
  • The 400 million table results showed that a single integer index was 1.14GB larger for the nullable column.
  • That's a big difference in storage, io, buffer usage, backups
  • With an integer clustered key the record size per row for an integer secondary index are 12 bytes for a nullable column and 9 bytes for a non null.
  • o In page allocation this is 896 rows per page vs 671 rows per page
  • Finally a table which has all nullable columns accepts an entry of nil data ( except the PK ) not a particularly good design concept.

 

Talking of nulls:

  • How do you deal with multiple nulls where the data must be unique?
  • For UK readers we have a National Insurance number allocated to us.
  • This number is unique and is a key part of data collected by employers, so a HR database which stores NI numbers must make sure the stored numbers ( which actually are alphanumeric ) are always unique.
  • But life isn't always like that and new employees might not immediately know their NI number so you store "Unknown" or null. Trouble is that you may only have one null or unknown in your column.
  • Prior to SQL 2008 you could solve this by using an indexed view defined on the NI number.

 

 

create view dbo.vw_People_NiNumber

with schemabinding

as

select NiNumber  from dbo.People where NINumber is not null;

--

-- now index the view

--

create unique clustered index idx_vw_People_NiNumber on dbo.vw_People_NiNumber(NiNumber);

--

-- show it has materialised

--

exec dbo.sp_spaceused  'dbo.vw_People_NiNumber';

 

 

  • This will allow you to have multiple nulls but the stored data will be unique
  • SQL 2008 provides filtered indexes to resolve this, below.

 

-- now in sql 2008

--

create unique index uk_People_NiNumber on dbo.People(NiNumber) where NiNumber is not null;

 

 

How do you create an index that spans more than one table?

  • Now there's a handy thought if I ever heard one, however, in SQL Server an index can only include columns from a single table.
  • Well that's actually not strictly true, you can do this with an indexed view.

Indexed Views to replace joins:

  • Indexed views can be useful, typically they may be used to store aggregated data.
  • As a tuning aid I will consider using an indexed view to replace tables in a join.
  • To explain, the view is used ONLY to satisfy part or all of a joined query, I don't call the view directly or wish to do so.
  • The optimiser is clever enough to recognise if an indexed view "covers" part of a query and will use it replacing the original tables.
  • This is a three table join

 

-- query to show additional data for People

--

select p.*,sp.name,cr.name

from dbo.People p join dbo.StateProvice sp on p.StateProvinceID = sp.StateProvinceID

join dbo.CountryRegion cr on cr.CountryRegionCode = sp.CountryRegionCode;

--

 

 

  • The view replaces two of the tables and the join ( StateProvince and CountryRegion )
  • When the query runs again the optimiser will use the indexed view instead of the two tables
  • ( this is better viewed on http://www.grumpyolddba.co.uk/ )

 

 

-- create view

--

create view dbo.vw_RegionStuff

with schemabinding

as

select sp.StateProvinceID, cr.CountryRegionCode,    sp.name as Province ,cr.name as Region

from dbo.StateProvice sp

join dbo.CountryRegion cr on cr.CountryRegionCode = sp.CountryRegionCode;

go

--

-- create index

--

create  unique clustered index  cix_RegionStuff on vw_RegionStuff(StateProvinceID,CountryRegionCode,Province,Region);

go

 

 

 

Published Thursday, February 5, 2009 8:24 PM by GrumpyOldDBA
Filed under: , ,

Comments

# re: Can I Index a bit ? ( Part 1 )

Thursday, February 5, 2009 8:28 PM by GrumpyOldDBA

I've never encountered such an impossible way to present a document! Wait for the website version