Does a GUID make a good clustered key?

I've covered the use of unique identifiers ( guids ) in my presentations and in my articles on anlysing indexes; however I probably wasn't quite as succinct as I could have been.

So do I like unique identifiers as clustered keys? Absolutely not unless there is no other choice.

Why? Well other than all guid implemetations I have seen do not use sequential guids it's a top step to making sure your your base table always fragments badly.
But I really dislike them when we get to consider large tables, where we have millions if not hundreds of millions of rows.
A guid is 16 bytes wide and the width of a clustered index is added to every row of a secondary index.
I cover the subject in depth in part 4 ( sorry it takes a while to load)

Anyway - Consider a worst case scenario where we have an secondary index index on a nullable tinyint where the clustered key/index is a guid.
How big is your index per row?
Well a  tinyint is 1 byte
Each row of the index will actually be 16 bytes + 1 byte overhead + 1 byte for the tinyint + 3 bytes for being a nullable column
Actual size 21 bytes per row for a 1 byte data type.
Using an int for the clustered key will remove 12 bytes per row for the index
removing the null will remove a further 3 bytes.
But just using an int in place of a guid will save you 115 MB on every secondary index on a 10 million row table.
On a billion row table this equates to 11 GB and if you factor in the null this would save in total 14 GB per index, as most tables I  encounter have many indexes you could be really making a real meal of it.

Now you may say who has tables that big, well a few years ago I was facing 400 million row tables and my current project is likely to hit a billion rows or more in some tables. I don't have any guids as you might expect.

Now the other downside of guids is that your foreign keys also become guids so a simple index on a key + one FK will come in at 33 bytes per row. Now indexes take up buffer cache and as databases beome larger then storing bloated indexes in cache is best avoided. ( You could of course use compressed keys to save space, and I have a large post coming on that as part of the "Can I index a bit" )

I'm having to exist on mobile internet as my broadband is being switched so it's made posting slightly more tricky

Published Wednesday, March 11, 2009 10:14 PM by GrumpyOldDBA


# Links for the Week of 2009-03-13 | facility9

Friday, March 13, 2009 5:17 PM by Links for the Week of 2009-03-13 | facility9

Pingback from  Links for the Week of 2009-03-13 | facility9