GUID Fragmentation in SQL Server

Published 25 May 09 12:47 PM | MartinBell

I previously looked at the different methods you can use to create guids and how UuidCreateSequential does not order the guids in the same way as SQL Server. When NEWSEQUENTIALID() function was released with SQL Server 2005 it touted as the solution to problem of page splits when you use random guids. I therefore want to show how the different methods of generating guids affected fragmentation.

To do this I am re-using the test harness I created in the GUID Ordering article. You can use the view sys.dm_db_index_physical_stats to see the fragmentation of the clustered index (primary key) on GuidTable. I create a table IndexStats so that I would save the index information.

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[IndexStats]') AND type in (N'U'))
      DROP TABLE [dbo].[IndexStats]
GO

CREATE TABLE [dbo].[IndexStats](
      [IndexColumn] sysname NOT NULL,
      [database_id] [smallint] NULL,
      [object_id] [int] NULL,
      [index_id] [int] NULL,
      [partition_number] [int] NULL,
      [index_type_desc] [nvarchar](60) NULL,
      [alloc_unit_type_desc] [nvarchar](60) NULL,
      [index_depth] [tinyint] NULL,
      [index_level] [tinyint] NULL,
      [avg_fragmentation_in_percent] [float] NULL,
      [fragment_count] [bigint] NULL,
      [avg_fragment_size_in_pages] [float] NULL,
      [page_count] [bigint] NULL,
      [avg_page_space_used_in_percent] [float] NULL,
      [record_count] [bigint] NULL,
      [ghost_record_count] [bigint] NULL,
      [version_ghost_record_count] [bigint] NULL,
      [min_record_size_in_bytes] [int] NULL,
      [max_record_size_in_bytes] [int] NULL,
      [avg_record_size_in_bytes] [float] NULL,
      [forwarded_record_count] [bigint] NULL,
      [compressed_page_count] [bigint] NULL
)
GO

Originally the GuidTable from my GUID ordering article had the identity value as the primary key, therefore I would not expect any fragmentation from to occur on this index. So after the table was populated I ran the SQL statement:

INSERT INTO [dbo].[IndexStats]
           ([IndexColumn]
           ,[database_id]
           ,[object_id]
           ,[index_id]
           ,[partition_number]
           ,[index_type_desc]
           ,[alloc_unit_type_desc]
           ,[index_depth]
           ,[index_level]
           ,[avg_fragmentation_in_percent]
           ,[fragment_count]
           ,[avg_fragment_size_in_pages]
           ,[page_count]
           ,[avg_page_space_used_in_percent]
           ,[record_count]
           ,[ghost_record_count]
           ,[version_ghost_record_count]
           ,[min_record_size_in_bytes]
           ,[max_record_size_in_bytes]
           ,[avg_record_size_in_bytes]
           ,[forwarded_record_count]
           ,[compressed_page_count])
SELECT 'id',* FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('GuidTable'), 1, DEFAULT, DEFAULT ) ;
GO

By truncating the table changing the primary key for each of the columns, then re-running the test harness I could see the effect of having a clustered index on random and sequential guids is e.g.

TRUNCATE TABLE GuidTable;
GO
ALTER TABLE GuidTable DROP CONSTRAINT PK_GuidTable ;
GO
ALTER TABLE GuidTable ADD CONSTRAINT PK_GuidTable PRIMARY KEY (newidguid) ;
GO

/* Run the test harness */

The complete script is here.

 

After running the test harness 6 times. I ran the following query to obtain results for leaf level fragmention:


SELECT
[IndexColumn]
           ,[avg_fragmentation_in_percent]
           ,[fragment_count]
           ,[avg_fragment_size_in_pages]
           ,[page_count]
           ,[avg_page_space_used_in_percent]
           ,[record_count]
FROM [dbo].[IndexStats]
WHERE index_level = 0

GO

Index Column Average Fragmentation in Percent Fragment Count Average Fragment Size in Pages Page Count Average Page Space Used in Percent Record Count
id 4.35 7 16.43 115 99.89 10000
newidguid 98.77 162 1 162 70.90 10000
newsequentualidguid 4.35 7 16.43 115 99.89 10000
dotnetguid 98.78 164 1 164 70.04 10000
dotnetsequentialguid 98.73 157 1 157 73.16 10000
dotnetmodifiedsequentialguid 4.35 7 16.43 115 99.89 10000

This shows the amount of fragmentation for random guids is very significant (It is recomended that "Fragmentation in Percent" should be as near to zero as possible). The number of pages used by random guid is 40% higher and the amount of space used on each page is less, therefore the disc space required will increase. With a low "Average Fragment Size in Pages", random guids will require more disc I/O to return the same number of pages, decreasing range scan performance.

Even using guids created with UuidCreateSequential you will not significantly reduce the problem.

Comments

# GrumpyOldDBA said on May 26, 2009 10:52 AM:

I use guids as keys in my tests ( see  my posts about san tests ) as they are probably the worst data type you could ever use for a key and in my indexing presentations I suggest you should never use a guid ( of any type ) as a key ( too wide - too random ), but they're good for tests!

# Dew Drop - May 26, 2009 | Alvin Ashcraft's Morning Dew said on May 26, 2009 01:06 PM:

Pingback from  Dew Drop - May 26, 2009 | Alvin Ashcraft's Morning Dew

# GUID vs Auto Increment INT as Databse Primary Keys : : Veeb's Brain Dump said on June 13, 2010 06:39 AM:

Pingback from  GUID vs Auto Increment INT as Databse Primary Keys :  : Veeb's Brain Dump

This Blog

SQL Blogs

Syndication