GUID Fragmentation in SQL Server
I previously looked at the different methods you can use to create guids and how UuidCreateSequential does not order the guids in the same way as SQL Server. When NEWSEQUENTIALID() function was released with SQL Server 2005 it touted as the solution to problem of page splits when you use random guids. I therefore want to show how the different methods of generating guids affected fragmentation.
To do this I am re-using the test harness I created in the GUID Ordering article. You can use the view sys.dm_db_index_physical_stats to see the fragmentation of the clustered index (primary key) on GuidTable. I create a table IndexStats so that I would save the index information.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[IndexStats]') AND type in (N'U'))
DROP TABLE [dbo].[IndexStats]
GO
CREATE TABLE [dbo].[IndexStats](
[IndexColumn] sysname NOT NULL,
[database_id] [smallint] NULL,
[object_id] [int] NULL,
[index_id] [int] NULL,
[partition_number] [int] NULL,
[index_type_desc] [nvarchar](60) NULL,
[alloc_unit_type_desc] [nvarchar](60) NULL,
[index_depth] [tinyint] NULL,
[index_level] [tinyint] NULL,
[avg_fragmentation_in_percent] [float] NULL,
[fragment_count] [bigint] NULL,
[avg_fragment_size_in_pages] [float] NULL,
[page_count] [bigint] NULL,
[avg_page_space_used_in_percent] [float] NULL,
[record_count] [bigint] NULL,
[ghost_record_count] [bigint] NULL,
[version_ghost_record_count] [bigint] NULL,
[min_record_size_in_bytes] [int] NULL,
[max_record_size_in_bytes] [int] NULL,
[avg_record_size_in_bytes] [float] NULL,
[forwarded_record_count] [bigint] NULL,
[compressed_page_count] [bigint] NULL
)
GO
Originally the GuidTable from my GUID ordering article had the identity value as the primary key, therefore I would not expect any fragmentation from to occur on this index. So after the table was populated I ran the SQL statement:
INSERT INTO [dbo].[IndexStats]
([IndexColumn]
,[database_id]
,[object_id]
,[index_id]
,[partition_number]
,[index_type_desc]
,[alloc_unit_type_desc]
,[index_depth]
,[index_level]
,[avg_fragmentation_in_percent]
,[fragment_count]
,[avg_fragment_size_in_pages]
,[page_count]
,[avg_page_space_used_in_percent]
,[record_count]
,[ghost_record_count]
,[version_ghost_record_count]
,[min_record_size_in_bytes]
,[max_record_size_in_bytes]
,[avg_record_size_in_bytes]
,[forwarded_record_count]
,[compressed_page_count])
SELECT 'id',* FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('GuidTable'), 1, DEFAULT, DEFAULT ) ;
GO
By truncating the table changing the primary key for each of the columns, then re-running the test harness I could see the effect of having a clustered index on random and sequential guids is e.g.
TRUNCATE TABLE GuidTable;
GO
ALTER TABLE GuidTable DROP CONSTRAINT PK_GuidTable ;
GO
ALTER TABLE GuidTable ADD CONSTRAINT PK_GuidTable PRIMARY KEY (newidguid) ;
GO
/* Run the test harness */
The complete script is here.
After running the test harness 6 times. I ran the following query to obtain results for leaf level fragmention:
SELECT [IndexColumn]
,[avg_fragmentation_in_percent]
,[fragment_count]
,[avg_fragment_size_in_pages]
,[page_count]
,[avg_page_space_used_in_percent]
,[record_count]
FROM [dbo].[IndexStats]
WHERE index_level = 0
GO
| Index Column |
Average Fragmentation in Percent |
Fragment Count |
Average Fragment Size in Pages |
Page Count |
Average Page Space Used in Percent |
Record Count |
| id |
4.35 |
7 |
16.43 |
115 |
99.89 |
10000 |
| newidguid |
98.77 |
162 |
1 |
162 |
70.90 |
10000 |
| newsequentualidguid |
4.35 |
7 |
16.43 |
115 |
99.89 |
10000 |
| dotnetguid |
98.78 |
164 |
1 |
164 |
70.04 |
10000 |
| dotnetsequentialguid |
98.73 |
157 |
1 |
157 |
73.16 |
10000 |
| dotnetmodifiedsequentialguid |
4.35 |
7 |
16.43 |
115 |
99.89 |
10000 |
This shows the amount of fragmentation for random guids is very significant (It is recomended that "Fragmentation in Percent" should be as near to zero as possible). The number of pages used by random guid is 40% higher and the amount of space used on each page is less, therefore the disc space required will increase. With a low "Average Fragment Size in Pages", random guids will require more disc I/O to return the same number of pages, decreasing range scan performance.
Even using guids created with UuidCreateSequential you will not significantly reduce the problem.