Heap insert behaviour

Published 10 February 11 07:28 PM | MartinBell

As part of his preparation for the “A trip into Index Internals” talk that David Betteridge is giving at the Leeds and Manchester User Groups next month he came upon the following behaviour.

Using the following table:

CREATE TABLE dbo.Heap
(
      KeyField          CHAR(100)         NOT NULL,
      Filler            CHAR(1900)        NOT NULL
)
GO

If you insert 4 rows in a single statement (full script here) e.g.

INSERT INTO dbo.Heap (KeyField, Filler)
VALUES ('1', REPLICATE('A',1900)), ('2', REPLICATE('B',1900)),
('3', REPLICATE('C',1900)), ('3', REPLICATE('D',1900)) ;
GO

and then look at the data pages allocated using DBCC IND

DECLARE @db_id INT = DB_ID() ;
DBCC IND (@db_id, 'dbo.Heap', -1)
GO

You will see that they are all contained on a single data page.


If you look at the DBCC PAGE using the following SQL:

DBCC TRACEON(3604)
DECLARE @db_id INT = DB_ID() ;
DBCC PAGE ( @db_id, 1, 78, 1 )
GO

You can see from the DBCC PAGE output that the PFS indicates that the page is 100_PCT_FULL. The page header for page 78 gives 4 slots in use,
freeCnt indicates 60 bytes are not being used and record size is 2007.




The output from DBCC PAGE can be found here.

But if you insert 4 rows individually (full script here)

DECLARE @i int = 1
WHILE @i < 5
BEGIN
      INSERT INTO dbo.Heap (KeyField, Filler) 
      VALUES (CAST(@i as CHAR(100)), REPLICATE(CHAR(64+@i),1900)) ;
      SET @i+=1 ;
END
GO

Then the forth row inserted will be on a second data page

The DBCC PAGE output for page 78 is now as follows:



You can see from this the PFS indicates that the page is 80_PCT_FULL, and the page header shows 3 slots are beng used and 2069 bytes are free. The DBCC PAGE for page 80 is as follows:



This indicates that 1 slot is in use. The output from DBCC PAGE can be found here.

Looking further into this I found that inserting rows in pairs would use a single page, but inserting three rows and then a single row you get two data pages. Searching for details about this, I found no information that clearly explains why this occurs. In the subject Managing Extent Allocations and Free Space in Books Online it talks about how the PFS pages track the free space:

"The PFS has one byte for each page, recording whether the page is allocated, and if so, whether it is empty, 1 to 50 percent full, 51 to 80 percent full, 81 to 95 percent full, or 96 to 100 percent full.

After an extent has been allocated to an object, the Database Engine uses the PFS pages to record which pages in the extent are allocated or free. This information is used when the Database Engine has to allocate a new page.  The amount of free space in a page is only maintained for heap and ext/Image pages. It is used when the Database Engine has to find a page with free space available to hold a newly inserted row. Indexes do not require that the page free space be tracked, because the point at which to insert a new row is set by the index key values."

Paul Randal has more information about the structure of the PFS on his blog post

In the Book Online topic “Estimating the Size of a Heap”  you can estimate how many rows will be used. In this case:

Fixed_Data_Size = 2000

Variable_Data_Size = 0

Null_Bitmap = 2 + ((Num_Cols + 7) / 8) = 1

Row_Size = Fixed_Data_Size + Variable_Data_Size + Null_Bitmap + 4 = 2007

Rows_Per_Page = 8096 / (Row_Size + 2) = 4

This estimates four rows per page and yet in the second example we only have 3 rows?

The infomation in Managing Extent Allocations and Free Space says the Database engine uses the percentage free information from the PFS page to find out if page has enough space. The value used is the upper bound for the range indicated by the bit setting, even if the page itself has more bytes free. This means that for an 8K page that is 80% full, 1638 bytes is the value used for the amount of bytes free, a page that is 95% full then 409 bytes are deemed to be free. So, for the second example where the row is 2007 bytes when the page that is 51-80% full (or more), SQL Server decides there is not enough room available and therefore a new page is allocated.

The Rows_Per_Page equation also has some limitations. It is based on the FixedVar row format and whilst it does account for the following:

  • Page header (96 bytes i.e. 8096 = 8192 – 96 ) 
  • Row Offset Array  (2 bytes)


It doesn’t take into account the following:

  • Version information if snapshot isolation is turned on on for the database (14 bytes)
  • Reserved spaces on the page that is used if running transactions where to roll back and need to put the original row back on the page
  • Vardecimal columns 
  • Sparse columns
  • Data Compression and rows in Column Descriptor format

To remove the unused extra space at the end of the page, you can use the ALTER TABLE… REBUILD statement available in SQL Server 2008.

ALTER TABLE dbo.Heap REBUILD ;
GO

When I do this on the above table I get the following information returned from DBCC IND.



This is still showing 2 data pages for the heap, but DBCC PAGE shows that page 89 is not being used. This bug is fixed in Denali.
For as script including the rebuild see here.

To get the best space usage for a heap you can do the following:

  1. insert rows in single statement
  2. use bulk insert
  3. avoid lots of updates / deletes
  4. maintain the heap with ALTER TABLE.. REBUILD

Many thanks to Marcel van der Holst who made this post possible.

Filed under:

Comments

No Comments

This Blog

SQL Blogs

Syndication