February 2008 - Posts

Row compression – internal structure

The CTP6 release of SQL Server 2008 includes row and page compression. It’s a feature that will only be in the Developer and Enterprise edition of SQL Server, so I wasn’t sure whether I should (or could!) add it in to SQL Internals Viewer. I had a look into it and thought it would be worth putting it in as it’s one of those things where it’ll be very useful to understand how it works.

So far I’ve only got row compression covered in SQL Internals Viewer, but I thought I’d better get down what I’ve found out so far.

There’s more on the new SQL Server 2008 compression on the SQL Server Storage Engine Blog here.

Row compression has a completely different row format. The size of a field is determined by the minimum amount of space needed to store it.  The Books Online topic ‘Row Compression Implementation’ has a good run down of how space is saved with different data types.

It’s very easy to add row compression to a table. The syntax is:

ALTER TABLE table REBUILD WITH (DATA_COMPRESSION = ROW)

The standard row structure is covered in Kalen Delaney’s Inside Microsoft SQL Server 2005: The Storage Engine, and there’s also an overview here (Storage Engine Blog).

Compressed row structure

(There may be mistakes in this as it’s currently undocumented, please let me know if anything needs correcting)

Compressed row example

Status Bits A
1st byte

This looks the same as a normal row, although there may be differences.

Number of columns 1 or 2 byte integer
2nd/2nd-3rd byte

This is the first instance where space can be saved. If the number of columns can fit into one byte (0-254) one byte will be used, if not two bytes are used. If the first (high-order) bit is 1 on the first byte this indicates a second byte is used.

CD Array
Next (Number of columns/2) + (Number of columns%2) bytes

I’m guessing CD stands for Column Description or Compression Description. It’s an array of 4-bit (nibble) integers, stored 2 per byte.  Every column in the row has a CD Array entry that determines if it is null, empty, stored ‘short’ (and if so the size) or stored ‘long’.
Short and long are the equivalents to the difference between fixed and variable length storage in the standard row format. Short CD Array entries represent fixed length storage (defined by the CD Array entry), but they use the optimal amount or storage. Long fields are similar to variable length fields, they have an entry in a row offset array, and these too use optimized storage.

Possible values for the CD array are:

0 – Null
1 – Empty
2 – 1 byte short
3 – 2 bytes short
4 – 3 bytes short
5 – 4 bytes short
6 – 5 bytes short
7 – 6 bytes short
8 – 7 bytes short
9 – 8 bytes short
10 - Long

If a field is a BIT data type the value of the CD Array is used as the value.

Row compression essentially turns every compressible field into a variable length field. It seems that the distinction between long and short columns is used so the extra overhead (column offset array entry) is only used when necessary. Below 9 bytes the CD array can be used to store the length. Above 8 bytes and an extra two bytes are used for the offset array entry.

Short Column Data
Next ∑ (short bytes in CD Array)

Unknown
?

Number of variable length columns
Next 2 bytes

Column offset array
Next (2 * Number of variable length columns) bytes

Each 2-byte integer defines the end offset of the variable length field

Long Column Data
The long/variable length fields with the offsets defined in the offset array.

Here’s an example in Internals Viewer:

No compression:

Row with no compression

With compression:

Row with compression

This only covers data records. I've still got to look into indexes and after that page compression (which when used also uses row compression). I'll also try to blog on how the data is actually stored and how to decode it.

Hopefully everything will be covered in version 1.0 of SQL Internals Viewer.

Server Alert - Trial version available

There is now a trial version of Server Alert available from http://www.internalexternal.com/ServerAlertTrial.aspx

New Product: Server Alert

I’m pleased to announce a new application called Server Alert.

The application is a small add-in for SQL Server Management Studio that shows a coloured bar at the side of all query windows. The coloured bar indicates which server the window is connected to. Different servers can be assigned different colours.

I’ve created this to make the current connection is a lot clearer. Although the server name is on the status bar at the bottom of a query it can be all too easy to execute a query on the wrong server, especially if multiple queries are open on different connections. Server Alert makes it a lot more apparent what the current connection is to avoid the heart-stopping “was that the right server?” moments!

There is a small demo of it in action at the new website: www.internalexternal.com\serveralert.aspx.

It’s available through www.internalexternal.com for $16.

For example you can colour code green for test or dev environments...

Server Alert connected to Production server

 ...and red for production environments

Server Alert connected to a test database