SQL Internals Viewer - New version with sparse column support

I've just released a new version of SQL Internals Viewer that adds support for sparse columns, a feature introduced in SQL Server 2008 CTP6.

There are also a few bug fixes and minor changes.

It's available to download from http://www.sqlinternalsviewer.com/

Thanks to Kalen for the help with the sparse vector complex header info.

SQL Internals Viewer 1.0 Released

I’m pleased to announce that SQL Internals Viewer 1.0 has been released. It can be downloaded from www.sqlinternalsviewer.com.

I've also put up the first part of a user guide, which covers the main window and Allocation Map. The second part, covering the Page Viewer, will follow shortly. The user guide is available here.

If you've got an existing version installed it will need to be removed through Add/Remove Programs or Programs and Features in Control Panel.

New features

Encode and Find is a new feature in the Page Viewer that allows you to encode a value to a particular data type and then search for it in the page. It can be accessed using the Page – Encode and Find menu item or the button on the toolbar.

Encode and Find

There’s also a new feature on the hex viewer so that once you’ve found the data you are looking for you can select the record that it is contained in. This can be done by right clicking on the byte and selecting Select Record.

SQL Server 2008 Page and Row Compression

The Page Viewer can display the new SQL Server 2008 Page and Row compression row structures, including the CI (Compression Information) structure.

2008 Compression

At the moment the application only supports data pages.

Key

There is a new improved version of the Key for the allocation map.

Key

Clicking on an item on the Key will highlight it on the Allocation Map and fade the other items. Clicking on it again will clear the selection.

Improvements

There have been several bug fixes and performance fixes, including improvements to the load times for databases.

Clicking on the Allocation Map will open the page in the current Page Viewer. To open the page in a new Page Viewer, hold down the Shift key and click on the page. There are more details of the new changes in the User Guide.

SQL Server 2008 Support

There is still work to be done to get Internals Viewer working with all of the latest features of SQL Server 2008, including Page and Row compression on indexes and sparse columns. It’s something I’m working on at the moment and I hope to release in the next few months.

Row compression – internal structure

The CTP6 release of SQL Server 2008 includes row and page compression. It’s a feature that will only be in the Developer and Enterprise editions of SQL Server, so I wasn’t sure whether I should (or could!) add it to SQL Internals Viewer. I had a look into it and thought it would be worth putting in, as it’s one of those things where it’ll be very useful to understand how it works.

So far I’ve only got row compression covered in SQL Internals Viewer, but I thought I’d better get down what I’ve found out so far.

There’s more on the new SQL Server 2008 compression on the SQL Server Storage Engine Blog here.

Row compression has a completely different row format. The size of a field is determined by the minimum amount of space needed to store it.  The Books Online topic ‘Row Compression Implementation’ has a good run down of how space is saved with different data types.

It’s very easy to add row compression to a table. The syntax is:

ALTER TABLE table REBUILD WITH (DATA_COMPRESSION = ROW)
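As a rough sketch of how this might look end to end, the script below creates a throwaway table, asks SQL Server for an estimate of the saving, applies row compression and then checks the setting. The table dbo.CompressionDemo and its columns are made up for illustration. It also assumes a build that includes the sp_estimate_data_compression_savings procedure and the data_compression_desc column in sys.partitions; both are in the released product, but may not be present in every CTP.

```sql
-- Hypothetical demo table; the name and columns are illustrative only.
CREATE TABLE dbo.CompressionDemo
    (
    Id     INT          NOT NULL
   ,Amount BIGINT       NOT NULL
   ,Notes  VARCHAR(100) NULL
    )
GO

-- A few rows with small values, which is where row compression should help most.
INSERT INTO dbo.CompressionDemo (Id, Amount, Notes)
SELECT 1, 10, 'a' UNION ALL
SELECT 2, 20, NULL UNION ALL
SELECT 3, 30, 'abc'
GO

-- Estimate the effect of ROW compression before changing anything.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo'
    ,@object_name      = 'CompressionDemo'
    ,@index_id         = NULL
    ,@partition_number = NULL
    ,@data_compression = 'ROW'
GO

-- Apply row compression to the table.
ALTER TABLE dbo.CompressionDemo REBUILD WITH (DATA_COMPRESSION = ROW)
GO

-- Confirm the compression setting for each partition of the table.
SELECT OBJECT_NAME(p.object_id) AS TableName
      ,p.index_id
      ,p.partition_number
      ,p.data_compression_desc
FROM   sys.partitions p
WHERE  p.object_id = OBJECT_ID('dbo.CompressionDemo')
GO
```

On a table this small the estimate won't be meaningful; the point is only the order of the steps.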

The standard row structure is covered in Kalen Delaney’s Inside Microsoft SQL Server 2005: The Storage Engine, and there’s also an overview here (Storage Engine Blog).

Compressed row structure

(There may be mistakes in this as it’s currently undocumented, please let me know if anything needs correcting)

Compressed row example

Status Bits A
1st byte

This looks the same as a normal row, although there may be differences.

Number of columns 1 or 2 byte integer
2nd/2nd-3rd byte

This is the first instance where space can be saved. If the number of columns fits into one byte, one byte is used; if not, two bytes are used. If the first (high-order) bit of the first byte is 1, this indicates that a second byte is used, which leaves seven bits (0-127) for a single-byte column count.

CD Array
Next (Number of columns/2) + (Number of columns%2) bytes

I’m guessing CD stands for Column Description or Compression Description. It’s an array of 4-bit (nibble) integers, stored 2 per byte.  Every column in the row has a CD Array entry that determines if it is null, empty, stored ‘short’ (and if so the size) or stored ‘long’.
Short and long are the equivalents of fixed and variable length storage in the standard row format. Short CD Array entries represent fixed length storage (with the length defined by the CD Array entry), but they use the optimal amount of storage. Long fields are similar to variable length fields: they have an entry in a row offset array, and they too use optimized storage.

Possible values for the CD array are:

0 – Null
1 – Empty
2 – 1 byte short
3 – 2 bytes short
4 – 3 bytes short
5 – 4 bytes short
6 – 5 bytes short
7 – 6 bytes short
8 – 7 bytes short
9 – 8 bytes short
10 – Long

If a field is a BIT data type the value of the CD Array is used as the value.
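Here's a minimal sketch of how the CD Array could be unpacked in T-SQL, using the list of values above. The bytes in @CdArray are made up rather than taken from a real page, and the nibble order (the first column of each pair in the low four bits, the second in the high four bits) is an assumption that should be checked against a real row, in the same spirit as the rest of these notes.

```sql
DECLARE @CdArray     VARBINARY(16)
DECLARE @ColumnCount INT

SET @CdArray     = 0x32A1   -- made-up CD Array bytes, not from a real page
SET @ColumnCount = 4

-- Bytes needed for the CD Array: two 4-bit entries per byte, rounded up.
SELECT (@ColumnCount / 2) + (@ColumnCount % 2) AS CdArrayBytes

-- One row per column (recursive CTE, fine for up to 100 columns by default).
;WITH Numbers AS
    (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < @ColumnCount
    )
SELECT n.n AS ColumnOrdinal
      ,v.Entry
      ,CASE WHEN v.Entry = 0  THEN 'Null'
            WHEN v.Entry = 1  THEN 'Empty'
            WHEN v.Entry BETWEEN 2 AND 9
                 THEN CONVERT(VARCHAR(1), v.Entry - 1) + ' byte(s) short'
            WHEN v.Entry = 10 THEN 'Long'
            ELSE 'Not in the list above' END AS Meaning
FROM   Numbers n
       CROSS APPLY
       (
       -- ASSUMPTION: odd-numbered columns use the low nibble of the byte,
       -- even-numbered columns use the high nibble.
       SELECT CASE WHEN n.n % 2 = 1
                   THEN CONVERT(INT, SUBSTRING(@CdArray, (n.n + 1) / 2, 1)) % 16
                   ELSE CONVERT(INT, SUBSTRING(@CdArray, (n.n + 1) / 2, 1)) / 16
              END AS Entry
       ) v
ORDER BY n.n
```

With the made-up bytes 0x32A1 and that nibble order, the four entries come out as 2, 3, 1 and 10: a 1 byte short column, a 2 byte short column, an empty column and a long column. The array itself takes (4 / 2) + (4 % 2) = 2 bytes.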

Row compression essentially turns every compressible field into a variable length field. It seems that the distinction between long and short columns is used so that the extra overhead (a column offset array entry) is only incurred when necessary. Up to 8 bytes, the CD Array entry can be used to store the length. Above 8 bytes, an extra two bytes are used for the offset array entry.

Short Column Data
Next ∑ (short bytes in CD Array)

Unknown
?

Number of variable length columns
Next 2 bytes

Column offset array
Next (2 * Number of variable length columns) bytes

Each 2-byte integer defines the end offset of the variable length field

Long Column Data
The long/variable length fields with the offsets defined in the offset array.

Here’s an example in Internals Viewer:

No compression:

Row with no compression

With compression:

Row with compression

This only covers data records. I've still got to look into indexes and after that page compression (which when used also uses row compression). I'll also try to blog on how the data is actually stored and how to decode it.

Hopefully everything will be covered in version 1.0 of SQL Internals Viewer.

Server Alert - Trial version available

There is now a trial version of Server Alert available from http://www.internalexternal.com/ServerAlertTrial.aspx

New Product: Server Alert

I’m pleased to announce a new application called Server Alert.

The application is a small add-in for SQL Server Management Studio that shows a coloured bar at the side of all query windows. The coloured bar indicates which server the window is connected to. Different servers can be assigned different colours.

I’ve created this to make the current connection a lot clearer. Although the server name is on the status bar at the bottom of a query window, it can be all too easy to execute a query on the wrong server, especially if multiple queries are open on different connections. Server Alert makes it a lot more apparent what the current connection is, to avoid the heart-stopping “was that the right server?” moments!

There is a small demo of it in action at the new website: www.internalexternal.com\serveralert.aspx.

It’s available through www.internalexternal.com for $16.

For example you can colour code green for test or dev environments...

Server Alert connected to Production server

 ...and red for production environments

Server Alert connected to a test database

Stored Procedure parameters

Here’s some more SQL that writes SQL. One way of debugging a stored procedure is to chop off the CREATE PROCEDURE at the top and replace it with DECLARE and SET statements for the variables, then step through the stored procedure.

The following SQL gives an easy way of extracting the stored procedure parameters and creating variables based on the parameters, including the data type.  Just copy-paste the output into a query window.

The variable initialization is output as template parameters so you can press Ctrl+Shift+M and easily populate the variables using the Template Parameter window.

The first version is a simple query, the second is a UDF that you can keep in the master database.

Query version:

DECLARE @StoredProcName VARCHAR(100)

SET @StoredProcName = '(Stored Proc Name)'

-- One DECLARE per parameter
SELECT 'DECLARE ' + c.name +
       ' ' + t.name +
       CASE WHEN t.name LIKE '%char%'
            THEN '(' +
                 CASE WHEN c.max_length = -1 THEN 'MAX'
                      -- max_length is in bytes; nchar/nvarchar use 2 bytes per character
                      WHEN t.name LIKE 'n%'  THEN CONVERT(varchar(10), c.max_length / 2)
                      ELSE CONVERT(varchar(10), c.max_length) END +
                 ')'
            ELSE '' END
FROM   sys.parameters c
       INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
WHERE  c.object_id = OBJECT_ID(@StoredProcName)
UNION ALL
SELECT ' '
UNION ALL
-- One SET per parameter, with the value left as a template parameter
SELECT 'SET ' + c.name +
       ' = <' + c.name +
       ',' + t.name +
       CASE WHEN t.name LIKE '%char%'
            THEN '(' +
                 CASE WHEN c.max_length = -1 THEN 'MAX'
                      WHEN t.name LIKE 'n%'  THEN CONVERT(varchar(10), c.max_length / 2)
                      ELSE CONVERT(varchar(10), c.max_length) END +
                 ')'
            ELSE '' END +
       ',>'
FROM   sys.parameters c
       INNER JOIN sys.types t ON c.user_type_id = t.user_type_id
WHERE  c.object_id = OBJECT_ID(@StoredProcName)

Example

Output from SET @StoredProcName = 'HumanResources.uspUpdateEmployeePersonalInfo' in the AdventureWorks db:

DECLARE @EmployeeID int
DECLARE @NationalIDNumber nvarchar(15)
DECLARE @BirthDate datetime
DECLARE @MaritalStatus nchar(1)
DECLARE @Gender nchar(1)

SET @EmployeeID = <@EmployeeID,int,>
SET @NationalIDNumber = <@NationalIDNumber,nvarchar(15),>
SET @BirthDate = <@BirthDate,datetime,>
SET @MaritalStatus = <@MaritalStatus,nchar(1),>
SET @Gender = <@Gender,nchar(1),>

User-defined function version:

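CREATE FUNCTION dbo.uFn_StoredProcVariables(@StoredProcName SYSNAME)
                                              RETURNS NVARCHAR(MAX) AS
BEGIN
    DECLARE @Declares NVARCHAR(MAX)
    DECLARE @Sets     NVARCHAR(MAX)

    -- One DECLARE per parameter, concatenated into a single string
    SET @Declares =
        (SELECT 'DECLARE ' + c.name +
                ' ' + t.name +
                CASE WHEN t.name LIKE '%char%'
                     THEN '(' +
                          CASE WHEN c.max_length = -1 THEN 'MAX'
                               -- max_length is in bytes; nchar/nvarchar use 2 bytes per character
                               WHEN t.name LIKE 'n%'  THEN CONVERT(varchar(10), c.max_length / 2)
                               ELSE CONVERT(varchar(10), c.max_length) END +
                          ')'
                     ELSE '' END + CHAR(10)
         FROM   sys.parameters c
                INNER JOIN sys.types t
                             ON c.user_type_id = t.user_type_id
         WHERE  c.object_id = OBJECT_ID(@StoredProcName)
         FOR XML PATH(''))

    -- One SET per parameter, with the value left as a template parameter
    SET @Sets =
        (SELECT 'SET ' + c.name +
                ' = <' + c.name +
                ',' + t.name +
                CASE WHEN t.name LIKE '%char%'
                     THEN '(' +
                          CASE WHEN c.max_length = -1 THEN 'MAX'
                               WHEN t.name LIKE 'n%'  THEN CONVERT(varchar(10), c.max_length / 2)
                               ELSE CONVERT(varchar(10), c.max_length) END +
                          ')'
                     ELSE '' END + ',>' + CHAR(10)
         FROM   sys.parameters c
                INNER JOIN sys.types t
                             ON c.user_type_id = t.user_type_id
         WHERE  c.object_id = OBJECT_ID(@StoredProcName)
         FOR XML PATH(''))

    -- FOR XML escapes the angle brackets, so put them back
    RETURN @Declares
           + CHAR(10)
           + REPLACE(REPLACE(@Sets, '&gt;', '>'), '&lt;', '<')
END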

Example

PRINT dbo.uFn_StoredProcVariables('HumanResources.uspUpdateEmployeePersonalInfo')

Scuffling with ‘String or binary data would be truncated’

The error ‘String or binary data would be truncated’ can be annoying.  It occurs when you try to insert or update a string or binary column with a value that is too large. Recently I was trying to INSERT from a SELECT from one table to another and I got this error. It can be a pain tracking down the cause, especially if there are a large number of columns or a large dataset involved.

In the past I’ve written queries to give me the LEN for each column, but again if there are a large number of columns involved this can be very time consuming.

Below is a way of identifying which rows are causing the problem. This doesn’t help if you’ve got a large number of columns, as you still need to work out which field is causing the problem, but it will help if you have a large dataset and the problem rows are very sparse.

For this example I’ll create a couple of tables and generate some data. The source table has a column of VARCHAR(50), whereas the destination has VARCHAR(25):

CREATE TABLE SourceTable
    (
    RowId  INT
   ,Chars  INT
   ,String VARCHAR(50)
    )
GO

CREATE TABLE DestinationTable
    (
    RowId  INT
   ,Chars  INT
   ,String VARCHAR(25)
    )
GO

Next the source table is populated with strings made up of a random number of ‘X’s, between 0 and 50. In theory about half of the rows should have a length above 25 characters and half should be 25 or below.

DECLARE @i INT
DECLARE @RandomNumber INT

SET @i=0
WHILE @i <= 50
BEGIN
    SET @RandomNumber = ROUND(50 * RAND(), 0)

    INSERT INTO SourceTable
    SELECT @i, @RandomNumber, REPLICATE('X', @RandomNumber)

    SET @i=@i+1
END
GO

Next try inserting from SourceTable to DestinationTable:

INSERT INTO DestinationTable

SELECT * FROM SourceTable
GO

This results in the error:

Msg 8152, Level 16, State 14, Line 1

String or binary data would be truncated.

The statement has been terminated.

It’s possible to ignore the 'String or binary data would be truncated' message by setting ANSI_WARNINGS to OFF. This will truncate fields where they don’t fit. ANSI_WARNINGS OFF has drawbacks and it is better to correct a problem rather than ignore it.

The following can be used to work out which rows are causing the issue:

1. Take a copy of the destination table:

SELECT * INTO #Destination FROM DestinationTable WHERE 1=2

GO

2. Set ANSI_WARNINGS OFF and perform the insert into the copy of the destination table, then set ANSI_WARNINGS ON again:

SET ANSI_WARNINGS OFF

GO

 

INSERT INTO #Destination

SELECT * FROM SourceTable

GO

SET ANSI_WARNINGS ON

GO

As ANSI_WARNINGS is off, SQL Server truncates the fields rather than raising the error.

3. Next compare what you would like to insert against what was inserted with the ANSI_WARNINGS OFF truncating. By using EXCEPT you only select the rows that don't match, and have therefore been truncated:

SELECT * FROM SourceTable

EXCEPT

SELECT * FROM #Destination

GO

The rows returned are the ones that have been truncated, and are therefore the cause of the ‘String or binary data would be truncated’ error.

(Note - The use of EXCEPT limits this to 2005/2008. The final query could be rewritten for SQL Server 2000 and below.)
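One way the final query might be written without EXCEPT is with NOT EXISTS. This is only a sketch against the example tables, and it assumes SourceTable and #Destination from the steps above still exist in the current session. Unlike EXCEPT, the = operator doesn't treat two NULLs as equal, so the NULL checks are spelled out, and duplicate rows are not collapsed.

```sql
-- Rows in SourceTable that have no identical row in #Destination,
-- i.e. rows that were changed by the truncating insert.
SELECT s.*
FROM   SourceTable s
WHERE  NOT EXISTS
       (
       SELECT 1
       FROM   #Destination d
       WHERE  (d.RowId  = s.RowId  OR (d.RowId  IS NULL AND s.RowId  IS NULL))
              AND (d.Chars  = s.Chars  OR (d.Chars  IS NULL AND s.Chars  IS NULL))
              AND (d.String = s.String OR (d.String IS NULL AND s.String IS NULL))
       )
GO
```

Bear in mind that = ignores trailing spaces when comparing strings, so a value that only lost trailing spaces would not show up here.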

This isn’t the most elegant solution, and as I said, if there were a large number of columns you’d still need to hunt through for the offender(s), but at least this gives an idea of where to look. I may have missed some glaringly obvious solution to this problem, so I’d be interested to know if anyone has any other ways of dealing with it.
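For narrowing things down to a column, one possible approach (a sketch only, using the example tables from this post) is to compare the length of each source value with the declared size of the destination column, which sys.columns exposes as max_length in bytes.

```sql
-- Source values that are too long for DestinationTable.String.
-- sys.columns.max_length is in bytes, and is -1 for the MAX types.
SELECT s.RowId
      ,DATALENGTH(s.String) AS SourceBytes
      ,c.max_length         AS DestinationMaxBytes
FROM   SourceTable s
       CROSS JOIN sys.columns c
WHERE  c.object_id = OBJECT_ID('DestinationTable')
       AND c.name = 'String'
       AND c.max_length <> -1
       AND DATALENGTH(s.String) > c.max_length
ORDER BY s.RowId
GO
```

This still has to be repeated (or generated) for each string or binary column, so it complements the EXCEPT approach rather than replacing it.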

 

 

 

SQL Internals Viewer on LearnSQLServer.com

Scott Whigham at LearnSQLServer.com has featured SQL Internals Viewer in a new series of video tutorials. The site has a whole range of video tutorials on SQL Server covering the basics right up to advanced topics.

I've seen the videos and they are a good introduction to the app and what you can do with it.

The videos are available here (requires subscription).

SQL Server 2008 TIME data type

The new TIME type stores a time with a specified scale that defines the fractional second precision.

The scale ranges from 0 to 7, representing 0 to 7 digits for the fractional seconds. The default is TIME(7), giving 7 digits and a fractional range of .0000000 to .9999999.

TIME is stored as an integer of various sizes, depending on the scale. For a scale of 0-2 it is stored as a 3 byte integer, 3-4 a 4 byte integer, and for scale 5-7 it is stored as a 5 byte integer.

The scale is then used to calculate the time since midnight, with an accuracy ranging from 1 second to 100 nanoseconds.

If t is the value stored in the time column and n is the scale, the time from midnight in seconds can be calculated by t / 10^n.
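As a quick worked example of that formula, the sketch below takes an unscaled value of 600000000 at scale 7 and converts it back to seconds, then adds that many seconds to midnight as a cross-check. The variable names are just for illustration.

```sql
DECLARE @t BIGINT   -- unscaled value stored for the TIME
DECLARE @n INT      -- scale of the TIME type

SET @t = 600000000
SET @n = 7

-- Seconds since midnight = t / 10^n
SELECT @t / POWER(CONVERT(FLOAT, 10), @n) AS SecondsSinceMidnight
-- Result: 60

-- Cross-check: add that many seconds to midnight
SELECT DATEADD(SECOND,
               CONVERT(INT, @t / POWER(CONVERT(FLOAT, 10), @n)),
               CONVERT(TIME(7), '00:00:00')) AS TimeValue
-- Result: 00:01:00.0000000
```

The cross-check only handles whole seconds, which is enough to show the scaling.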

Here’s a summary of the storage and scaling (seconds, milliseconds, and nanoseconds are the respective duration t is multiplied by):

Scale    Storage (bytes)  Seconds    Milliseconds  Nanoseconds
TIME(0)  3                1          1000          1000000000
TIME(1)  3                0.1        100           100000000
TIME(2)  3                0.01       10            10000000
TIME(3)  4                0.001      1             1000000
TIME(4)  4                0.0001     0.1           100000
TIME(5)  5                0.00001    0.01          10000
TIME(6)  5                0.000001   0.001         1000
TIME(7)  5                0.0000001  0.0001        100
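The storage sizes can be checked with DATALENGTH. This is a small sketch rather than anything definitive; the sample time is arbitrary.

```sql
-- Storage size in bytes for a selection of scales
SELECT DATALENGTH(CONVERT(TIME(0), '12:34:56')) AS Time0Bytes
      ,DATALENGTH(CONVERT(TIME(2), '12:34:56')) AS Time2Bytes
      ,DATALENGTH(CONVERT(TIME(3), '12:34:56')) AS Time3Bytes
      ,DATALENGTH(CONVERT(TIME(4), '12:34:56')) AS Time4Bytes
      ,DATALENGTH(CONVERT(TIME(5), '12:34:56')) AS Time5Bytes
      ,DATALENGTH(CONVERT(TIME(7), '12:34:56')) AS Time7Bytes
```

If the sizes in the table hold, this should return 3, 3, 4, 4, 5 and 5.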

It’s possible to extract the unscaled value from a TIME value, although it requires a few steps.

DECLARE @Time TIME(7) = '00:01:00' -- Format HH:mm:SS[.nnnnnnn]
DECLARE @BinaryTime VARBINARY(8)

SET @BinaryTime = SUBSTRING(CONVERT(VARBINARY, REVERSE(CONVERT(VARBINARY, @Time))),
                            1,
                            DATALENGTH(@Time))

SELECT CONVERT(BIGINT, @BinaryTime) -- Unscaled TIME value

-- Result: 600000000

The above example gives a result of 600000000, which, looking at the scale, makes sense. The scale is 7, so a time of 1 minute past midnight is 60 seconds = 600000000 / 10^7.

DECLARE @Time TIME(3) = '00:01:00' -- Format HH:mm:SS[.nnnnnnn]
DECLARE @BinaryTime VARBINARY(8)

SET @BinaryTime = SUBSTRING(CONVERT(VARBINARY, REVERSE(CONVERT(VARBINARY, @Time))),
                            1,
                            DATALENGTH(@Time))

SELECT CONVERT(BIGINT, @BinaryTime) -- Unscaled TIME value

-- Result: 60000


A scale of 3 gives a result of 60000, as 60 seconds = 60000 / 10^3.

Books Online has more information about the new TIME type here.

SQL Server 2008 DATE data type

SQL Server 2008 has several new data types, including new date and time types.  In a series of short posts I’ll go into how these data types are structured. All of these new types are supported in SQL Internals Viewer, and a new data type viewer is coming up in a future version of the app.

The new date and time data types are:

  • DATE – Stores a date value
  • TIME – Stores a time value with an accuracy of up to 100 nanoseconds
  • DATETIME2 – Stores a date and time value with the higher TIME accuracy
  • DATETIMEOFFSET – Stores a date and time value with a time zone offset

DATE type internals

The date type simply stores a date, ranging from January 1st 0001 (1 AD) to December 31st 9999. Internally the type is stored as a 3 byte (24-bit) integer. The integer value is the number of days since the base date of 01/01/0001.

It isn’t possible to convert from an INT to DATE directly. Running SELECT CONVERT(DATE, 1) will result in the following error:

Msg 529, Level 16, State 2, Line 1

Explicit conversion from data type int to date is not allowed.

However, it is possible to convert from INT to DATE by converting first to BINARY(3), reversing the bytes, and converting to DATE. (I’ll explain why you need the REVERSE and CONVERT in a subsequent post.)

This shows each increment of the 24-bit integer represents a day from the base date:

DECLARE @IntValue INT

SET @IntValue = 0

SELECT CONVERT(DATE, CONVERT(BINARY(3), REVERSE(CONVERT(BINARY(3), @IntValue))))
 -- Result: 0001-01-01

SET @IntValue = 1

SELECT CONVERT(DATE, CONVERT(BINARY(3), REVERSE(CONVERT(BINARY(3), @IntValue))))
 -- Result: 0001-01-02

SET @IntValue = 2

SELECT CONVERT(DATE, CONVERT(BINARY(3), REVERSE(CONVERT(BINARY(3), @IntValue))))
 -- Result: 0001-01-03

SET @IntValue = 3

SELECT CONVERT(DATE, CONVERT(BINARY(3), REVERSE(CONVERT(BINARY(3), @IntValue))))
 -- Result: 0001-01-04
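The same trick should work in the other direction. The sketch below pulls the stored integer out of a DATE and compares it with the number of days since the base date as worked out by DATEDIFF. It assumes DATE can be converted explicitly to BINARY(3) and that DATEDIFF accepts DATE values, as in the released product; I can't vouch for every CTP.

```sql
DECLARE @Date DATE
SET @Date = '2008-02-01'

-- Reverse the three stored bytes and read them as an integer
SELECT CONVERT(INT, CONVERT(BINARY(3), REVERSE(CONVERT(BINARY(3), @Date)))) AS StoredValue

-- Days since the base date, worked out with DATEDIFF for comparison
SELECT DATEDIFF(DAY, CONVERT(DATE, '0001-01-01'), @Date) AS DaysSinceBaseDate
```

If the storage really is a count of days from 01/01/0001, both queries should return the same number.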

Unlike DATETIME and SMALLDATETIME, it doesn’t seem possible (with the November CTP) to add and subtract days from a DATE using the + and - operators:

DECLARE @Date DATE = '2008-02-01'

SELECT @Date + 1

This results in the error: 

Msg 206, Level 16, State 2, Line 2

Operand type clash: date is incompatible with int

Adding two dates together also results in the following error:

Msg 8117, Level 16, State 1, Line 3

Operand data type date is invalid for add operator.
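The date functions are the usual way round this. Here's a short sketch using DATEADD and DATEDIFF, which accept DATE values in the released product (again, I can't say for certain how each CTP behaves).

```sql
DECLARE @Date DATE
SET @Date = '2008-02-01'

SELECT DATEADD(DAY, 1, @Date)   AS NextDay       -- 2008-02-02
      ,DATEADD(DAY, -1, @Date)  AS PreviousDay   -- 2008-01-31
      ,DATEDIFF(DAY, @Date, CONVERT(DATE, '2008-03-01')) AS DaysToMarch -- 29 (2008 is a leap year)
```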

Books Online has more information about the new DATE type here.

Next up, the new TIME type.

FOR XML and back again

I’ve used the 2005 XML features for a few things now and I’m getting to quite like them. One thing I’ve found is that it is a lot easier to get data into XML than it is to get it back out again.
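To make the round trip concrete, here's a minimal sketch using a made-up table variable aliased as Person (not a real AdventureWorks table). Getting the data into XML is a single FOR XML AUTO clause; getting it back out means calling nodes() and then value() for every column, with the data type spelled out each time.

```sql
-- Hypothetical sample data; the names are illustrative only.
DECLARE @Person TABLE (Id INT, Name VARCHAR(50))

INSERT INTO @Person (Id, Name)
SELECT 1, 'Ann' UNION ALL
SELECT 2, 'Bob'

-- Into XML: one attribute-centric <Person> element per row
DECLARE @Xml XML
SET @Xml = (SELECT Id, Name FROM @Person AS Person FOR XML AUTO, TYPE)

SELECT @Xml AS PersonXml

-- And back out again: one value() call per column
SELECT T.Data.value('@Id',   'int')         AS Id
      ,T.Data.value('@Name', 'varchar(50)') AS Name
FROM   @Xml.nodes('/Person') AS T(Data)
```

Writing the second query by hand for a wide table is the tedious part.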

The following is a stored procedure that given an XML variable and a table name will dynamically construct and execute the SQL needed to shred XML back into a table. I knocked it together quite quickly and at the moment it only works on XML that has been created by FOR XML AUTO, although it could quite easily be modified for the other XML options.

CREATE PROC dbo.uSpShredTable @Xml XML, @TableName SYSNAME AS

       DECLARE @Sql NVARCHAR(MAX)

 

       SELECT @Sql = 'SELECT ' +

           STUFF((SELECT '      ,T.Data.value(''@' +

                         c.name + ''', ''' +

                         t.name +

                         CASE WHEN c.user_type_id IN (165,167,173,175,231,239)

                              THEN '(' + CONVERT(VARCHAR, c.max_length) + ')'

                              WHEN c.user_type_id IN (106, 108)

                              THEN '(' + CONVERT(VARCHAR, c.precision)

                                   + ', ' + CONVERT(VARCHAR, c.scale) + ')'

                              ELSE '' END +

                         ''') AS ' + c.name + CHAR(10)

                  FROM   sys.columns c

                         INNER JOIN sys.types t ON c.user_type_id = t.user_type_id

                  WHERE  object_id = OBJECT_ID(@TableName)

                         AND t.name !='xml'

                  FOR XML PATH('')), 1, 7, '') +