December 2007 - Posts

So you want to use the SqlGeometry and SqlGeogrpahy collections in .Net. I found the lack of friendly collection based properties and methods suprising, i.e. no iterators, no collections which you can add to.(

What this means is you can't create a line and then simply extend it. Or even extend a polygon to cover another point (easily) i.e. myPolygon.AddPoint(new Point(x,y)). This is sort of understadable as there are many ways the point could be added, to the exterior ring, or extend the polygon to cover the point (a pit like pulling an elastic band)

I've found that you either have to create your string representations of Geometry or use the WKB format. As these are standards based they shoud be fine. I think the safest is to use WKB but you need to know the binary structure

If you want to understand the format required for WKB the easiest to do use reflector against the Microsoft.SqlServer.Types dll.

What I've learnt so far is that the format is,

Point = [Byte order][Type][X value][Y value]

MultiPoint = [Byte order][Type][PointCount]<Point1><Point2>

Line = [Byte order][Type][PointCount][X Value][YValue][X value][Y value].....

MultiLine = [Byte order][Type][LineCount]<Line1><Line2> .....

Polygon = [Byte order][Type][Polygon Count][PointCount][X Value][YValue][X value][Y value].....

MultiPolygon = [Byte order][Type][Polygon Count]<Polygon1><Polygon2>....

Geometry Collection = [Byte order][Type][Shape Count]<Shape1><Shape2>.....


[Byte Order] is a singe byte indicating the endian or the rest of the bytes i.e. is 1 0x01000000 or 0x00000001

Type is 4 byte unsigned integer indicating the type of shape

   GeometryCollection = 7,
    LineString = 2,
    MultiLineString = 5,
    MultiPoint = 4,
    MultiPolygon = 6,
    Point = 1,
    Polygon = 3,
    Unknown = 0
PointCount, LineCount and polygon Count are unsigned integers and are used to iterate through the remainder of the bytes.

X Value and Y Value are 8 byte floating points.

<Shape> are repetitions of the relevant shape specification.

Creating a WKB in TSQL is quite neat see

Unfortunately the SqlGeometry and SqlGeograohy types are sealed so you can't inherit them. You would have to have a local instance and repeat all the method calls. You would have to implement your own parser to be able to add points and this would suck from a perf perspective, although very little validation is done when you create a geometry.


Performing bulk inserts into a table when the table has an identity column has long bee a difficult challenge for client programming.

The difficulty has been getting the identity values that have been generated for the set of data you are saving. It becomes even more difficult when you want to save parent child relationships.

In SQL 2005 the OUTPUT clause was introduced which allowed you to return the identity values generated by a an insert of multiple rows (you can also generate any other data generated by the server i.e. defaults contrary to what BOL says). However there was one draw back. You couldn't relate the identity values to the original data set unless you had a natural key in your table or you stored a second surrogate key. This means you had to fully reload your client application data set. For large data sets thats not ideal.

This is because the OUTPUT clause for an INSERT statement can only reference the inserted pseudo table. This only contains the columns of the underlying table so you don't have access to any other data.

With SQL Server 2008 you get MERGE. MERGE is a combined INSERT, UPDATE and DELETE statement that supports the OUTPUT clause. I thought I would have a look and see if the same limitations apply to the OUTPUT claue as they do for the INSERT statement. Well they don't.

This means that you can return any value from your source data even if its not being inserted into the table. For example

CREATE TABLE Person (id int identity(1,1), name varchar(200))



USING (Values ('Simon',1),('Mark',2),('Stuart',3)) People(name,clientSideId)

ON 1=2


INSERT (name) values (People.Name)

OUTPUT People.clientSideId, Inserted.Id;

This allows you to update your client code with the idenity values generated by the server. Whilst this logically shold work, in CTP5 there is a bug that causes this to fail. It appears that the issue is to do with the ON clause. When the expression is always false something is causing a severe error. There are a number of workarounds, sometimes using a WHEN MATCHED clause will help and in others changing he USING to be a table reference and not a derived table helps.

For an example of saving parent child relationships have a look at the code below. This copies the objects and columns system views into new tables, whilst maintaining correct referential integrity.

The use of the @objectTab and @columnTab is to avoid the error mentioned above, once fixed this code can be greatly simplfied.

You might be thinking that this is goig to such from a performance stance. It doesn't. Whilst the execution plan costs is much higher than the straight insert the actual cost of cpu and IO can be less than the direct INSERT approach. I beleieve this is due to some sorting and assertions that are imposed in the MERGE statement which simplify the storage and upating of the underlying indexes.

Here is the code mentioned above. It returns two data sets that can be used to update your client data sets. You could combine into 1 data set if required.

drop table NewObject


drop table NewColumn


create table NewObject (id int identity(1,1) check (id > 0 ), name sysname)


create table NewColumn (id int identity(1,1), ObjectId int, name sysname)


set nocount on


declare @objects xml = (

select top 1000 object_id [@object_id]

      ,name [@name]

      ,(select column_id [@id], name [@name] from sys.columns where sys.columns.object_id = sys.objects.object_id for xml path('column'), Type )

from sys.objects

for xml path ('object'))


declare @NewObject table (new_id int, old_id int, unique ([old_id]))

declare @id int

declare @NewColumn table (new_id int, old_id int, new_col_id int, old_col_id int)


declare @objectTab table (name sysname, object_id int)

insert into @objectTab (name, object_id)

       select obj.node.value('@name','sysname') name

            , obj.node.value('@object_id','int') object_id

         from @objects.nodes('/object') obj(node)


merge into NewObject

using ( select name, object_id from @objectTab


on = 0

when matched then


when not matched then

insert (name) values(

output, into @NewObject (old_id, new_id);


select * from @NewObject


declare @columnTab table (name sysname,id int,  object_id int)

insert into @columnTab (name, id, object_id)

       select col.node.value('@name','sysname') name

            , col.node.value('@id','int') id

            , obj.node.value('@object_id','int') object_id

         from @objects.nodes('object') obj(node)

         cross apply obj.node.nodes('column') col(node)


merge into NewColumn

using (select, col.object_id , , obj.new_id new_object_id

        from @columnTab col

        join @NewObject obj on col.object_id = obj.old_id) col(name,old_object_id,old_col_id, new_object_id)

on 1=2

when not matched then

insert (name, ObjectId) values(, col.new_object_id)

output col.old_object_id, col.old_col_id, col.new_object_id, into @NewColumn (old_id, old_col_id, new_id, new_col_id);


select * from @NewColumn



Posted by simonsabin | with no comments

Quite a lot it seems. The new SQL Server 2008 feature Declaratrive Management Framework (DMF) is no more. It has been renamed to Policy Based Management through what looks likj a fairly complex process.

I could understand this for a whole program but for a feature it seems quite excessive.

You can read about the process from Dan here.

Posted by simonsabin | with no comments
Filed under:

The spatial functionality is hugely standards based. There spatial standards are well established what this means is that there are many methods to support the standards. One of the areas that you will see this is the creation of spatial data.

To create an instance of a spatial type you have many many options, most use the methods on the base types, i.e. geometry::STGeomFromText.

This is the set of ways of creating a point,

declare @p geometry;


set @p = geometry::Point(10,10,0);

set @p = geometry::STGeomFromText('POINT(10 10)',0);

set @p = geometry::STPointFromText('POINT(10 10)',0);

set @p = geometry::STGeomFromWKB(0x010100000000000000000024400000000000002440,0);

set @p = geometry::STPointFromWKB(0x010100000000000000000024400000000000002440,0);

set @p = geometry::GeomFromGml('<Point xmlns=""><pos>10 10</pos></Point>',0);

set @p = geometry::Parse('POINT(10 10)');

set @p = 'POINT(10 10)';

select @p

Only the point sub type has its own specific method geometry::Point. This I guess is because the attributes are well defined. You have one X, one Y and a SRID (Spatial Reference ID), where as the other types you have many Xs and many Ys.

Let me explain a few of these.

The STGeom... and STPoint... are the same expect the later verifies that the geometry you pass in is a Point.

The ..Text methods use the Well-Known Text (WKT) format as input and the ...WKB use the Well-Known Binary (WKB) format. You can use the STAsText method to get the text (WKT) representation of a variable and STAsBinary to find the binary (WKB) represenation

The GeomFromGml is an extension to the methods specified in the OGC and uses the Geography Markup Language (A subset for SQL Server) as input. You can use the AsGml method to get the GML representation . I would personally avoid GML representations due to the bulkiness of XML and overhead or associated processing.

Note: In BOL the method is GeomFromGML. This is incorrect only GeomFromGml works (notice the case of GML) this is consistent with the AsGml method, but contrary to the WKB methods. Who knows what CTP6 or RTM will bring.

You will undoubtedly see most examples using the text based methods because they are more readable. However on on doing some testing it seems the binary methods are quicker (in some tests twice a quick). This makes some sense as each point coordinate can be read as a single entity (8 byte float) where as the text representation has to be read character by character and then creating the value once a boundary (space, comma or bracket) has been reached.

It is also easier to create binary values in TSQL

geometry::STPolyFromWKB(0x00+0x00000003+0x00000001 + 0x00000005

                      + cast(@minx as binary(8)) + cast(@miny as binary(8))

                      + cast(@maxx as binary(8)) + cast(@miny as binary(8))

                      + cast(@maxx as binary(8)) + cast(@maxy as binary(8))

                      + cast(@minx as binary(8)) + cast(@maxy as binary(8))

                      + cast(@minx as binary(8)) + cast(@miny as binary(8)) ,0)

This creates a simple 4 sided polygon (aka a square). Key thing to note is the first two bytes that specify the byte order which for TSQL binary functions you want 0 and not 1 (most examples use 1.


For a list of spatial references and their definitions including units of measure have a look at the system view

select * from sys.spatial_reference_systems

Posted by simonsabin | 2 comment(s)
Filed under: ,

A friend recently emailed me after trying out the new spatial stuff in Katmai asking why the distance from Beaverton to Boston was 51, and any way 51 what.

On looking at his code I realised he had used the Geometry type and not the Geography type. The former is used to work on a single planar surface measured in x and y. The latter is for doing spatial stuff on spheres i.e. the earth.

He had passed the longitude and latitude to the geometry type and done a STDistance call to find the distance. Even though he had speciifed a spatial reference for WGS84 the answer came back as 51. Which was simply the hypotenuse of the triangle using straight forward pythagorus.

The Spatial reference only applies when using the Geography data type which deals with the elipitcal word. So why do we need spatial references anyway. Well they define the world in which you are working, i.e. what is 0, how big is the world etc, and what unit of measure is to be used.

If you want a nice explanation of spatial references have a read of the Mortens post here

This might be confusing and you may ask why do we need it for the Geometry type. The key is the unit of measure, If I take some spatial data describing a house from the UK over to building site in America and combine the data if I don't have the data stamped with a spatial reference I will end up in all sorts of mess. Why?

Units of measure.

If I measure something in feet and inches and then cobine with measurements in metres. Without knowing they are using different measurements I will end up with the wrong result. Thats why in SQL if you compare or perform an operation on two geometry (or geography) values with different spatial references (SRID) you will end up with a NULL answer.

It would have been nice for the types have been able to convert between the Spatial  References for at least the geometry types.


Posted by simonsabin | with no comments
Filed under: , ,

If you got a question about the spatial stuff in SQL Server 2008 make sure you visit the msd Spatial forum

The guys from the team that developed it hang out there so you should get an answer pretty quick.


The ifilters needed to index office 2007 documents are now available for download from here

At the time of writing some of the related KB articles aren't available.

If you use full text and need to index office 2007 documents then you need these iFilters installed and registered


One gripe about SQL 2005 was that read80trace wasn't available for it.

That has now been rectified with the release of RML Utilities for SQL Server.

This is an application that has a set of utilities for reading, analysing and replaying trace files. This is essential when doing any sort of analysis of a SQL Server system.

I've had a quick look and it looks very sound, I look forward to spending some more time investingating it in depth and see how it compares to some of the other tools out in the market place.


We really want more speakers from the community for SQLBits 2008 (

If you want to present a session in March then let us know, you have until the end of this week. We will be opening the voting in the New Year. 

To submit a session send an email to with the following details,

  • Name
  • Company Name
  • Bio ~ 200 words
  • Contact Email
  • Contact Mobile
  • Where you live if not in the UK
  • Track for your session (BI, Dev, DBA)
  • Session Title
  • Session Abstract
  • Level of session (100 introduction - 400 in depth)

If you want to see examples of what was submitted last time have a look at

Posted by simonsabin | with no comments
Filed under: ,
More Posts Next page »