December 2007 - Posts
So you want to use the SqlGeometry and SqlGeogrpahy collections in
.Net. I found the lack of friendly collection based properties and methods
suprising, i.e. no iterators, no collections which you can add to.(http://sqlblogcasts.com/blogs/simons/archive/2007/12/09/Katmai-Spatial---First-thoughts.aspx)
What this means is you can't create a line and then simply extend
it. Or even extend a polygon to cover another point (easily) i.e.
myPolygon.AddPoint(new Point(x,y)). This is sort of understadable as there are
many ways the point could be added, to the exterior ring, or extend the polygon
to cover the point (a pit like pulling an elastic band)
I've found that you either have to create your string
representations of Geometry or use the WKB format. As these are standards based
they shoud be fine. I think the safest is to use WKB but you need to know the
binary structure
If you want to understand the format required for WKB the easiest
to do use reflector against the Microsoft.SqlServer.Types dll.
What I've learnt so far is that the format is,
Point = [Byte order][Type][X value][Y
value]
MultiPoint = [Byte
order][Type][PointCount]<Point1><Point2>
Line = [Byte order][Type][PointCount][X Value][YValue][X value][Y
value].....
MultiLine = [Byte
order][Type][LineCount]<Line1><Line2> .....
Polygon = [Byte order][Type][Polygon
Count][PointCount][X Value][YValue][X value][Y value].....
MultiPolygon = [Byte order][Type][Polygon
Count]<Polygon1><Polygon2>....
Geometry Collection = [Byte order][Type][Shape
Count]<Shape1><Shape2>.....
where
[Byte Order] is a singe byte indicating
the endian or the rest of the bytes i.e. is 1 0x01000000 or
0x00000001
Type is 4 byte unsigned integer indicating the type of shape
GeometryCollection = 7,
LineString = 2,
MultiLineString = 5,
MultiPoint = 4,
MultiPolygon = 6,
Point = 1,
Polygon = 3,
Unknown = 0
PointCount, LineCount and polygon Count are unsigned
integers and are used to iterate through the remainder of the bytes.
X Value and Y Value are 8 byte floating points.
<Shape> are repetitions of the relevant shape
specification.
Creating a WKB in TSQL is quite neat see http://sqlblogcasts.com/blogs/simons/archive/2007/12/29/SQL-Server-2008-Spatial---How-many-ways-to-make-your-point-.aspx
Unfortunately the SqlGeometry and SqlGeograohy types are sealed so
you can't inherit them. You would have to have a local instance and repeat all
the method calls. You would have to implement your own parser to be able to add
points and this would suck from a perf perspective, although very little
validation is done when you create a geometry.
-
Performing bulk inserts into a table when the table has
an identity column has long bee a difficult challenge for client
programming.
The difficulty has been getting the identity values that have been generated
for the set of data you are saving. It becomes even more difficult when you want
to save parent child relationships.
In SQL 2005 the OUTPUT clause was introduced which allowed you to return the
identity values generated by a an insert of multiple rows (you can also generate
any other data generated by the server i.e. defaults contrary to what BOL says).
However there was one draw back. You couldn't relate the identity values to the
original data set unless you had a natural key in your table or you stored a
second surrogate key. This means you had to fully reload your client application
data set. For large data sets thats not ideal.
This is because the OUTPUT clause for an INSERT statement can only reference
the inserted pseudo table. This only contains the columns of the underlying
table so you don't have access to any other data.
With SQL Server 2008 you get MERGE. MERGE is a combined INSERT, UPDATE and
DELETE statement that supports the OUTPUT clause. I thought I would have a look
and see if the same limitations apply to the OUTPUT claue as they do for the
INSERT statement. Well they don't.
This means that you can return any value from your source data even if its
not being inserted into the table. For example
CREATE
TABLE Person (id int identity(1,1), name varchar(200))
MERGE
INTO Person
USING
(Values ('Simon',1),('Mark',2),('Stuart',3))
People(name,clientSideId)
ON
1=2
WHEN
NOT MATCHED THEN
INSERT
(name) values
(People.Name)
OUTPUT
People.clientSideId, Inserted.Id;
This allows you to update your client code with the idenity values generated
by the server. Whilst this logically shold work, in CTP5 there is a bug that
causes this to fail. It appears that the issue is to do with the ON clause. When
the expression is always false something is causing a severe error. There are a
number of workarounds, sometimes using a WHEN MATCHED clause will help and in
others changing he USING to be a table reference and not a derived table
helps.
For an example of saving parent child relationships have a look at the code
below. This copies the objects and columns system views into new tables, whilst
maintaining correct referential integrity.
The use of the @objectTab and @columnTab is to avoid the error mentioned
above, once fixed this code can be greatly simplfied.
You might be thinking that this is goig to such from a performance stance. It
doesn't. Whilst the execution plan costs is much higher than the straight insert
the actual cost of cpu and IO can be less than the direct INSERT approach. I
beleieve this is due to some sorting and assertions that are imposed in the
MERGE statement which simplify the storage and upating of the underlying
indexes.
Here is the code mentioned above. It returns two data sets that can be used
to update your client data sets. You could combine into 1 data set if
required.
drop
table NewObject
go
drop
table NewColumn
go
create
table NewObject (id int identity(1,1) check
(id > 0 ), name sysname)
go
create
table NewColumn (id int identity(1,1), ObjectId int, name
sysname)
go
set
nocount on
declare
@objects xml = (
select
top 1000 object_id
[@object_id]
,name [@name]
,(select column_id [@id], name [@name]
from sys.columns where sys.columns.object_id = sys.objects.object_id
for xml
path('column'), Type )
from
sys.objects
for
xml path ('object'))
declare
@NewObject table (new_id int, old_id int, unique ([old_id]))
declare
@id int
declare
@NewColumn table (new_id int, old_id int,
new_col_id int, old_col_id int)
declare
@objectTab table (name sysname, object_id
int)
insert
into @objectTab (name,
object_id)
select obj.node.value('@name','sysname')
name
, obj.node.value('@object_id','int') object_id
from @objects.nodes('/object') obj(node)
merge
into NewObject
using
( select name, object_id from @objectTab
)src(name,id)
on
NewObject.id = 0
when
matched then
delete
when
not matched then
insert
(name) values(src.name)
output src.id, inserted.id into
@NewObject (old_id, new_id);
select
* from @NewObject
declare
@columnTab table (name sysname,id int, object_id
int)
insert
into @columnTab (name, id,
object_id)
select col.node.value('@name','sysname')
name
, col.node.value('@id','int') id
, obj.node.value('@object_id','int') object_id
from @objects.nodes('object') obj(node)
cross apply obj.node.nodes('column') col(node)
merge
into NewColumn
using
(select col.name, col.object_id , col.id ,
obj.new_id new_object_id
from @columnTab
col
join @NewObject obj on col.object_id = obj.old_id)
col(name,old_object_id,old_col_id, new_object_id)
on
1=2
when
not matched then
insert
(name, ObjectId) values(col.name,
col.new_object_id)
output col.old_object_id, col.old_col_id, col.new_object_id, inserted.id
into @NewColumn (old_id, old_col_id, new_id,
new_col_id);
select
* from @NewColumn
-
Quite a lot it seems. The new SQL Server 2008 feature
Declaratrive Management Framework (DMF) is no more. It has been renamed to
Policy Based Management through what looks likj a fairly complex process.
I could understand this for a whole program but for a feature it seems quite
excessive.
You can read about the process from Dan here. http://blogs.msdn.com/dtjones/archive/2007/12/22/declarative-management-is-out.aspx
-
The spatial functionality is hugely standards based.
There spatial standards are well established what this means is that there are
many methods to support the standards. One of the areas that you will see this
is the creation of spatial data.
To create an instance of a spatial type you have many many options, most use
the methods on the base types, i.e. geometry::STGeomFromText.
This is the set of ways of creating a point,
declare @p
geometry;
set @p = geometry::Point(10,10,0);
set @p = geometry::STGeomFromText('POINT(10 10)',0);
set @p = geometry::STPointFromText('POINT(10 10)',0);
set @p =
geometry::STGeomFromWKB(0x010100000000000000000024400000000000002440,0);
set @p = geometry::STPointFromWKB(0x010100000000000000000024400000000000002440,0);
set @p = geometry::GeomFromGml('<Point
xmlns="http://www.opengis.net/gml"><pos>10
10</pos></Point>',0);
set @p = geometry::Parse('POINT(10 10)');
set @p = 'POINT(10 10)';
select
@p
Only the point sub type has its own specific method geometry::Point. This I
guess is because the attributes are well defined. You have one X, one Y and a
SRID (Spatial Reference ID), where as the other types you have many Xs and many
Ys.
Let me explain a few of these.
The STGeom... and STPoint... are the same expect the later verifies that the
geometry you pass in is a Point.
The ..Text methods use the Well-Known Text (WKT) format as input and the
...WKB use the Well-Known Binary (WKB) format. You can use the STAsText method
to get the text (WKT) representation of a variable and STAsBinary to find the
binary (WKB) represenation
The GeomFromGml is an extension to the methods specified in the OGC and uses
the Geography Markup Language (A subset for SQL Server) as input. You can use
the AsGml method to get the GML representation . I would personally avoid GML
representations due to the bulkiness of XML and overhead or associated
processing.
Note: In BOL the method is GeomFromGML. This is incorrect only
GeomFromGml works (notice the case of GML) this is consistent with the AsGml
method, but contrary to the WKB methods. Who knows what CTP6 or RTM will
bring.
You will undoubtedly see most examples using the text based methods because
they are more readable. However on on doing some testing it seems the binary
methods are quicker (in some tests twice a quick). This makes some sense as each
point coordinate can be read as a single entity (8 byte float) where as the text
representation has to be read character by character and then creating the value
once a boundary (space, comma or bracket) has been reached.
It is also easier to create binary values in TSQL
geometry::STPolyFromWKB(0x00+0x00000003+0x00000001 +
0x00000005
+ cast(@minx as binary(8)) + cast(@miny as binary(8))
+ cast(@maxx as binary(8)) + cast(@miny as binary(8))
+ cast(@maxx as binary(8)) + cast(@maxy as binary(8))
+ cast(@minx as binary(8)) + cast(@maxy as binary(8))
+ cast(@minx as binary(8)) + cast(@miny as binary(8)) ,0)
This creates a simple 4 sided polygon (aka a square). Key thing to note is
the first two bytes that specify the byte order which for TSQL binary functions
you want 0 and not 1 (most examples use 1.
-
For a list of spatial references and their definitions
including units of measure have a look at the system view
select * from sys.spatial_reference_systems
-
A friend recently emailed me after trying out the new
spatial stuff in Katmai asking why the distance from Beaverton to Boston was 51,
and any way 51 what.
On looking at his code I realised he had used the Geometry type and not the
Geography type. The former is used to work on a single planar surface measured
in x and y. The latter is for doing spatial stuff on spheres i.e. the earth.
He had passed the longitude and latitude to the geometry type and done a
STDistance call to find the distance. Even though he had speciifed a spatial
reference for WGS84 the answer came back as 51. Which was simply the hypotenuse
of the triangle using straight forward pythagorus.
The Spatial reference only applies when using the Geography data type which
deals with the elipitcal word. So why do we need spatial references anyway. Well
they define the world in which you are working, i.e. what is 0, how big is the
world etc, and what unit of measure is to be used.
If you want a nice explanation of spatial references have a read of the
Mortens post here http://www.sharpgis.net/2007/05/05/SpatialReferencesCoordinateSystemsProjectionsDatumsEllipsoidsConfusing.aspx
This might be confusing and you may ask why do we need it for the Geometry
type. The key is the unit of measure, If I take some spatial data describing a
house from the UK over to building site in America and combine the data if I
don't have the data stamped with a spatial reference I will end up in all sorts
of mess. Why?
Units of measure.
If I measure something in feet and inches and then cobine with measurements
in metres. Without knowing they are using different measurements I will end up
with the wrong result. Thats why in SQL if you compare or perform an
operation on two geometry (or geography) values with different spatial
references (SRID) you will end up with a NULL answer.
It would have been nice for the types have been able to convert between
the Spatial References for at least the geometry types.
-
If you got a question about the spatial stuff in SQL
Server 2008 make sure you visit the msd Spatial forum http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1
The guys from the team that developed it hang out there so you should get an
answer pretty quick.
-
The ifilters needed to index office 2007 documents are
now available for download from here
http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en
At the time of writing some of the related KB articles aren't available.
If you use full text and need to index office 2007 documents then you need
these iFilters installed and registered
-
One gripe about SQL 2005 was that read80trace wasn't
available for it.
That has now been rectified with the release of RML Utilities for SQL
Server.
This is an application that has a set of utilities for reading, analysing and
replaying trace files. This is essential when doing any sort of analysis of a
SQL Server system.
I've had a quick look and it looks very sound, I look forward to spending
some more time investingating it in depth and see how it compares to some of the
other tools out in the market place.
http://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx
-
We really want more speakers from the community for
SQLBits 2008 (www.sqlbits.com).
If you want to present a session in March then let us know, you have
until the end of this week. We will be opening the voting in the New
Year.
To submit a session send an email to sessions@sqlbits.com with the following
details,
- Name
- Company Name
- Bio ~ 200 words
- Contact Email
- Contact Mobile
- Where you live if not in the UK
- Track for your session (BI, Dev, DBA)
- Session Title
- Session Abstract
- Level of session (100 introduction - 400 in depth)
If you want to see examples of what was submitted last time have a look at http://www.sqlbits.com/Events/SQLBits_October_2007/PublicSessions.aspx
-
More Posts
Next page »