SQL Server 2008 - Batch inserts and identity columns using MERGE - SimonS Blog on SQL Server Stuff

SQL Server 2008 - Batch inserts and identity columns using MERGE

Performing bulk inserts into a table when the table has an identity column has long bee a difficult challenge for client programming.

The difficulty has been getting the identity values that have been generated for the set of data you are saving. It becomes even more difficult when you want to save parent child relationships.

In SQL 2005 the OUTPUT clause was introduced which allowed you to return the identity values generated by a an insert of multiple rows (you can also generate any other data generated by the server i.e. defaults contrary to what BOL says). However there was one draw back. You couldn't relate the identity values to the original data set unless you had a natural key in your table or you stored a second surrogate key. This means you had to fully reload your client application data set. For large data sets thats not ideal.

This is because the OUTPUT clause for an INSERT statement can only reference the inserted pseudo table. This only contains the columns of the underlying table so you don't have access to any other data.

With SQL Server 2008 you get MERGE. MERGE is a combined INSERT, UPDATE and DELETE statement that supports the OUTPUT clause. I thought I would have a look and see if the same limitations apply to the OUTPUT claue as they do for the INSERT statement. Well they don't.

This means that you can return any value from your source data even if its not being inserted into the table. For example

CREATE TABLE Person (id int identity(1,1), name varchar(200))

 

MERGE INTO Person

USING (Values ('Simon',1),('Mark',2),('Stuart',3)) People(name,clientSideId)

ON 1=2

WHEN NOT MATCHED THEN

INSERT (name) values (People.Name)

OUTPUT People.clientSideId, Inserted.Id;

This allows you to update your client code with the idenity values generated by the server. Whilst this logically shold work, in CTP5 there is a bug that causes this to fail. It appears that the issue is to do with the ON clause. When the expression is always false something is causing a severe error. There are a number of workarounds, sometimes using a WHEN MATCHED clause will help and in others changing he USING to be a table reference and not a derived table helps.

For an example of saving parent child relationships have a look at the code below. This copies the objects and columns system views into new tables, whilst maintaining correct referential integrity.

The use of the @objectTab and @columnTab is to avoid the error mentioned above, once fixed this code can be greatly simplfied.

You might be thinking that this is goig to such from a performance stance. It doesn't. Whilst the execution plan costs is much higher than the straight insert the actual cost of cpu and IO can be less than the direct INSERT approach. I beleieve this is due to some sorting and assertions that are imposed in the MERGE statement which simplify the storage and upating of the underlying indexes.

Here is the code mentioned above. It returns two data sets that can be used to update your client data sets. You could combine into 1 data set if required.

drop table NewObject

go

drop table NewColumn

go

create table NewObject (id int identity(1,1) check (id > 0 ), name sysname)

go

create table NewColumn (id int identity(1,1), ObjectId int, name sysname)

go

set nocount on

 

declare @objects xml = (

select top 1000 object_id [@object_id]

      ,name [@name]

      ,(select column_id [@id], name [@name] from sys.columns where sys.columns.object_id = sys.objects.object_id for xml path('column'), Type )

from sys.objects

for xml path ('object'))

 

declare @NewObject table (new_id int, old_id int, unique ([old_id]))

declare @id int

declare @NewColumn table (new_id int, old_id int, new_col_id int, old_col_id int)

 

declare @objectTab table (name sysname, object_id int)

insert into @objectTab (name, object_id)

       select obj.node.value('@name','sysname') name

            , obj.node.value('@object_id','int') object_id

         from @objects.nodes('/object') obj(node)

 

merge into NewObject

using ( select name, object_id from @objectTab

       )src(name,id)

on NewObject.id = 0

when matched then

delete

when not matched then

insert (name) values(src.name)

output src.id, inserted.id into @NewObject (old_id, new_id);

 

select * from @NewObject

 

declare @columnTab table (name sysname,id int,  object_id int)

insert into @columnTab (name, id, object_id)

       select col.node.value('@name','sysname') name

            , col.node.value('@id','int') id

            , obj.node.value('@object_id','int') object_id

         from @objects.nodes('object') obj(node)

         cross apply obj.node.nodes('column') col(node)

 

merge into NewColumn

using (select col.name, col.object_id , col.id , obj.new_id new_object_id

        from @columnTab col

        join @NewObject obj on col.object_id = obj.old_id) col(name,old_object_id,old_col_id, new_object_id)

on 1=2

when not matched then

insert (name, ObjectId) values(col.name, col.new_object_id)

output col.old_object_id, col.old_col_id, col.new_object_id, inserted.id into @NewColumn (old_id, old_col_id, new_id, new_col_id);

 

select * from @NewColumn

 

 



-
Published 31 December 2007 00:28 by simonsabin

Comments

No Comments