Monday, May 5, 2008 9:52 AM tonyrogerson

Using the OUTPUT clause on INSERT when loading data using OPENROWSET and BULK operator

Say you have an external file you want to load into your SQL Server database; you need to load – verify – cleanse and insert and at the same time keep a cross reference between the original source file records and the inserted database rows.

The OUTPUT clause is an ideal candidate to help you with this cause; most people use the IDENTITY property for surrogate keys; the OUTPUT clause allows you to get at the value returned by the IDENTITY for each row you’ve just inserted; unlike SCOPE_IDENTITY() which only returns the last inserted IDENTITY value the OUTPUT clause gives you all the rows inserted!

Example

Take an input text file (testinp.txt) that simply contains a number of records separated by carriage return line feed eg...

My
Test
File
Is
Here

We can use the BULK operator of the OPENROWSET structure to load this file using a format file defined below:

9.0
1
1 SQLCHAR 0 50 "\r\n" 1 somedata Latin1_General_CI_AS

Note, it is important to name the destination column and give it a collation otherwise you get this really (that’s sarcasm) descriptive error:

Msg 4862, Level 16, State 1, Line 1

Cannot bulk load because the file "c:\temp\testinp.fmt" could not be read. Operating system error code (null).


We need two tables, one is the base table we are inserting into and the second is the talbe that will capture what the IDENTITY value was and which row from the raw file it was set against.

 

create table stage_table_base (

    id  int not null identity,

    myinputdata varchar(100) not null

)

create table stage_table_captured (

    id  int not null,

    myinputdata varchar(100) not null

)

 

We now insert into the base table combining the OPENROWSET and the OUTPUT clause to capture the IDENTITY value set...

 

insert stage_table_base( myinputdata )

    output inserted.id, inserted.myinputdata into stage_table_captured( id, myinputdata )

    select somedata

    from OPENROWSET(BULK N'C:\temp\testinp.txt', FORMATFILE = 'c:\temp\testinp.fmt' ) AS a


The above statement is literally doing two INSERT’s for us, the first INSERT is into the stage_table_base and the second is into the stage_table_captured table.

 

The OUTPUT clause gives us access to only the columns on either the INSERTED or DELETED internally generated system tables (think of triggers).

 

One important thing to note is that BULK INSERT and the BULK operator are not guarenteed to read or return the file in order, until now (SQL 2005 SP2 hotifx 3159) it has and frankly probably will into SQL 2008 but it’s a behaviour not documented and when pressed Microsoft have told me not to rely on the behaviour going forward.

 

If you want an order then you have to use ORDER BY on the select which then returns the data as a row by row cursor rather than a set so the insert will indeed be in the order you specify.

 

Realistically in a migration type scenario you’d have a third table that holds the raw loaded data and a number of columns specifying if the row had been rejected because of data validation problems, the file name source for the data etc... but that is a whole new topic and I need to research some other stuff for my presentation next weekend in Glasgow at the Scottish Developers . May be I’ll take a look on the 4 hour train journey!

 

Filed under: ,

Comments

# Interesting Finds: May 5, 2008

Monday, May 5, 2008 2:41 PM by Jason Haley

# Wot no SCOPE_GUID() function??

Monday, June 8, 2009 4:05 PM by Martin Bell UK SQL Server MVP

I’ve talked about the problems of using GUIDs as key values, and some of the issues that you may come

# Wot no SCOPE_GUID() function??

Tuesday, June 9, 2009 12:06 AM by Martin Bell UK SQL Server MVP

When you have a database generated GUID value how do you find out what it is?

# Returning GUID values to ADO.NET clients

Friday, November 4, 2011 8:51 PM by Martin Bell UK SQL Server MVP

My previous post showed how to return a database generated GUID as a stored procedure parameter, but if more than one GUID is generated what do you do?