How To: BULK INSERT your User Defined Types (UDT)?

The task is to bulk load data from flat files into tables with UDT column. This blog post will show you the problem and the solution.

We will use the Point type as an example, which you can find the source code from the Books On Line (BOL), or from MSDN web, or from my blog file folder.

Suppose you've either compiled your source code into a DLL by using Visual Studio, or you downloaded it, the next step is to open SSMS, open a Query window, use TempDB or create a TEST database, then run these SQL statements:

CREATE ASSEMBLY Point

FROM 'D:\DongDoc\SQL Server Management Studio\Projects\UDT_Point\UDT_Point.dll'

GO;

 

CREATE TYPE Point

EXTERNAL NAME Point.[Point];

GO;

 

CREATE TABLE T1

(

Col1 VARCHAR(10) NULL,

Col2 Point NULL

)

GO;

 

Now we stuff our Table T1 with some value:

 

INSERT INTO dbo.T1 (Col1, Col2) VALUES ('TEST', CONVERT(Point, '3,4'));

INSERT INTO dbo.T1 (Col1, Col2) VALUES ('TEST', CONVERT(Point, '1,5'));

INSERT INTO dbo.T1 (Col1, Col2) VALUES ('TEST', CAST ('1,99' AS Point));

By doing CAST/CONVERT from string to UDT type, CLR Engine will call Point class' Parse(SqlString s) method. If now we do a SELECT *, the result is like:

Col1 Col2
---------- --------------------
TEST 0x008000000380000004
TEST 0x008000000180000005
TEST 0x008000000180000063

At this point, if you are a programmer, you should realized that UDT in the database is to persist object instances into Binary/Byte format. So these insert will success as well:

--INSERT Binary Representation of the UDT with type cast

INSERT INTO dbo.T1 (Col1, Col2) VALUES('TESTBS',

CAST(0x008000000380000004 AS Point));

 

--INSERT Binary Representation of the UDT without cast

INSERT INTO dbo.T1 (Col1, Col2) VALUES('TESTBS',

0x008000000380000004);

 Now we prepare a text file to try BULK INSERT:

Data1.txt (save as Tab Delimited)

------------

 

TEST 0x008000000380000004
TEST 0x008000000180000005
TEST 0x008000000180000063

The BULK INSERT statement:

--BULK INSERT

BULK INSERT dbo.T1 FROM

'D:\DongDoc\SQL Server Management Studio\Projects\UDT_Point\Data1.txt'

WITH(

DATAFILETYPE = 'char'

)

 This is the error msg you got:

Msg 4863, Level 16, State 1, Line 3
Bulk load data conversion error (truncation) for row 1, column 2 (Col2).

Believe me, this is not a format file issue, you'll get similar errors if you provide a format file. But this will work:

Data2.txt (save as Tab Delimited)

-------------

TEST 008000000380000004
TEST 008000000180000005
TEST 008000000180000063

This exactly proves how BULK INSERT works inside SQL server, i.e. BULK INSERT sends the parsed Rowset to Query Engine directly, won't be bothered to do anything else. (For a good explanation on BULK INSERT/BCP, please go to www.sqlmag.com, look for April 2004 issue, topic "Bulk Loading Data into SQL Server 2000".)

But our task, if you still remember, is to load a file like this:

Data3.txt (save as Tab Delimited, can't be a csv, right?)

--------------

TEST 3,4
TEST 1,5
TEST 1,99

 Please spend your time to play with this, see if you can get it BULK INSERTed, with or without format file. (In case you really got through, please let me know immediately, even it's 4am in the morning.)

 IT WON'T WORK!

The reason according to me: all bulk operations are designed for a single purpose, i.e. -- BULK operation. So BULK INSERT won't invoke Point.Parse(SqlString s) at all, it just waiting there for you to give it some binary to pass to the Engine to write to the data file without any further change. This is actually provable, by using Debug technique, e.g. set break point inside Parse(...) method, then fire up BULK statement inside SSMS.

 SO WHAT'S THE SOLUTION?

--solution:

INSERT INTO dbo.T1

SELECT TEM.Col1, CAST(TEM.Col2 AS Point)

FROM

OPENROWSET(

BULK 'D:\DongDoc\SQL Server Management Studio\Projects\UDT_Point\Data3.txt'

,FORMATFILE = 'D:\DongDoc\SQL Server Management Studio\Projects\UDT_Point\format.fmt')

AS TEM

 format.fmt (save as Tab Delimited)

----------------

 

9.0
2
1 SQLCHAR 0 10 "\t" 1 Col1 SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 20 "\r\n" 2 Col2 SQL_Latin1_General_CP1_CI_AS

 (note: the length of 20 is somewhat arbitrary).

 According to Sunil Agarwal of SQL Server Storage Engine Group at Microsoft, who is responsible for concurrency, indexes, tempdb, LOBS, supportability, and bulk import/export, this could be the only solution for our task. The original discussion with him can be found here. Also according to Sunil, this solution is still a streamed process, not a staged one. Yet beware the performance impact, for a million Point you have in the flat file, that's a million call to Point.Parse(), try write your code well.

 So our hero who comes to the rescue is: OPENROWSET(BULK) statement. Enjoy!

Published 27 November 2006 16:42 by dong
Filed under:

Comments

No Comments