This blog will no longer be updated.

New content is available on my new blog

BCP and numeric data field with scientific notation - Piotr Rodak

BCP and numeric data field with scientific notation

There is a known issue in SQL Server 2005 with importing data using bcp.exe or BULK INSERT methods from character files that contain numeric values written using scientific notation, like 2.044E10. It was not a problem in versions prior to 2005 because bcp for SQL Server 200 and 7.0 converted such values implicitly. Beginning with SQL Server 2005, BCP follows the same rules when converting data from input files as CONVERT does. Unfortunately, CONVERT doesn't understand scientific notation if it is passed as a string. Check this out:

select convert ( decimal ( 18 , 6 ), 2.6944 E- 01 ) -- this works

select convert ( decimal ( 18 , 6 ), '2.6944E-01' ) --this doesn't

Msg 8114, Level 16, State 5, Line 1

Error converting data type varchar to numeric .

 

This error becomes especially bothersome when you have data feeds in character format, that contain fields written in scientific notation.  Let's create a table that we will populate with data from a file:

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[IndexComponentsTypeChecked](

[RowID] [int] NOT NULL ,

[IndexID] [varchar]( 11 ) NULL ,

[ComponentId] [varchar]( 11 ) NULL ,

[ComponentCurrency] [char]( 3 ) NULL ,

[NumberOfItems] [decimal]( 18 , 6 ) NULL ,

[ComponentOrigprice] [decimal]( 18 , 6 ) NULL ,

[ComponentWeight] [decimal]( 18 , 6 ) NULL ,

[IndexType] [varchar]( 12 ) NULL ,

[IndexDivisor] [decimal]( 18 , 6 ) NULL

) ON [PRIMARY]

 

 Now Books On Line says that if you have columns with scientific notation in your file, you should specify float data type in your format file. I prepared test file along with several format files that define data format for above table. I attached all files used in this post in bcptests.zip file for your convenience.

The file testfeed.txt contains single line of data, containing values for all columns of the table IndexComponentsTypeChecked .

15564 TEST.INDX TSTSTR. 0001 EUR 0.9 0.0 2.6944 E- 01 INDEX 1.0

As you see, column ComponentWeight , defined as decimal(18, 6) is populated with value written using scientific notation.

First, let's try what happens when we don't use any format file.

bulk insert dbo.IndexComponentsTypeChecked from 'c:\Projects\BulkTest\testfeed.txt'

The output is as follows:

Msg 4864, Level 16, State 1, Line 7

Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 7 (ComponentWeight).

  

Following the documentation, you can prepare a format file thatyou will pass to the BULK INSERT statement to overcome the conversion error. Here is line from the format file bcpcfloat.fmt :

7 SQLFLT8 0 41 "\t" 7 ComponentWeight ""

As you can see, the data type for the column has been defined as SQLFLT8, which is mapped to float t-sql data type.

The query loading data from the data file is looking now like this:

bulk insert dbo.IndexComponentsTypeChecked from 'c:\Projects\BulkTest\testfeed.txt' with (formatfile = 'c:\Projects\BulkTest\bcpcfloat.fmt' )

If you run above query, it will work, because 2.6944E-01 is converted to float and then to decimal(18,6). Seems that the problem is solved, isn't it? Unfortunately, I discovered that this doesn't work if value is negative. The file testfeednegative.txt contains the same data, but ComponentWeight column is populated with negative value this time:

15564 TEST.INDX TSTSTR. 0001 EUR 0.9 0.0 -2.6944E-01 INDEX 1.0

The following query now returns error:

bulk insert dbo.IndexComponentsTypeChecked from 'c:\Projects\BulkTest\testfeednegative.txt' with (formatfile = 'c:\Projects\BulkTest\bcpcfloat.fmt' )

Msg 8115, Level 16, State 6, Line 5

Arithmetic overflow error converting float to data type numeric .

Interestingly enough, CONVERT will work in this case:

select convert ( float , -2.6944E-01 )

There is another workaround (for the scientific notation workaround).You can use XML format files. While they basically contain the same formatting information, they cause BCP to behave in a different way. XML format files are more verbose than legacy format files, but also easier to understand the processing of data by bulk operations using them. Note that you can generate xml (and non-xml) format files using bcp utility.

This is the xml format file bcp.xml:

<?xml version="1.0"?>

<BCPFORMAT xmlns =" http://schemas.microsoft.com/sqlserver/2004/bulkload/format " xmlns:xsi =" http://www.w3.org/2001/XMLSchema-instance ">

<RECORD>

<FIELD ID =" 1 " xsi:type =" CharTerm " TERMINATOR =" \t " MAX_LENGTH =" 12 "/>

<FIELD ID =" 2 " xsi:type =" CharTerm " TERMINATOR =" \t " MAX_LENGTH =" 11 " COLLATION =" SQL_Latin1_General_CP1_CI_AS "/>

<FIELD ID =" 3 " xsi:type =" CharTerm " TERMINATOR =" \t " MAX_LENGTH =" 11 " COLLATION =" SQL_Latin1_General_CP1_CI_AS "/>

<FIELD ID =" 4 " xsi:type =" CharTerm " TERMINATOR =" \t " MAX_LENGTH =" 3 " COLLATION =" SQL_Latin1_General_CP1_CI_AS "/>

<FIELD ID =" 5 " xsi:type =" CharTerm " TERMINATOR =" \t " MAX_LENGTH =" 41 "/>

<FIELD ID =" 6 " xsi:type =" CharTerm " TERMINATOR =" \t " MAX_LENGTH =" 41 "/>

<FIELD ID =" 7 " xsi:type =" CharTerm " TERMINATOR =" \t " MAX_LENGTH =" 41 "/>

<FIELD ID =" 8 " xsi:type =" CharTerm " TERMINATOR =" \t " MAX_LENGTH =" 12 " COLLATION =" SQL_Latin1_General_CP1_CI_AS "/>

<FIELD ID =" 9 " xsi:type =" CharTerm " TERMINATOR =" \r\n " MAX_LENGTH =" 41 "/>

</RECORD>

<ROW>

<COLUMN SOURCE =" 1 " NAME =" RowID " xsi:type =" SQLINT "/>

<COLUMN SOURCE =" 2 " NAME =" IndexID " xsi:type =" SQLVARYCHAR "/>

<COLUMN SOURCE =" 3 " NAME =" ComponentId " xsi:type =" SQLVARYCHAR "/>

<COLUMN SOURCE =" 4 " NAME =" ComponentCurrency " xsi:type =" SQLCHAR "/>

<COLUMN SOURCE =" 5 " NAME =" NumberOfItems " xsi:type =" SQLDECIMAL " PRECISION =" 18 " SCALE =" 6 "/>

<COLUMN SOURCE =" 6 " NAME =" ComponentOrigprice " xsi:type =" SQLDECIMAL " PRECISION =" 18 " SCALE =" 6 "/>

<COLUMN SOURCE =" 7 " NAME =" ComponentWeight " xsi:type =" SQLFLT8 "/>

<COLUMN SOURCE =" 8 " NAME =" IndexType " xsi:type =" SQLVARYCHAR "/>

<COLUMN SOURCE =" 9 " NAME =" IndexDivisor " xsi:type =" SQLDECIMAL " PRECISION =" 18 " SCALE =" 6 "/>

</ROW>

</BCPFORMAT>


As you see, other decimal(18, 6) columns are processed correctly unless they contain scientific notation data.

The following query succeeds:

bulk insert dbo.IndexComponentsTypeChecked from 'c:\Projects\BulkTest\testfeednegative.txt'

with (formatfile = 'c:\Projects\BulkTest\bcp.xml' )

 

I wondered, what happens when I change the destination type to float? I altered table IndexComponentsTypeChecked using the following query:

alter table [IndexComponentsTypeChecked] alter column ComponentWeight float null

GO

bulk insert dbo.IndexComponentsTypeChecked from 'c:\Projects\BulkTest\testfeednegative.txt' with (formatfile = 'c:\Projects\BulkTest\bcpc1.fmt' )

This time, even negative value was processed correctly!

So, it looks like to properly process numeric columns with potentially negative scientific notation values you have to use XML format files if you have to use character data files. If you can use native data files the problem doesn't exists, as all numbers are stored as binary values in the data file. I created data file datanative.txt and format file bcpnative.fmt so you can investigate their structure.

There are still organizations that have large systems based on SQL Server 2000. Usually such systems process large number of feeds, many of them in character format. The change of behavior of bcp and especially bug with negative values can add a lot of work during migration of the systems. Perhaps Microsoft should think about making CONVERT understand string scientific notation values, this might save major headaches for many developers and project managers.

 

Attachment: bcptests.zip
Published 17 May 2009 00:49 by Piotr Rodak

Comments

No Comments