There is a known issue in SQL Server 2005 with importing data using bcp.exe or BULK INSERT methods from character files that contain numeric values written using scientific notation, like 2.044E10. It was not a problem in versions prior to 2005 because bcp for SQL Server 200 and 7.0 converted such values implicitly. Beginning with SQL Server 2005, BCP follows the same rules when converting data from input files as CONVERT does. Unfortunately, CONVERT doesn't understand scientific notation if it is passed as a string. Check this out:
select
convert
(
decimal
(
18
,
6
),
2.6944
E-
01
)
-- this works
select
convert
(
decimal
(
18
,
6
),
'2.6944E-01'
)
--this doesn't
Msg 8114,
Level
16, State 5, Line 1
Error converting data type
varchar
to
numeric
.
This error becomes especially bothersome when you have data feeds in character format, that contain fields written in scientific notation. Let's create a table that we will populate with data from a file:
SET
ANSI_NULLS
ON
GO
SET
QUOTED_IDENTIFIER
ON
GO
CREATE
TABLE
[dbo].[IndexComponentsTypeChecked](
[RowID]
[int]
NOT
NULL
,
[IndexID]
[varchar](
11
)
NULL
,
[ComponentId]
[varchar](
11
)
NULL
,
[ComponentCurrency]
[char](
3
)
NULL
,
[NumberOfItems]
[decimal](
18
,
6
)
NULL
,
[ComponentOrigprice]
[decimal](
18
,
6
)
NULL
,
[ComponentWeight]
[decimal](
18
,
6
)
NULL
,
[IndexType]
[varchar](
12
)
NULL
,
[IndexDivisor]
[decimal](
18
,
6
)
NULL
)
ON
[PRIMARY]
Now
Books On Line says that if you have columns with scientific notation in your file, you should specify float data type in your format file. I prepared test file along with several format files that define data format for above table. I attached all files used in this post in bcptests.zip file for your convenience.
The file testfeed.txt contains single line of data, containing values for all columns of the table
IndexComponentsTypeChecked
.
15564
TEST.INDX
TSTSTR.
0001
EUR
0.9
0.0
2.6944
E-
01
INDEX
1.0
As you see, column
ComponentWeight
, defined as
decimal(18, 6)
is populated with value written using scientific notation.
First, let's try what happens when we don't use any format file.
bulk
insert
dbo.IndexComponentsTypeChecked
from
'c:\Projects\BulkTest\testfeed.txt'
The output is as follows:
Msg 4864,
Level
16, State 1, Line 7
Bulk
load
data conversion error (type mismatch
or
invalid
character
for
the specified codepage)
for
row 1,
column
7 (ComponentWeight).
Following the documentation, you can prepare a format file thatyou will pass to the BULK INSERT statement to overcome the conversion error. Here is line from the format file
bcpcfloat.fmt
:
7 SQLFLT8 0 41 "\t" 7 ComponentWeight ""
As you can see, the data type for the column has been defined as SQLFLT8, which is mapped to float t-sql data type.
The query loading data from the data file is looking now like this:
bulk
insert
dbo.IndexComponentsTypeChecked
from
'c:\Projects\BulkTest\testfeed.txt'
with
(formatfile
=
'c:\Projects\BulkTest\bcpcfloat.fmt'
)
If you run above query, it will work, because
2.6944E-01
is converted to float and then to decimal(18,6). Seems that the problem is solved, isn't it? Unfortunately, I discovered that this doesn't work if value is negative. The file testfeednegative.txt contains the same data, but ComponentWeight column is populated with negative value this time:
15564
TEST.INDX
TSTSTR.
0001
EUR
0.9
0.0
-2.6944E-01
INDEX
1.0
The following query now returns error:
bulk
insert
dbo.IndexComponentsTypeChecked
from
'c:\Projects\BulkTest\testfeednegative.txt'
with
(formatfile
=
'c:\Projects\BulkTest\bcpcfloat.fmt'
)
Msg 8115,
Level
16, State 6, Line 5
Arithmetic overflow error converting
float
to
data type
numeric
.
Interestingly enough, CONVERT will work in this case:
select
convert
(
float
,
-2.6944E-01
)
There is another workaround (for the scientific notation workaround).You can use XML format files. While they basically contain the same formatting information, they cause BCP to behave in a different way. XML format files are more verbose than legacy format files, but also easier to understand the processing of data by bulk operations using them. Note that you can generate xml (and non-xml) format files using bcp utility.
This is the xml format file bcp.xml:
<?xml version="1.0"?>
<BCPFORMAT
xmlns
="
http://schemas.microsoft.com/sqlserver/2004/bulkload/format
"
xmlns:xsi
="
http://www.w3.org/2001/XMLSchema-instance
">
<RECORD>
<FIELD
ID
="
1
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\t
"
MAX_LENGTH
="
12
"/>
<FIELD
ID
="
2
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\t
"
MAX_LENGTH
="
11
"
COLLATION
="
SQL_Latin1_General_CP1_CI_AS
"/>
<FIELD
ID
="
3
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\t
"
MAX_LENGTH
="
11
"
COLLATION
="
SQL_Latin1_General_CP1_CI_AS
"/>
<FIELD
ID
="
4
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\t
"
MAX_LENGTH
="
3
"
COLLATION
="
SQL_Latin1_General_CP1_CI_AS
"/>
<FIELD
ID
="
5
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\t
"
MAX_LENGTH
="
41
"/>
<FIELD
ID
="
6
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\t
"
MAX_LENGTH
="
41
"/>
<FIELD
ID
="
7
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\t
"
MAX_LENGTH
="
41
"/>
<FIELD
ID
="
8
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\t
"
MAX_LENGTH
="
12
"
COLLATION
="
SQL_Latin1_General_CP1_CI_AS
"/>
<FIELD
ID
="
9
"
xsi:type
="
CharTerm
"
TERMINATOR
="
\r\n
"
MAX_LENGTH
="
41
"/>
</RECORD>
<ROW>
<COLUMN
SOURCE
="
1
"
NAME
="
RowID
"
xsi:type
="
SQLINT
"/>
<COLUMN
SOURCE
="
2
"
NAME
="
IndexID
"
xsi:type
="
SQLVARYCHAR
"/>
<COLUMN
SOURCE
="
3
"
NAME
="
ComponentId
"
xsi:type
="
SQLVARYCHAR
"/>
<COLUMN
SOURCE
="
4
"
NAME
="
ComponentCurrency
"
xsi:type
="
SQLCHAR
"/>
<COLUMN
SOURCE
="
5
"
NAME
="
NumberOfItems
"
xsi:type
="
SQLDECIMAL
"
PRECISION
="
18
"
SCALE
="
6
"/>
<COLUMN
SOURCE
="
6
"
NAME
="
ComponentOrigprice
"
xsi:type
="
SQLDECIMAL
"
PRECISION
="
18
"
SCALE
="
6
"/>
<COLUMN
SOURCE
="
7
"
NAME
="
ComponentWeight
"
xsi:type
="
SQLFLT8
"/>
<COLUMN
SOURCE
="
8
"
NAME
="
IndexType
"
xsi:type
="
SQLVARYCHAR
"/>
<COLUMN
SOURCE
="
9
"
NAME
="
IndexDivisor
"
xsi:type
="
SQLDECIMAL
"
PRECISION
="
18
"
SCALE
="
6
"/>
</ROW>
</BCPFORMAT>
As you see, other decimal(18, 6) columns are processed correctly unless they contain scientific notation data.
The following query succeeds:
bulk
insert
dbo.IndexComponentsTypeChecked
from
'c:\Projects\BulkTest\testfeednegative.txt'
with
(formatfile
=
'c:\Projects\BulkTest\bcp.xml'
)
I wondered, what happens when I change the destination type to float? I altered table IndexComponentsTypeChecked using the following query:
alter
table
[IndexComponentsTypeChecked]
alter
column
ComponentWeight
float
null
GO
bulk
insert
dbo.IndexComponentsTypeChecked
from
'c:\Projects\BulkTest\testfeednegative.txt'
with
(formatfile
=
'c:\Projects\BulkTest\bcpc1.fmt'
)
This time, even negative value was processed correctly!
So, it looks like to properly process numeric columns with potentially negative scientific notation values you have to use XML format files if you have to use character data files. If you can use native data files the problem doesn't exists, as all numbers are stored as binary values in the data file. I created data file datanative.txt and format file bcpnative.fmt so you can investigate their structure.
There are still organizations that have large systems based on SQL Server 2000. Usually such systems process large number of feeds, many of them in character format. The change of behavior of bcp and especially bug with negative values can add a lot of work during migration of the systems. Perhaps Microsoft should think about making CONVERT understand string scientific notation values, this might save major headaches for many developers and project managers.