Last week I struggled with issue that had been causing many problems in our work. We have implemented SSIS packages that synchronize Sybase and SQL Server databases. Some of tables contain text (on Sybase) columns that have to be passed to varchar(max) columns on SQL Server. It wasn't apparent on the beginning, that only the first 32k of data are passed though.
I looked for an answer in many places. I tried to see if moving blobs via temporary files will help. Nope. Max size of such file was also 32k. Then I looked at script components, trying to find out if there is possibly an error truncating these data. There is a function GetBlobData that reads data from source. Max length of data returned by this function was 32k also. I thought, maybe it is max size of internal buffer? This function takes three parameters: column index, starting index of data to read and data length. I tried to implement moving window, and call this method in a loop, till all data are read, but no success. It was reading only 32k and not a single byte more. I knew the actual length of data because I read it in source query on Sybase side.
Looking for information, I came across textsize variable. Well, this seemed to be promising. But how to apply change to it in SSIS? If I put SET TEXTSIZE 100000 in front of the feeding select in data source, even though there was no syntax error, data source could no longer retrieve column information and the whole data flow couldn't work. I tried to create Execute SQL task before Data Flow task, but it didn't help. It seemed that what works when you connect using any sort of console, doesn't work if OLE DB is used. I reached to my mossy memory banks - I remembered that about 8 years ago, when I worked with COM+, I read about various parameters that OLEDB can accept, depending on the driver used. So, I found that indeed you can modify textsize in OLE DB parameters. The last issue was how to apply it to connection manager in SSIS? The edit dialog doesn't contain place to specify this value. Extended properies did not work. If you append TextSize=10000000; to connection string in properties of connection manager, it forgets the password. If you open the Edit window, the connection string is generated from scratch and you loose the setting. Catch 22.
We have a solution that reads connection strings from external windows config file. I added the parameter to the connection string in this file and verified, that this is really the right way - BLOB fields where passed properly. But in design time, though it wasn't that important, I wanted to find a way how to pass this parameter and maintain usable connection manager. I tried to imagine how such edit window of connection manager may work. When you press OK, all properties you set are potentially verified and connection string is created. I hoped that the verification is not too strong, because I decided to attach TextSize parameter to one of properties in the dialog. I chose server name :). Guess what - there is no verification whatsoever of the server name on that dialog. So, my server name is now like
sybase_server,5335;TextSize=100000;
It works. It turned out that the solution is easy (as ususal), just pity that I spent so much time trying to nail it down.
So, you might think that sorting within ASCII range is predictable and order defined by ASCII table is finite? You are on safe side when you do not use unicode nor 'fancy' letters? Well.. you are wrong.
Recently I have been working on comparison of data stored in Sybase and SQL. Database structure is the same, I have script pulling data from Sybase to SQL Server and when this script is finished, data are meant to be identical of course. So I was thrown when we found out that rows can be returned in some cases in different order on both databases. That is, when a varchar field contained two underscore characters in a row, rows sorted by this field could be returned in different order.
After some thinking and looking for information I was able to create script that can reproduce this behavior on SQL Server.
First, create table and fill it with some data:
create table testtable
(
afield varchar(25)
)
go
insert testtable (afield)
select 'A_'
union all
select 'A__'
union all
select 'A_B'
union all
select 'A__B'
Now select rows ordering by afield:
select afield from testtable order by afield
The results are pretty 'reasonable':
afield
-------------------------
A_
A__
A__B
A_B
Ok, but Sybase returned different order:
afield
-------------------------
A_
A_B
A__
A__B
The mystery is hidden within collation of the field. Character fields derive default collation from database setting, which in turn gets collation if it is not specified explicitly from server settings.
If you run following query, you will get results identical with the previous result:
alter table testtable alter column afield varchar(25) collate Latin1_General_Bin
This query applies explicit collation fo afield, overriding in this way default collation. You can check collation of afield using this query:
select column_id, name, collation_name from sys.columns where object_id = object_id('testtable')
column_id name collation_name
----------- ----------- -------------------
1 afield Latin1_General_BIN
This was the collation after change, compliant with Sybase collation in my case.
Default collation on my SQL Server is
column_id name collation_name
----------- --------- ------------------------------
1 afield SQL_Latin1_General_CP1_CI_AS
As you see, collation setting may affect sort order of fields which do not contain unicode characters. When I run the same query on both database engines, I get mostly same results with exception of values containing multiple '_' characters. The solution is simple: specify collation of character fields to collation compatible with settings of the other database server.
Note that modifying default collation of database after tables are created does not affect collation of these fields. You can't also change collation in a column if this column is part of an index.
Last week I came across an interesting issue.
If you have a FOR INSERT trigger on a table and you want to store some information on a linked server, the transaction the trigger is running within is automatically expanded to distributed mode. This creates some problems. First, you have to have Distributed Transaction Coordinator running on your machine. Second, even this does not mean you will succeed, as not all providers support distributed transactions protocols. We will see this further in this post.
First, create a linked server that will point to an Excel workbook on local disk, for simplicity sake.
DECLARE @RC int
DECLARE @server nvarchar(128)
DECLARE @srvproduct nvarchar(128)
DECLARE @provider nvarchar(128)
DECLARE @datasrc nvarchar(4000)
DECLARE @location nvarchar(4000)
DECLARE @provstr nvarchar(4000)
DECLARE @catalog nvarchar(128)
-- Set parameter values
SET @server = 'XLTEST'
SET @srvproduct = 'Excel'
SET @provider = 'Microsoft.Jet.OLEDB.4.0'
SET @datasrc = 'c:\LSBook.xls'
SET @provstr = 'Excel 8.0'
EXEC @RC = [master].[dbo].[sp_addlinkedserver] @server, @srvproduct, @provider,
@datasrc, @location, @provstr, @catalog
Create LSBook.xls file on your drive first. In first row, in first two columns write afield and bfield:

Now, create a test table and trigger:
use testdb
go
if object_id('dbo.testtable') is not null
drop table dbo.testtable
go
create table dbo.testtable
(
afield int identity(1, 1),
bfield varchar(20) not null
)
go
create trigger trg_testtrg1 on dbo.testtable
for insert
as
begin
insert xltest...sheet1$ (afield, bfield)
select afield, bfield
from inserted
print 'inserted ' + convert(varchar, @@rowcount) + ' row(s)'
end
Finally, execute some statements to see what happens:
insert into dbo.testtable values('somevealue1')
insert into dbo.testtable values('somevealue2')
go
select * from dbo.testtable
And the result is..
Msg 8501, Level 16, State 3, Procedure trg_testtrg1, Line 6
MSDTC on server 'VAMILOXP' is unavailable.
afield bfield
----------- --------------------
(0 row(s) affected)
Oops, MSDTC is not running on the machine.. Start it. Note that you may have to install it if you are running Windows 2003 Server.
Msg 7390, Level 16, State 2, Procedure trg_testtrg1, Line 6
The requested operation could not be performed because OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "xltest" does not support the required transaction interface.
afield bfield
----------- --------------------
(0 row(s) affected)
So, as you see, MSDTC did not help us too much. The provider I used in this case does not support distributed transactions. What we can do to be able to insert information to linked server data store?
There is an article in MSDN that mentions that you can COMMIT transaction in trigger before you call linked server. However this causes a batch terminating error. Let's modify trigger code to commit transaction and see the results:
create trigger trg_testtrg1 on dbo.testtable
for insert
as
begin
--commit transaction
commit
--call linked server
insert xltest...sheet1$ (afield, bfield)
select afield, bfield
from inserted
Will the code above work? No! Commiting transaction makes inserted table empty, so we will not have anything to store! Here's modification what overcomes this little issue:
create trigger trg_testtrg1 on dbo.testtable
for insert
as
begin
--declare temp variable and store rows to insert to linked server table
declare @tmp table (a int, b varchar(20))
insert @tmp (a, b) select afield, bfield from inserted
--commit current tran to avoid escalation
commit
--this will work, but the batch will be broken.
insert xltest...sheet1$ (afield, bfield)
select a, b
from @tmp
end
Let's see results of this code:
insert into dbo.testtable values('somevealue1')
insert into dbo.testtable values('somevealue2')
go
--there is only one record in the table
select * from dbo.testtable
(1 row(s) affected)
(1 row(s) affected)
Msg 3609, Level 16, State 1, Line 1
The transaction ended in the trigger. The batch has been aborted.
afield bfield
----------- --------------------
1 somevealue1
(1 row(s) affected)
As you see there is only one record in the table - the batch was broken before second insert in the code. Obviously, this is not what we want.
I don't actually understand, why Microsoft decided to throw batch breaking error when a commit in trigger is called. Perhaps to prevent poor coding practices? Anyway, this error is not thrown when you add begin tran statement AFTER your linked server query:
create trigger trg_testtrg1 on dbo.testtable
for insert
as
begin
--declare temp variable and store rows to insert to linked server table
declare @tmp table (a int, b varchar(20))
insert @tmp (a, b) select afield, bfield from inserted
--commit current tran to avoid escalation
commit
--this will work, but the batch will be broken.
insert xltest...sheet1$ (afield, bfield)
select a, b
from @tmp
begin tran
end
So far, so good. In some cases this will work.
afield bfield
----------- --------------------
1 somevealue1
2 somevealue2
But how about this code? What if you decide to roll back some of the work done due to an error?
begin tran
insert into dbo.testtable values('somevealue1')
insert into dbo.testtable values('somevealue2')
rollback
Well.. we are in trouble now:
select * from dbo.testtable
afield bfield
----------- --------------------
1 somevealue1
2 somevealue2
(2 row(s) affected)
This transaction is not rolled back, as it has been already committed! In some cases, such behavior is acceptable, for example when an application calls single insert statement. But if insert query is a part of bigger batch, this approach is definitely not an option.
The easiest way I found is to use SQL Agent to do remote server job out of transaction boundaries. First, create a queue table that will store temporarily rows to be inserted to linked server storage:
use testdb
go
--this table stores records to be inserted into linked server table
create table dbo.linkedsrvqueue
(
afield int not null,
bfield varchar(20) not null
)
go
Then, create a simple job that will execute command moving rows to linked server:
USE [msdb]
GO
/****** Object: Job [SaveToLinkedServer] Script Date: 10/29/2007 23:01:19 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
/****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 10/29/2007 23:01:20 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SaveToLinkedServer',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'No description available.',
@category_name=N'[Uncategorized (Local)]',
@owner_login_name=N'VAMILOXP\rogas', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [StoreToLinkedServerStep] Script Date: 10/29/2007 23:01:21 ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'StoreToLinkedServerStep',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'insert xltest...sheet1$ (afield, bfield)
select afield, bfield
from dbo.linkedsrvqueue
if @@error = 0
delete from dbo.linkedsrvqueue',
@database_name=N'testdb',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
Finally, modify the trigger to call store rows in queue table and call the job:
create trigger trg_testtrg1 on dbo.testtable
for insert
as
begin
--insert to queue table
insert dbo.linkedsrvqueue(afield, bfield)
select afield, bfield
from inserted
print 'inserted ' + convert(varchar, @@rowcount) + ' row(s)'
exec msdb..sp_start_job @job_name='SaveToLinkedServer'
end
This way the linked server functionality is called outside of transaction boundaries, which does not upset DTC and the main transaction flow remains consistent. The issues with this approach are: you have to have proper credentials to call sp_start_job and if the insert statement is executed too often, you may get following error:
Msg 22022, Level 16, State 1, Line 0
SQLServerAgent Error: Request to run job SaveToLinkedServer (from User VAMILOXP\rogas) refused because the job already has a pending request from User VAMILOXP\rogas.
You may get rid of this error by checking job status before calling sp_start_job, you may also configure this job to run periodically and remove the sp_start_job call from the trigger at all.
References:
http://msdn2.microsoft.com/en-us/library/ms187844.aspx