Piotr Rodak

if datepart(dw, getdate()) in (6, 7)
use pubs;

Last week I struggled with issue that had been causing many problems in our work. We have implemented SSIS packages that synchronize Sybase and SQL Server databases. Some of tables contain text (on Sybase) columns that have to be passed to varchar(max) columns on SQL Server. It wasn't apparent on the beginning, that only the first 32k of data are passed though.

I looked for an answer in many places. I tried to see if moving blobs via temporary files will help. Nope. Max size of such file was also 32k. Then I looked at script components, trying to find out if there is possibly an error truncating these data. There is a function GetBlobData that reads data from source. Max length of data returned by this function was 32k also. I thought, maybe it is max size of internal buffer? This function takes three parameters: column index, starting index of data to read and data length. I tried to implement moving window, and call this method in a loop, till all data are read, but no success. It was reading only 32k and not a single byte more. I knew the actual length of data because I read it in source query on Sybase side.

Looking for information, I came across textsize variable. Well, this seemed to be promising. But how to apply change to it in SSIS? If I put SET TEXTSIZE 100000 in front of the feeding select in data source, even though there was no syntax error, data source could no longer retrieve column information and the whole data flow couldn't work. I tried to create Execute SQL task before Data Flow task, but it didn't help. It seemed that what works when you connect using any sort of console, doesn't work if OLE DB is used.  I reached to my mossy memory banks - I remembered that about 8 years ago, when I worked with COM+, I read about various parameters that OLEDB can accept, depending on the driver used. So, I found that indeed you can modify textsize in OLE DB parameters. The last issue was how to apply it to connection manager in SSIS? The edit dialog doesn't contain place to specify this value. Extended properies did not work. If you append TextSize=10000000; to connection string in properties of connection manager, it forgets the password. If you open the Edit window, the connection string is generated from scratch and you loose the setting. Catch 22.

We have a solution that reads connection strings from external windows config file. I added the parameter to the connection string in this file and verified, that this is really the right way - BLOB fields where passed properly. But in design time, though it wasn't that important, I wanted to find a way how to pass this parameter and maintain usable connection manager. I tried to imagine how such edit window of connection manager may work. When you press OK, all properties you set are potentially verified and connection string is created. I hoped that the verification is not too strong, because I decided to attach TextSize parameter to one of properties in the dialog. I chose server name :). Guess what - there is no verification whatsoever of the server name on that dialog. So, my server name is now like

sybase_server,5335;TextSize=100000;

It works. It turned out that the solution is easy (as ususal), just pity that I spent so much time trying to nail it down.

  

Boss told you to come up with a question to test new programmers? Here's one: you have a linked server and an openquery statement. What is the number returned by this query?

 select count(*) from openquery(LINKED_SQL, 'select count(*) from database.dbo.table1')

Posted by Piotr Rodak | with no comments

Just this week I had opportunity to change collation of all objects using it in a database without dropping it. I like to computers to do what computers should do - work that is :). So I created a query that gave me script changing collation of all columns using collation - varchar, char and so on.

I have got 3900 lines altering tables and columns. Nice, but not so easy, this script was not working at first F5 - you can't  change collation of a column that is a part of constraint or index - you have do drop them. So I had to wade through the script and add statements dropping constraints and indexes and recreating them after modifications had been done. To find tables that were causing problems I simply executed script - execution was stopped on column whose collation could not be changed and I had exact information what should be done.

This is the query that generates alter table .. alter column statements:

select 'alter table [' + s.name + '].[' + t.name + '] alter column [' + c.name + '] ' +
                ty.name + case when ty.name not in ('text', 'sysname') then '(' + case when c.max_length > 0 then
                case when ty.name not in ('nchar', 'nvarchar') then convert(varchar, c.max_length) else convert(varchar, c.max_length/2) end else 'max' end +

                ')' else '' end + ' collate SQL_Latin1_General_CP1_CI_AS ' +
                case when c.is_nullable = 0 then 'NOT ' else '' end + 'NULL'
from (sys.columns c inner join sys.types ty on c.system_type_id = ty.system_type_id) inner join
                (sys.objects t inner join sys.schemas s on t.schema_id = s.schema_id) on c.object_id = t.object_id
where t.type='U' and c.collation_name is not null and ty.name <> 'sysname'
order by s.name, t.name, c.column_id

 I saved loads of time thanks to this script.

A while ago, Tony Rogerson showed a way how to pass a list of integers (csv) to stored procedure. Approach of creating a script and executing it is OK for smaller amounts of data. I thought, that maybe, as xml is a form of text after all, it would be more appropriate? I crafted a stored proc based on Tony's code that instead of generating a script, generates an xml stream that is subsequently used in query inserting rows. This method is also resistant to sql injection attemtpts so such stored proc is not a 'bobby tables' one :)

Here's the code of the procedure:

create table tStoredInts (afield int)

go

create proc csv_to_table2

@csv varchar(max)

AS

BEGIN

/***

Insert numbers from a CSV to tStoredInts table

***/

declare @handle int --xml document handle

SET @csv = ltrim(rtrim(@csv))

IF RIGHT( @csv, 1 ) = ',' -- If last character is a comma remove it it

SET @csv = left(@csv, len(@csv) -1)

DECLARE @xml varchar(max)

SET @xml = REPLACE( @csv, ',', '</o><o>')

SET @xml = '<numbers><o>' + @xml + '</o></numbers>'

 

exec sp_xml_preparedocument @handle output, @xml

--insert elements from xml

insert tStoredInts (afield)

select field

from openxml(@handle, '/numbers/o', 2) with (field int '.')

exec sp_xml_removedocument @handle

end --proc

Posted by Piotr Rodak | 1 comment(s)
Filed under: ,

So, I just came from seminar led by Kalen Delaney, guru of SQL Server (any version), organized by Ireland SQL Technology User Group. The whole event was sponsored and hosted by Microsoft. I must say, I am impressed. Kalen was able to sort out and put into proper drawers all bits of knowledge I had before and stuff a whole lotta more into my head. She talked about concurrency, transaction isolation levels, differences between optimistic and pessimistic locking, blocking and diagnostic tools, and at the end about gotchas related to (nolock) hint, overused so eagerly by many devs (I confess, I used this hint myself ;))

You can find tons of good info about execution plans and isolation levels at Craig Friedman's blog here

Overall, it was a very interesting day!


 

Posted by Piotr Rodak | with no comments
Filed under:

So, you might think that sorting within ASCII range is predictable and order defined by ASCII table is finite? You are on safe side when you do not use unicode nor 'fancy' letters? Well.. you are wrong.

Recently I have been working on comparison of data stored in Sybase and SQL. Database structure is the same, I have script pulling data from Sybase to SQL Server and when this script is finished, data are meant to be identical of course. So I was thrown when we found out that rows can be returned in some cases in different order on both databases. That is, when a varchar field contained two underscore characters in a row, rows sorted by this field could be returned in different order.

After some thinking and looking for information I was able to create script that can reproduce this behavior on SQL Server.

First, create table and fill it with some data:

create table testtable

(

afield varchar(25)

)

go

insert testtable (afield)

select 'A_'

union all

select 'A__'

union all

select 'A_B'

union all

select 'A__B'

 Now select rows ordering by afield:

select afield from testtable order by afield

The results are pretty 'reasonable':

afield

-------------------------

A_

A__

A__B

A_B

 Ok, but Sybase returned different order:

afield

-------------------------

A_

A_B

A__

A__B

The mystery is hidden within collation of the field. Character fields derive default collation from database setting, which in turn gets collation if it is not specified explicitly from server settings.

If you run following query, you will get results identical with the previous result:

alter table testtable alter column afield varchar(25) collate Latin1_General_Bin

This query applies explicit collation fo afield, overriding in this way default collation. You can check collation of afield using this query:

select column_id, name, collation_name from sys.columns where object_id = object_id('testtable')

column_id   name        collation_name

----------- ----------- -------------------

1           afield      Latin1_General_BIN

This was the collation after change, compliant with Sybase collation in my case.

Default collation on my SQL Server is

column_id   name      collation_name

----------- --------- ------------------------------

1           afield    SQL_Latin1_General_CP1_CI_AS

As you see, collation setting may affect sort order of fields which do not contain unicode characters. When I run the same query on both database engines, I get mostly same results with exception of values containing multiple '_' characters. The solution is simple: specify collation of character fields to collation compatible with settings of the other database server.

Note that modifying default collation of database after tables are created does not affect collation of these fields. You can't also change collation in a column if this column is part of an index.

 

Posted by Piotr Rodak | with no comments

Last week I came across an interesting issue.

If you have a FOR INSERT trigger on a table and you want to store some information on a linked server, the transaction the trigger is running within is automatically expanded to distributed mode. This creates some problems. First, you have to have Distributed Transaction Coordinator running on your machine. Second, even this does not mean you will succeed, as not all providers support distributed transactions protocols. We will see this further in this post.

First, create a linked server that will point to an Excel workbook on local disk, for simplicity sake.

DECLARE @RC int

DECLARE @server nvarchar(128)

DECLARE @srvproduct nvarchar(128)

DECLARE @provider nvarchar(128)

DECLARE @datasrc nvarchar(4000)

DECLARE @location nvarchar(4000)

DECLARE @provstr nvarchar(4000)

DECLARE @catalog nvarchar(128)

-- Set parameter values

SET @server = 'XLTEST'

SET @srvproduct = 'Excel'

SET @provider = 'Microsoft.Jet.OLEDB.4.0'

SET @datasrc = 'c:\LSBook.xls'

SET @provstr = 'Excel 8.0'

EXEC @RC = [master].[dbo].[sp_addlinkedserver] @server, @srvproduct, @provider,

@datasrc, @location, @provstr, @catalog

Create LSBook.xls file on your drive first. In first row, in first two columns write afield and bfield:

Now, create a test table and trigger:

use testdb

go

 

if object_id('dbo.testtable') is not null

      drop table dbo.testtable

go

create table dbo.testtable

(

      afield int identity(1, 1),

      bfield varchar(20) not null

)

go

create trigger trg_testtrg1 on dbo.testtable

for insert

as

begin

 

insert xltest...sheet1$ (afield, bfield)

select afield, bfield

from inserted

 

print 'inserted ' + convert(varchar, @@rowcount) + ' row(s)'

 

end

Finally, execute some statements to see what happens:

insert into dbo.testtable values('somevealue1')

insert into dbo.testtable values('somevealue2')

go

select * from dbo.testtable

And the result is..

Msg 8501, Level 16, State 3, Procedure trg_testtrg1, Line 6

MSDTC on server 'VAMILOXP' is unavailable.

afield      bfield

----------- --------------------

 

(0 row(s) affected)

Oops, MSDTC is not running on the machine.. Start it. Note that you may have to install it if you are running Windows 2003 Server.

Msg 7390, Level 16, State 2, Procedure trg_testtrg1, Line 6

The requested operation could not be performed because OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "xltest" does not support the required transaction interface.

afield      bfield

----------- --------------------

 

(0 row(s) affected)

 

So, as you see, MSDTC did not help us too much. The provider I used in this case does not support distributed transactions. What we can do to be able to insert information to linked server data store?

There is an article in MSDN that mentions that you can COMMIT transaction in trigger before you call linked server. However this causes a batch terminating error. Let's modify trigger code to commit transaction and see the results:

create trigger trg_testtrg1 on dbo.testtable

for insert

as

begin

--commit transaction

commit

--call linked server

insert xltest...sheet1$ (afield, bfield)

select afield, bfield

from inserted

Will the code above work? No! Commiting transaction makes inserted table empty, so we will not have anything to store! Here's modification what overcomes this little issue:

create trigger trg_testtrg1 on dbo.testtable

for insert

as

begin

--declare temp variable and store rows to insert to linked server table

declare @tmp table (a int, b varchar(20))

insert @tmp (a, b) select afield, bfield from inserted

 

--commit current tran to avoid escalation

commit

 

--this will work, but the batch will be broken.

insert xltest...sheet1$ (afield, bfield)

select a, b

from @tmp

end

Let's see results of this code:

insert into dbo.testtable values('somevealue1')

insert into dbo.testtable values('somevealue2')

go

--there is only one record in the table

select * from dbo.testtable

 

 

(1 row(s) affected)

 

(1 row(s) affected)

Msg 3609, Level 16, State 1, Line 1

The transaction ended in the trigger. The batch has been aborted.

afield      bfield

----------- --------------------

1           somevealue1

 

(1 row(s) affected)

As you see there is only one record in the table - the batch was broken before second insert in the code. Obviously, this is not what we want.

I don't actually understand, why Microsoft decided to throw batch breaking error when a commit in trigger is called. Perhaps to prevent poor coding practices? Anyway, this error is not thrown when you add begin tran statement AFTER your linked server query:

create trigger trg_testtrg1 on dbo.testtable

for insert

as

begin

--declare temp variable and store rows to insert to linked server table

declare @tmp table (a int, b varchar(20))

insert @tmp (a, b) select afield, bfield from inserted

 

--commit current tran to avoid escalation

commit

 

--this will work, but the batch will be broken.

insert xltest...sheet1$ (afield, bfield)

select a, b

from @tmp

 

begin tran

end

So far, so good. In some cases this will work.

afield      bfield

----------- --------------------

1           somevealue1

2           somevealue2

But how about this code? What if you decide to roll back some of the work done due to an error?

begin tran

insert into dbo.testtable values('somevealue1')

insert into dbo.testtable values('somevealue2')

rollback

Well.. we are in trouble now:

select * from dbo.testtable

afield      bfield

----------- --------------------

1           somevealue1

2           somevealue2

 

(2 row(s) affected)

 

This transaction is not rolled back, as it has been already committed! In some cases, such behavior is acceptable, for example when an application calls single insert statement. But if insert query is a part of bigger batch, this approach is definitely not an option.

The easiest way I found is to use SQL Agent to do remote server job out of transaction boundaries. First, create a queue table that will store temporarily rows to be inserted to linked server storage:

 

use testdb

go

--this table stores records to be inserted into linked server table

create table dbo.linkedsrvqueue

(

      afield int not null,

      bfield varchar(20) not null

)

go

 

Then, create a simple job that will execute command moving rows to linked server:

USE [msdb]

GO

/****** Object:  Job [SaveToLinkedServer]    Script Date: 10/29/2007 23:01:19 ******/

BEGIN TRANSACTION

DECLARE @ReturnCode INT

SELECT @ReturnCode = 0

/****** Object:  JobCategory [[Uncategorized (Local)]]]    Script Date: 10/29/2007 23:01:20 ******/

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)

BEGIN

EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

 

END

 

DECLARE @jobId BINARY(16)

EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'SaveToLinkedServer',

            @enabled=1,

            @notify_level_eventlog=0,

            @notify_level_email=0,

            @notify_level_netsend=0,

            @notify_level_page=0,

            @delete_level=0,

            @description=N'No description available.',

            @category_name=N'[Uncategorized (Local)]',

            @owner_login_name=N'VAMILOXP\rogas', @job_id = @jobId OUTPUT

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

/****** Object:  Step [StoreToLinkedServerStep]    Script Date: 10/29/2007 23:01:21 ******/

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'StoreToLinkedServerStep',

            @step_id=1,

            @cmdexec_success_code=0,

            @on_success_action=1,

            @on_success_step_id=0,

            @on_fail_action=2,

            @on_fail_step_id=0,

            @retry_attempts=0,

            @retry_interval=0,

            @os_run_priority=0, @subsystem=N'TSQL',

            @command=N'insert xltest...sheet1$ (afield, bfield)

select afield, bfield

from dbo.linkedsrvqueue

if @@error = 0

      delete from dbo.linkedsrvqueue',

            @database_name=N'testdb',

            @flags=0

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'

IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

COMMIT TRANSACTION

GOTO EndSave

QuitWithRollback:

      IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION

EndSave:

 

Finally, modify the trigger to call store rows in queue table and call the job:

create trigger trg_testtrg1 on dbo.testtable

for insert

as

begin

--insert to queue table

insert dbo.linkedsrvqueue(afield, bfield)

select afield, bfield

from inserted

 

print 'inserted ' + convert(varchar, @@rowcount) + ' row(s)'

exec msdb..sp_start_job @job_name='SaveToLinkedServer'

 

end

 

This way the linked server functionality is called outside of transaction boundaries, which does not upset DTC and the main transaction flow remains consistent. The issues with this approach are: you have to have proper credentials to call sp_start_job and if the insert statement is executed too often, you may get following error:

Msg 22022, Level 16, State 1, Line 0

SQLServerAgent Error: Request to run job SaveToLinkedServer (from User VAMILOXP\rogas) refused because the job already has a pending request from User VAMILOXP\rogas.

 

You may get rid of this error by checking job status before calling sp_start_job, you may also configure this job to run periodically and remove the sp_start_job call from the trigger at all.

 

References:

http://msdn2.microsoft.com/en-us/library/ms187844.aspx

 

My first blog post. As someone said somewhere, sharing information forces you to look at it from every side, ensure that it is really correct. I will do my best to make so.

This blog is going to be about... SQL Server :). I am database developer and from time to time I come across quite interesting problems, up with (brilliant Wink) ideas, have some observations that I would like to share. Some of them will not be revolutionary but I hope that some of them will allow you to look at questions you have from different angle, which is always helpful, I noticed.

My main areas of interest are performance, programming, some SSIS, some c# will perhaps find their way here as well. We'll see.