A while ago, Tony Rogerson showed a way how to pass a list of integers (csv) to stored procedure. Approach of creating a script and executing it is OK for smaller amounts of data. I thought, that maybe, as xml is a form of text after all, it would be more appropriate? I crafted a stored proc based on Tony's code that instead of generating a script, generates an xml stream that is subsequently used in query inserting rows. This method is also resistant to sql injection attemtpts so such stored proc is not a 'bobby tables' one :)
Here's the code of the procedure:
create
table tStoredInts (afield int)
go
create
proc csv_to_table2
@csv
varchar(max)
AS
BEGIN
/***
Insert numbers from a CSV to tStoredInts table
***/
declare @handle int --xml document handle
SET @csv = ltrim(rtrim(@csv))
IF RIGHT( @csv, 1 ) = ',' -- If last character is a comma remove it it
SET @csv = left(@csv, len(@csv) -1)
DECLARE @xml varchar(max)
SET @xml = REPLACE( @csv, ',', '</o><o>')
SET @xml = '<numbers><o>' + @xml + '</o></numbers>'
exec sp_xml_preparedocument @handle output, @xml
--insert elements from xml
insert tStoredInts (afield)
select field
from openxml(@handle, '/numbers/o', 2) with (field int '.')
exec sp_xml_removedocument @handle
end
--proc
So, I just came from seminar led by Kalen Delaney, guru of SQL Server (any version), organized by Ireland SQL Technology User Group. The whole event was sponsored and hosted by Microsoft. I must say, I am impressed. Kalen was able to sort out and put into proper drawers all bits of knowledge I had before and stuff a whole lotta more into my head. She talked about concurrency, transaction isolation levels, differences between optimistic and pessimistic locking, blocking and diagnostic tools, and at the end about gotchas related to (nolock) hint, overused so eagerly by many devs (I confess, I used this hint myself ;))
You can find tons of good info about execution plans and isolation levels at Craig Friedman's blog here.
Overall, it was a very interesting day!
So, you might think that sorting within ASCII range is predictable and order defined by ASCII table is finite? You are on safe side when you do not use unicode nor 'fancy' letters? Well.. you are wrong.
Recently I have been working on comparison of data stored in Sybase and SQL. Database structure is the same, I have script pulling data from Sybase to SQL Server and when this script is finished, data are meant to be identical of course. So I was thrown when we found out that rows can be returned in some cases in different order on both databases. That is, when a varchar field contained two underscore characters in a row, rows sorted by this field could be returned in different order.
After some thinking and looking for information I was able to create script that can reproduce this behavior on SQL Server.
First, create table and fill it with some data:
create table testtable
(
afield varchar(25)
)
go
insert testtable (afield)
select 'A_'
union all
select 'A__'
union all
select 'A_B'
union all
select 'A__B'
Now select rows ordering by afield:
select afield from testtable order by afield
The results are pretty 'reasonable':
afield
-------------------------
A_
A__
A__B
A_B
Ok, but Sybase returned different order:
afield
-------------------------
A_
A_B
A__
A__B
The mystery is hidden within collation of the field. Character fields derive default collation from database setting, which in turn gets collation if it is not specified explicitly from server settings.
If you run following query, you will get results identical with the previous result:
alter table testtable alter column afield varchar(25) collate Latin1_General_Bin
This query applies explicit collation fo afield, overriding in this way default collation. You can check collation of afield using this query:
select column_id, name, collation_name from sys.columns where object_id = object_id('testtable')
column_id name collation_name
----------- ----------- -------------------
1 afield Latin1_General_BIN
This was the collation after change, compliant with Sybase collation in my case.
Default collation on my SQL Server is
column_id name collation_name
----------- --------- ------------------------------
1 afield SQL_Latin1_General_CP1_CI_AS
As you see, collation setting may affect sort order of fields which do not contain unicode characters. When I run the same query on both database engines, I get mostly same results with exception of values containing multiple '_' characters. The solution is simple: specify collation of character fields to collation compatible with settings of the other database server.
Note that modifying default collation of database after tables are created does not affect collation of these fields. You can't also change collation in a column if this column is part of an index.