May 2007 - Posts

Data type conversions can often cause problems in a database, particularly when converting from character to numeric data. I'm not talking about precision or rounding problems (not in this post, anyway); I'm talking about situations where your queries work fine and then one day, without any change to the query, they just stop working with an error like this one...

Msg 245, Level 16, State 1, Line 1

Conversion failed when converting the varchar value 'T' to data type int.

What is going on? How can a query work fine and then, without any changes, suddenly stop working and give an error? Well, query plans. Essentially, your query plan is built from statistics such as the amount of data in the table, the cardinality of columns and so on; as your data grows, your query plan may change. Let's see the behaviour I describe.

Firstly, our example setup; this creates a small table and populates it with 51 rows. The idea here is that the last character of the column named acharcolumn is actually supposed to be a number; in a properly designed database you would probably have a) broken it out into its own column or b) added some form of check constraint to make sure the character is numeric (see the sketch just after the setup script below).

create table #datatype_example (
    cat         int        not null,
    acharcolumn varchar(5) not null,
    subcat      int        not null
)

 

set nocount on

declare @i int
set @i = 1

--  Bad data one...
insert #datatype_example ( cat, acharcolumn, subcat ) values( 0, '10cT', 1 )

while @i <= 50
begin
    insert #datatype_example ( cat, acharcolumn, subcat ) values( @i, '20c2', @i % 5 )
    set @i = @i + 1
end
go

create index nc1 on #datatype_example( cat, acharcolumn )
go
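As an aside, here is a hedged sketch of option (b) from above: a check constraint (the constraint name is mine) that forces the last character of acharcolumn to be a digit, so bad data like '10cT' could never get in to begin with.

-- Hypothetical constraint; forces the last character to be numeric.
alter table #datatype_example
    add constraint ck_acharcolumn_last_digit
    check ( right( acharcolumn, 1 ) like '[0-9]' )
go

(Don't actually run that against this example table, of course; the existing '10cT' row would violate it and the ALTER would fail, which is rather the point.)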

 

Now we have our table set up, let's try our query; executing the query below works without error.

 

select acharcolumn
from #datatype_example
where cat <= 1000
  and right( acharcolumn, 1 ) = 1
  and subcat = 3
go

 

Now our table grows; let's see what happens.

 

declare @i int
set @i = 51

while @i <= 50000
begin
    insert #datatype_example ( cat, acharcolumn, subcat ) values( @i, '20c2', @i % 5 )
    set @i = @i + 1
end
go

 

Run exactly the same query as before; this time we get an error!

 

select acharcolumn
from #datatype_example
where cat <= 1000
  and right( acharcolumn, 1 ) = 1
  and subcat = 3
go

 

Msg 245, Level 16, State 1, Line 1

Conversion failed when converting the varchar value 'T' to data type int.

We can ‘fix’ the problem by creating an index, but how long until we get the error again?

 

create index nc2 on #datatype_example ( subcat, cat, acharcolumn )
go

select acharcolumn
from #datatype_example
where cat <= 1000
  and right( acharcolumn, 1 ) = 1
  and subcat = 3
go

 

drop table #datatype_example


OK, so what's going on, and what's the recommendation on this?

It's all down to access paths and which expressions in the WHERE clause are evaluated first. The WHERE clause is supposed to be evaluated all at once, which is why the position of expressions in the WHERE clause has no effect; using brackets has no effect on the access paths chosen either (brackets do affect evaluation order, but the data will already have been fetched by then). Two things to remember here: the access path is the means by which you get at the data you want (the index chosen, for instance), and the evaluation order is what it says on the tin, the order in which the expressions (the logic) in the WHERE clause are evaluated.

The evaluation order is fixed and can never change; however, the access paths can change every time the query plan is compiled.
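If you re-run the example and want to see the access path change for yourself, here is a hedged sketch: wrap the query in SHOWPLAN_TEXT before and after the big data load and compare the operators (you should see the plan switch from a table scan to using the nc1 index).

set showplan_text on
go
-- The plan is displayed instead of the query being executed.
select acharcolumn
from #datatype_example
where cat <= 1000
  and right( acharcolumn, 1 ) = 1
  and subcat = 3
go
set showplan_text off
go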

Going back to our examples, the reason the first query worked is that a) the optimiser chose a table scan and b) it rearranged our WHERE clause so that the highest cost items were done last, so it becomes...

where cat <= 1000
  and subcat = 3
  and right( acharcolumn, 1 ) = 1

Makes sense when you think about it: why evaluate RIGHT() for every row? Instead, find all the rows where cat <= 1000, then within those the rows where subcat = 3, and only apply the RIGHT() function to that small subset of rows.

Introducing more data changes the plan; this time the index nc1 is used. Remember the index is on ( cat, acharcolumn ), so our WHERE clause now looks somewhat different; it actually gets executed in this order...

where cat <= 1000
  and right( acharcolumn, 1 ) = 1
  and subcat = 3

 

Hence our problem! The rows where cat <= 1000 are found and then the RIGHT() is applied, and this is where the row with the non-numeric character fails: the 1 in '= 1' is an integer, so SQL Server implicitly converts the character 'T' to an integer and, bang, failure.
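You can reproduce the implicit conversion in isolation; a minimal sketch:

-- The int literal wins on data type precedence, so SQL Server converts
-- the varchar side to int and 'T' fails with the same Msg 245.
if 'T' = 1 print 'match'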

What’s the answer?

Be very careful when doing data conversions from character to numeric; avoid them in queries if at all possible.

Our query can be rewritten...

select acharcolumn
from #datatype_example
where cat <= 1000
  and right( acharcolumn, 1 ) = '1'
  and subcat = 3
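With both sides of the comparison now character, no conversion to integer takes place at all, so it no longer matters in which order the access path evaluates the expressions.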

There is a whole raft of data conversion problems in SQL Server, especially around precision, rounding, truncation and so on... Be aware of them and test your stuff!
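A few quick illustrations of the sort of thing I mean (a hedged sketch, nothing exhaustive; run them and check the output):

declare @d decimal(5,2)
set @d = 1.999
select @d                           -- 2.00: rounded on assignment
select cast( 1.999 as int )         -- 1: decimal to int truncates
select cast( 'abcdef' as char(3) )  -- 'abc': silently truncated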

One last thing, remember: these problems can be intermittent. I've got a half-written article on parameter sniffing and compiled plans; basically, if your plan is cached, the session that caused the compilation may have used parameters that give a query plan that does not produce the error, yet that same query (a stored proc, for instance) could later be executed with a different set of parameters, creating a different plan which does cause the error. Don't get too eaten up on that; it's for another post.

 

Got asked this by a friend: I have a column treeID that has values 1 to 5, and for each value I require 20 random rows, so 20 for 1, 20 for 2, etc... Is there a query that will do this other than calling the stored procedure 5 times?

First thought is to use 5 queries and UNION ALL; however, the syntax does not support it ->

select top 20 somedata
from #blah
order by newid()
union all
select top 20 somedata = somedata + 1
from #blah
order by newid()

Gives this error ->

Msg 156, Level 15, State 1, Line 4

Incorrect syntax near the keyword 'union'.

To get round this problem we can use derived tables; think of these as a self-contained table, a bit like a temporary table within the query.

select somedata
from (
    select top 20 somedata
    from #blah
    order by newid() ) as d
union all
select *
from (
    select top 20 somedata = somedata + 1
    from #blah
    order by newid() ) as d

NEWID() gives a different GUID per row; unlike CURRENT_TIMESTAMP, for instance, which is evaluated once per query no matter how many rows there are (you always get the same value), NEWID() is evaluated for every row.
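A quick way to see that for yourself (assuming the same #blah table as above):

select newid()           as per_row_guid,   -- different on every row
       current_timestamp as per_query_time  -- the same on every row
from #blah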

You can probably do something with a Common Table Expression as well; any takers?

Congrats to Wanderer (see comments) on being the first to give me a CTE example, nice!...

WITH Top20Blah (SomeDataOut) as (
    select top 20 somedata
    from #blah
    order by newid() )
select 1, *
from Top20Blah
union all
select 2, *
from Top20Blah
union all
select 3, *
from Top20Blah
union all
select 4, *
from Top20Blah
union all
select 5, *
from Top20Blah


It just gets better: here is a CROSS APPLY example (see comments below) from Adam Machanic. I never even thought about CROSS APPLY - a perfect use for it.

select p.somedata, x.treeid
from ( select 1
       union all
       select 2
       union all
       select 3
       union all
       select 4
       union all
       select 5
     ) x (treeid)
    cross apply (
       select top 20 somedata
       from #blah
       where treeid = x.treeid
       order by checksum(newid())
    ) p
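The trick here is that the inner TOP 20 query is re-run for each row of the derived table x, correlated on treeid, so each tree gets its own random twenty. Note the ORDER BY CHECKSUM(NEWID()) rather than NEWID() itself; both give a random shuffle, CHECKSUM just hashes the GUID down to an int.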

 

Got another one I've just worked out myself...

select *
from (
    select somedata,
           treeid,
           row_number() over ( partition by treeid order by newid() ) as rownumber
    from #blah
    ) as r
where r.rownumber <= 20
order by treeid
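This one needs SQL Server 2005 for ROW_NUMBER(), but unlike the UNION ALL approaches it picks up all five groups in one pass over #blah.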

 

LaCie have partnered with Lego to bring you removable hard drives.

What will you build?

http://www.purelygadgets.co.uk/prodinfo.php?prodid=3049&start=0&thislist=30~234~329

 

So I'm interested in what people out there are using; please place your vote on the poll below.

  

So, anybody use SQL instances? Anybody got a server name longer than 30 characters? Well, you are stuck on SP1 until a hotfix is produced to fix this regression in Management Studio. If you find this important (they've hard-coded 30 characters for a server name, picked out of the blue no doubt, instead of using sysname), then show your disgust and disapproval at my Connect entry...

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=273298

SQL Agent jobs and maintenance plans are not editable when using SP2 tools against servers with a server name > 30 characters.

After creating a job, trying to double-click or script it causes SSMS to give a binary or string truncation error. This occurs when the total server name length is greater than 30 characters; for instance, our server name is: EN-SQL2K5-PROD1\EN_SQL2K5_PROD1.

This also affects maintenance plans; trying to edit an existing maintenance plan fails with the same binary or string truncation error.

This is a new bug introduced by SP2 and is not fixed by cumulative hotfix 3159.

This is a serious blocking issue that prevents DBAs using SSMS on SP2 against servers with names longer than 30 characters.

Profiling what SSMS sends to SQL Server shows the problem: the developer assumes a maximum server name length of nvarchar(30). Frankly, where does this arbitrary number come from, when the maximum server name length is sysname (nvarchar(128))?
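You can boil the failure down to a couple of lines; a hedged repro using the same column definition as the script below (the temp table name is mine):

create table #tmp ( server_name nvarchar(30) null )
insert #tmp ( server_name ) values ( N'EN-SQL2K5-PROD1\EN_SQL2K5_PROD1' ) -- 31 characters
-- Msg 8152: String or binary data would be truncated.
drop table #tmp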

The script that SSMS runs is shown below...

create table #tmp_sp_help_jobserver (
    server_id            int              null,
    server_name          nvarchar(30)     null,
    enlist_date          datetime         null,
    last_poll_date       datetime         null,
    last_run_date        int              null,
    last_run_time        int              null,
    last_run_duration    int              null,
    last_run_outcome     tinyint          null,
    last_outcome_message nvarchar(1024)   null,
    job_id               uniqueidentifier null
)

declare @job_id uniqueidentifier
declare crs cursor local fast_forward
for ( SELECT sv.job_id AS [JobID]
      FROM msdb.dbo.sysjobs_view AS sv
      WHERE (sv.name = N'REPL - TEST LINKED SERVERS' and sv.category_id = N'0') )
open crs
fetch crs into @job_id
while @@fetch_status >= 0
begin
    insert into #tmp_sp_help_jobserver( server_id, server_name, enlist_date, last_poll_date, last_run_date, last_run_time, last_run_duration, last_run_outcome, last_outcome_message )
        exec msdb.dbo.sp_help_jobserver @job_id = @job_id, @show_last_run_details = 1
    update #tmp_sp_help_jobserver set job_id = @job_id where job_id is null
    fetch crs into @job_id
end
close crs
deallocate crs

 

Watch yourself on this one. My client has an 8GB machine running two SQL instances (Standard Edition); one uses 6GB of memory (production) and one uses 512MB (development). The server is dedicated, so there is a reasonable amount of available memory.

Suddenly all hell broke loose: people could not connect, I could not connect (not even via the dedicated admin connection), and I could not even stop the SQL Server service.

In the eventvwr there was a message: “A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 13128, committed (KB): 6239692, memory utilization: 0%%.”

What does that mean? Well, basically the entire working set SQL Server was using got paged out of memory to disk, leaving SQL Server with just 13MB of RAM in use; no wonder SQL Server became unresponsive! “Performance degradation”: now there is an understatement if ever there was one!

This support article documents the problem: http://support.microsoft.com/kb/918483.

Our particular circumstance came about when we copied a number of 1.1GB files off the server.

So, I guess the general advice here is to use “Lock Pages in Memory”.
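For anyone who hasn't set that before, a rough sketch of the steps (this is the Windows side, not SQL Server): run secpol.msc, go to Local Policies > User Rights Assignment > “Lock pages in memory”, add the SQL Server service account and restart the service. The errorlog should then report that locked pages are being used for the buffer pool; you can read the errorlog from T-SQL:

-- Undocumented but long-standing; dumps the current errorlog.
exec xp_readerrorlog

One caveat worth noting: as far as I know, SQL Server 2005 only honours locked pages on Enterprise Edition, so check your edition before relying on it.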