SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL

September 2010 - Posts

An impossible case

In a previous blog entry (Inconsistent NullIf behaviour), Jeff Moden and Dan Halliday (twitter) both made a very similar observation.

When executing this code

Code Snippet
select top(10) abs(checksum(newid()))%4
from sys.objects

the only possible values returned can be 0, 1, 2 or 3. Anything else is a mathematical impossibility. Abs(checksum(newid())) will return a 'random' non-negative integer, and the modulus (%) operator will return the remainder after division. So, that being the case, how can this code return NULL values?
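As a quick sanity check of the arithmetic, here is the same claim sketched in Python rather than T-SQL (with `random.randint` standing in for `checksum(newid())`):

```python
import random

# abs(<int>) % 4 can only ever be 0, 1, 2 or 3 --
# the remainder of dividing a non-negative integer by 4.
values = {abs(random.randint(-2**31 + 1, 2**31 - 1)) % 4 for _ in range(10_000)}
print(values)  # some subset of {0, 1, 2, 3}; a NULL/None is impossible here
```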

Code Snippet
select top(10)
       case abs(checksum(newid()))%4
       when 0 then 0
       when 1 then 1
       when 2 then 2
       when 3 then 3 end
from sys.objects


Does that mean that SQL Server is mathematically flawed? Should we avoid modulus? Once again, we need to see what SQL Server is ACTUALLY executing. Within the execution plan, the defined values for the 'Compute Scalar' operator are shown below.


Which, when applied as SQL code, would be



Code Snippet
Select top (10)
       CASE WHEN abs(checksum(newid()))%(4)=(0) THEN (0) ELSE
       CASE WHEN abs(checksum(newid()))%(4)=(1) THEN (1) ELSE
       CASE WHEN abs(checksum(newid()))%(4)=(2) THEN (2) ELSE
       CASE WHEN abs(checksum(newid()))%(4)=(3) THEN (3) ELSE
       NULL END END END END
from sys.objects
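To see why that rewritten form can produce NULLs, here is a small Python simulation (my own sketch, not SQL Server's internals): each WHEN branch draws a fresh random value, mirroring the way newid() is re-evaluated per comparison. With four independent draws, every branch can miss, and roughly (3/4)^4 ≈ 31.6% of rows fall through to the implicit ELSE NULL.

```python
import random

def unstable_case():
    """Mimic the expanded CASE: each WHEN re-evaluates the 'random' expression."""
    for target in (0, 1, 2, 3):
        if random.randrange(4) == target:  # a fresh draw per branch, like newid()
            return target
    return None  # the implicit ELSE NULL

results = [unstable_case() for _ in range(100_000)]
null_ratio = results.count(None) / len(results)
print(f"NULL ratio: {null_ratio:.3f}")  # around 0.316, i.e. (3/4)**4
```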


Now it's pretty clear where the NULL values are coming from. Some people have suggested that the real problem is that newid() is non-deterministic, and as such gives a different value on each evaluation. I would disagree with that point of view: newid() should be non-deterministic.

What is the real-world value of this knowledge? So what if you can't use newid() in a case statement, does that really matter? When put like that, I would have to agree that, no, not really, but let us now expand upon this.

Let us create a simplistic function in AdventureWorks

Code Snippet
Create Function fnGetOrderDate(@SalesOrderId integer)
returns date
with schemabinding
as
begin
  declare @OrderDate date
   select @OrderDate = OrderDate
    from sales.SalesOrderHeader
   where SalesOrderID = @SalesOrderId
  return @OrderDate
end

Then execute the following code

Code Snippet
select  case datepart(MONTH,dbo.fnGetOrderDate(45038))
        when 1 then 'Jan'
        when 2 then 'Feb'
        when 3 then 'Mar'
        when 4 then 'Apr'
        when 5 then 'May'
        when 6 then 'Jun'
        when 7 then 'Jul'
        when 8 then 'Aug'
        when 9 then 'Sept'
        when 10 then 'Oct'
        when 11 then 'Nov'
        when 12 then 'Dec' end


With a trace on SP:StmtCompleted and SQL:BatchCompleted running, we will then see this.


But what will happen when we change the passed SalesOrderId to a different value? Try with 43659.


Now, this probably says more about the inappropriate use of a scalar user-defined function than about the case statement itself, but it highlights how performance can be damaged by the combination of the two.
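The mechanics behind that cost can be sketched in Python (a simulation of the expanded CASE, not SQL Server itself): because the CASE is expanded into a chain of comparisons that each re-evaluate the expression, the scalar function is invoked once per WHEN branch tried, so an order date in December costs twelve separate calls.

```python
call_count = 0

def get_order_month():
    """Stand-in for the scalar UDF; counts how often it is invoked."""
    global call_count
    call_count += 1
    return 12  # pretend the order date falls in December

# The expanded CASE compares the *re-evaluated* expression against each
# literal in turn, so the function runs once per WHEN branch tried.
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']
result = next((name for i, name in enumerate(months, 1)
               if get_order_month() == i), None)
print(result, call_count)  # 'Dec', reached only after 12 separate calls
```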