SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL

Inconsistent NullIF Behaviour ?

NullIf can be a very handy function sometimes.  For those who have not used it, if the result of the two parameters are equal then null is returned.  So if we execute :

Code Snippet
  1. Select nullif(10,10) as Result1
  2. Select nullif(10,9)  as Result2

 

We get the result of

image

For some test data, I required a column that was randomly null.  Using ABS(CHECKSUM(NEWID())) to generate the random value and then modulus 10, to get a random value between 0 and 9 , then passing that into NULLIF as one parameter and 10 as the other, I should of got random values between 1 and 9 with a selection of NULL values.  Here’s the code :

Code Snippet
  1. select top(1000) ROW_NUMBER() over (order by (select null)) as ID,
  2.        nullif(ABS(checksum(newid()))%10,0)  as RandomNonZero
  3. from sys.columns a

And here's the result…

image

Interesting , does that mean that NULLIF is not working ?  Plainly it is sometimes, after all there are some NULL values. 

Much like in my previous post detailing the differences between ISNULL and COALESCE,  the NULLIF function is expanded by the engine into a case statement as shown below.

 

image

So,  its working but the value outputted is not the value tested.

Comments

Rob Farley said:

The problem isn't really with nullif, it's with newid(). Being non-deterministic, the system doesn't produce the same value twice, which it needs to for nullif. I demonstrate this behaviour on looking for a random person and getting different numbers of people back.

I think you may be able to fix it with a subquery, particularly if you can persuade it to materialise it.

# August 7, 2010 1:09 AM

Rob Farley said:

Nice post though, as always.

# August 7, 2010 1:11 AM

brad_schulz said:

Rob beat me to it... I was going to say what he said.

He's right... you can fix it like so with a CROSS APPLY (hope it formats okay):

select top(1000) ROW_NUMBER() over (order by (select null)) as ID,

      nullif(MyNewID,0)  as RandomNonZero

from sys.columns a

cross apply (select MyNewID=abs(checksum(newid()))%10) f

And I agree with Rob that it's a nice post.  I knew that COALESCE() was translated into a CASE expression, but I never "looked under the hood" at the NULLIF() function... good to know!

--Brad

# August 7, 2010 3:23 AM

Dave Ballantyne said:

Hi Rob, Brad,

To be fair this is documented in BOL.

"We recommend that you not use time-dependent functions, such as RAND(), within a NULLIF function. This could cause the function to be evaluted(sic) twice and to return different results from the two invocations. ".

But should there really be two invocations ?  This could be a complicated formula or subselect which causes a massive efficiency problem.  Lets not forget our old favourite, a scalar udf being executed twice as well.

From where i sit, it would seem easy enough for NULLIF to be implemented as an "atomic" function within the SQLServer source code.

# August 7, 2010 8:53 AM

Paul White NZ said:

There is a full Microsoft explanation for why the optimiser does not guarantee the number of times a scalar expression might be evaluated here: connect.microsoft.com/.../bug-with-newid-and-table-expressions

Personally, I am quite happy that things work as they do.  The query plan improvements possible by allowing the optimiser the freedom it has more than compensate for any perceived 'inconsistent' behaviour.

That said, it would be good to see more extensive use of common subexpressions in the optimiser, which would prevent the issue arising in the first place.  Not as easy as it sounds, however.

Paul

# August 7, 2010 9:34 AM