Thursday, June 12, 2008 8:12 AM tonyrogerson

Bug in inline expansion of non-deterministic functions in derived tables and CTE's causes incorrect results

Using non-deterministic functions in CTE's gives incorrect results, this follows on from two things - Itzik Ben Gan has raised a good Connect item that highlights a problem with NEWID() (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=350485) and also my series of performance related CTE acticles that highlight the problem caused by the way SQL Server does it's inline expansion.

In my article on CTE when using ROW_NUMBER() and self-joins performance: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-2-row-number-is-executed-number-of-cte-references-x-number-of-rows-from-the-anchor.aspx I demonstrated the sequence generator was being executed multiple times when self-joining, but what I wasn't aware of is that when using a non-deterministic function to ORDER on all hell breaks loose...

A bug in the behaviour when using a non-deterministic function in the OVER ORDER BY clause yields inaccurate results.

Check it out...

select 1 as id, 1 as d

into #t

union all select 1, 2 as d

union all select 1, 3 as d

union all select 1, 4 as d

union all select 1, 5 as d

union all select 1, 6 as d

union all select 1, 7 as d

union all select 2, 1 as d

union all select 2, 2 as d

union all select 2, 3 as d

union all select 2, 4 as d

union all select 2, 5 as d

union all select 2, 6 as d

union all select 2, 7 as d

 

select *, row_number() over( order by newid() ) as rn

into #x

from #t

This gives the correct results, both occurrances of rn should give the same value...

;with t2 ( id, d, rn )

as (

    select *

    from #x

    )

select *

from t2 a

    inner join t2 b

        on b.id = a.id

       and b.d  = a.d

order by a.rn

But using the CTE it gives the wrong results because the sequence generation is run twice (check the plan).

;with t2 ( id, d, rn )

as (

    select *, row_number() over( order by newid() ) as rn

    from #t

    )

select *

from t2 a

    inner join t2 b

        on b.id = a.id

       and b.d  = a.d

order by a.rn

My advice is clear - always look to using a temporary table as the anchor clause; doing so means CTE's are only really useful when using recursion, see the following articles:

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-1-cte-self-join-cte-sub-query-inline-expansion.aspx

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-2-row-number-is-executed-number-of-cte-references-x-number-of-rows-from-the-anchor.aspx 

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/11/common-table-expressions-cte-s-how-it-works-how-recursion-works-using-with-adjacency-list.aspx

I think the answer from Microsoft perspective is to improve the way inline expansion works.

Filed under: ,

Comments

No Comments