Sunday, May 11, 2008 7:47 PM tonyrogerson

Common Table Expressions (CTE's) - How it works; How Recursion Works; Using with Adjacency List

Recursion in Common Table Expressions (CTE’s), how does it work? How can I use it with the adjacency list model (parent / child columns)? In this blog entry I show how recursion works, how to use it with the adjacency list model and talk about other aspects and uses for CTE’s; I also demonstrate some of the areas where CTE’s aren’t that good and how to help remedy that performance penalty suffered.

This weekend I’ve been at the Scottish Developers Conference in Glasgow – a great day, I gave my making the leap to advanced TSQL session; as ever like the last time I ran out of time even though I’d cut a lot of stuff out. Congrats to Colin MacKay, Martin Bell and John Thomson for a great and well organised event – see http://www.scottishdevelopers.com/ for more details include slides etc.

CTE Basics

declare @howmany int

set @howmany = 5

 

declare @tb table( i int not null )

 

insert @tb values ( 1 )

insert @tb values ( 4 )

insert @tb values ( 9 )

 

; WITH LoopIT( org_i, i, recurse_depth, descr )

AS (

      --    The anchor is the starting point for possible recursion

      SELECT org_i = i,

               i = i,

               recurse_depth = CAST( 0 as tinyint ),

               DESCR = CAST( 'anchor' as varchar(100) )

      FROM @tb

     

)

select org_i, i, recurse_depth, descr

from LoopIT

order by org_i, i

go

The query marked in italics above is our actual query involved in the CTE; the fluff around it – the WITH clause, the end SELECT on LoopIT are all the CTE framework.

The CTE is nothing but a derived table (think of it like a View) that is expanded into the main query, unlike a derived table with a CTE you can do self-joins and recursion.

Look at the query plan for the entire CTE structure above...

  |--Compute Scalar(DEFINE:([Expr1004]=(0), [Expr1005]='anchor'))
       |--Table Scan(OBJECT:(@tb))

Now compare it to just the query that we term the anchor:

SELECT org_i = i,

       i = i,

       recurse_depth = CAST( 0 as tinyint ),

       DESCR = CAST( 'anchor' as varchar(100) )

FROM @tb

 

  |--Compute Scalar(DEFINE:([Expr1004]=(0), [Expr1005]='anchor'))
       |--Table Scan(OBJECT:(@tb))

It’s important to remember that CTE’s offer no direct performance benefits in fact we will see later that they can impact performance.

Recursion Basics

The two main features of CTE’s are that you can self-join and also perform recursion.

Consider the query below which shows recursion and a subtle self-join (the self-join being the bottom query in the UNION ALL and the CTE (notice we “self-join” with LoopIT):

--  Recursion

declare @howmany int

set @howmany = 5

 

declare @tb table( i int not null )

declare @tb2 table( i int not null )

 

insert @tb values ( 1 )

insert @tb values ( 4 )

insert @tb values ( 9 )

 

insert @tb2 values ( 4 )

 

; WITH LoopIT( org_i, i, recurse_depth, descr )

AS (

      --    The anchor is the starting point for possible recursion

      --    The anchor can use UNION [ALL], INTERSECT, EXCEPT

      SELECT org_i = i,

               i = i,

               recurse_depth = CAST( 0 as tinyint ),

               DESCR = CAST( 'anchor' as varchar(100) )

      FROM @tb

 

      EXCEPT

 

      SELECT org_i = i,

               i = i,

               recurse_depth = CAST( 0 as tinyint ),

               DESCR = CAST( 'anchor' as varchar(100) )

      FROM @tb2

     

      --    Recursive Member; any rows returned are inserted into CTE

      --    which are also then available (seen) on the next recursion

      UNION ALL

 

      SELECT org_i = org_i,

               i = i + 1,

               recurse_depth = recurse_depth + cast( 1 as tinyint ),

               descr = CAST( 'recursed' as varchar(100) )

      FROM LoopIT

      WHERE i < @howmany

)

select org_i, i, recurse_depth, descr

from LoopIT

order by org_i, i

go

 

The starting point is the first query – the anchor; the anchor is iterated through row by row (think cursor), the first row is passed into the recursive member query, if there are any results that set is first saved to the final result set but also used as input to the next recursive execution of the recursive member query, this process happens until the recursive member query returns no more rows and at that point it moves onto the next row from the anchor query.

Below is an illustration in TSQL of just how the Common Table Expression recursion works logically:

declare @tb table( i int not null )

declare @tb2 table( i int not null )

 

insert @tb values ( 1 )

insert @tb values ( 4 )

insert @tb values ( 9 )

 

insert @tb2 values ( 4 )

 

 

--    Create the anchor

SELECT org_i = i,

       i = i,

       recurse_depth = CAST( 0 as tinyint ),

       DESCR = CAST( 'anchor' as varchar(100) )

INTO #anchor

FROM @tb

 

EXCEPT

 

SELECT org_i = i,

       i = i,

       recurse_depth = CAST( 0 as tinyint ),

       DESCR = CAST( 'anchor' as varchar(100) )

FROM @tb2

 

--    Create the interim recursed set

SELECT *

INTO #recurse_set

FROM #anchor

WHERE 1 = 0

 

--    Previous recurse set

SELECT *

INTO #recurse_set_previous

FROM #anchor

WHERE 1 = 0

 

--    Final result

SELECT *

INTO #result

FROM #anchor

WHERE 1 = 0

 

TRUNCATE TABLE #recurse_set

TRUNCATE TABLE #recurse_set_previous

TRUNCATE TABLE #result

 

DECLARE @howmany int

SET @howmany = 5

 

DECLARE @org_i int, @i int, @recurse_depth int, @descr varchar(100)

 

DECLARE anchor_members CURSOR FOR

      SELECT *

      FROM #anchor

 

OPEN anchor_members

 

FETCH NEXT FROM anchor_members

      INTO @org_i, @i, @recurse_depth, @descr

     

WHILE @@FETCH_STATUS = 0

BEGIN

      --    Save anchor to result

      INSERT #result ( org_i, i, recurse_depth, descr )

            SELECT @org_i, @i, @recurse_depth, @descr

 

      --    Prime recursion by creating the single row Anchor set.

      TRUNCATE TABLE #recurse_set

 

      INSERT #recurse_set ( org_i, i, recurse_depth, descr )

            SELECT @org_i, @i, @recurse_depth, @descr

 

      WHILE 1 = 1 -- Do forever (in the CTE it stops at 100 by default, changeable using MAXRECURSION)

      BEGIN

            --    Create the previous set

            TRUNCATE TABLE #recurse_set_previous

 

            INSERT #recurse_set_previous ( org_i, i, recurse_depth, descr )

                  SELECT org_i, i, recurse_depth, descr

                  FROM #recurse_set

 

            --    Recurse creating the new set; if no rows then recursion has finished,

            --    stop and do the next anchor row

            TRUNCATE TABLE #recurse_set

                       

            INSERT #recurse_set ( org_i, i, recurse_depth, descr )

                  SELECT org_i = org_i,

                         i = i + 1,

                         recurse_depth = recurse_depth + cast( 1 as tinyint ),

                         descr = CAST( 'recursed' as varchar(100) )

                  FROM #recurse_set_previous --LoopIT

                  WHERE i < @howmany  --  Adds in new rows to the CTE spool table (temporary table) (LoopIT)

 

            IF @@ROWCOUNT = 0

                  BREAK

 

            --    Save recurse result set into final result (UNION ALL) then recurse again

            INSERT #result ( org_i, i, recurse_depth, descr )

                  SELECT org_i, i, recurse_depth, descr

                  FROM #recurse_set

 

      END

 

      --    Get the next Anchor

      FETCH NEXT FROM anchor_members

            INTO @org_i, @i, @recurse_depth, @descr

 

END

DEALLOCATE anchor_members

 

SELECT *

FROM #result

 

DROP TABLE #recurse_set, #recurse_set_previous, #result, #anchor

 

Using Common Table Expression to traverse the adjacency List Model

It’s common practice to use either the adjacency List model (parent/child columns) or the materialised path (root.node.node.node) to store hierarchies.

The problem with the adjacency list approach is that you need to use many self-joins to iterate the hierarchy (a join per level) or use a cursor as above.

You can now use a CTE; see example below...

create table adjacency_list (

      parent_extractedword_id int   not null,

      child_extractedword_id  int not null,

            constraint pk_adjacency_list primary key clustered( parent_extractedword_id, child_extractedword_id )

      )

     

create table extractedword (

      id    int not null constraint pk_extractedword primary key clustered,

      word varchar(50) not null

      )

insert extractedword( id, word ) values( 11, 'demo' )

insert extractedword( id, word ) values( 12, 'on' )

insert extractedword( id, word ) values( 13, 'cte' )

insert extractedword( id, word ) values( 14, 'recursion' )

     

insert extractedword( id, word ) values( 21, 'does' )

insert extractedword( id, word ) values( 22, 'this' )

insert extractedword( id, word ) values( 23, 'rock' )

insert extractedword( id, word ) values( 24, 'ya' )

insert extractedword( id, word ) values( 25, 'boat?' )

 

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 11, 12 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 12, 13 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 13, 14 )

 

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 21, 22 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 22, 23 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 23, 24 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 24, 25 )

 

; WITH WordStruct( org_parent_id, child_id, recurse_depth )

AS (

      --    The anchor is the starting point for possible recursion

      SELECT org_parent_id = parent_extractedword_id,

             child_id      = child_extractedword_id,

             recurse_depth = CAST( 0 as tinyint )

      FROM adjacency_list l1

      WHERE NOT EXISTS ( SELECT *         --    The Anchor should only contain root PARENT's

                         FROM adjacency_list l2

                         WHERE l2.child_extractedword_id = l1.parent_extractedword_id )

                                

      --    Recursive Member

      UNION ALL

 

      SELECT org_parent_id = l1.child_id,

             child_id      = l2.child_extractedword_id,

             recurse_depth = recurse_depth + cast( 1 as tinyint )

      FROM WordStruct l1

            INNER JOIN adjacency_list l2 on l2.parent_extractedword_id = l1.child_id

)

select e.id, e.word, ws.*,

       case when org_parent_id=e.id then 'ROOT' else 'CHILD.DEPTH=' + CAST( recurse_depth + 1 as varchar(5) ) end

from WordStruct ws

      INNER JOIN extractedword e on e.id = ws.child_id

                                 or ( ws.recurse_depth = 0                 --      Do this to get the desc for the root parent

                                  and e.id = ws.org_parent_id )

order by ws.org_parent_id, ws.child_id

go

 

First we step through the hierarchy and get all our surrogate keys and then on the final phase of the CTE at the point where all the anchor rows we join to the table that will give us the node descriptions.

Performance Problems

The CTE is nothing but expanded SQL; by that I mean the anchor CTE is not materialised into a work table (temporary table); unfortunately this presents us with a number of problems which will hopefully be resolved in future versions of SQL Server.

The two specific ones I want to talk about are

·         CTE Self-Join in the result

·         Use of ROW_NUMBER in the CTE – Applied before filtering – ouch!

I’ll save those for another article in the coming week and I’ll delve into IO Statistics and the query plans and what and why it’s happening – oh, strategies for dealing with it too.

Update 2008-05-18:

So I got round to doing the performance posts too:

When not using recursion but self-joining with the CTE performance sucks:

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-1-cte-self-join-cte-sub-query-inline-expansion.aspx

When not using recursion and using ROW_NUMBER in the CTE anchor it's executed inefficiently causing a massive amount of additional resource:

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-2-row-number-is-executed-number-of-cte-references-x-number-of-rows-from-the-anchor.aspx

 

Filed under: , ,

Comments

# Interesting Finds: May 12, 2008

Monday, May 12, 2008 3:07 PM by Jason Haley

# Bug in inline expansion of non-deterministic functions in derived tables and CTE's causes incorrect results

Thursday, June 12, 2008 8:25 AM by Tony Rogerson's ramblings on SQL Server

Using non-deterministic functions in CTE&#39;s gives incorrect results, this follows on from two things

# need advise about a query | keyongtech

Sunday, January 18, 2009 4:32 PM by need advise about a query | keyongtech

Pingback from  need advise about a query | keyongtech

# Morning News 2009-07-14 &laquo; C# Hacker &#8211; The Rambling Coder

Pingback from  Morning News 2009-07-14 &laquo;  C# Hacker &#8211; The Rambling Coder