11 May 2008 19:47 tonyrogerson

Common Table Expressions (CTE's) - How it works; How Recursion Works; Using with Adjacency List

Recursion in Common Table Expressions (CTE’s), how does it work? How can I use it with the adjacency list model (parent / child columns)? In this blog entry I show how recursion works, how to use it with the adjacency list model and talk about other aspects and uses for CTE’s; I also demonstrate some of the areas where CTE’s aren’t that good and how to help remedy that performance penalty suffered.

This weekend I’ve been at the Scottish Developers Conference in Glasgow – a great day, I gave my making the leap to advanced TSQL session; as ever like the last time I ran out of time even though I’d cut a lot of stuff out. Congrats to Colin MacKay, Martin Bell and John Thomson for a great and well organised event – see http://www.scottishdevelopers.com/ for more details include slides etc.

CTE Basics

declare @howmany int

set @howmany = 5

 

declare @tb table( i int not null )

 

insert @tb values ( 1 )

insert @tb values ( 4 )

insert @tb values ( 9 )

 

; WITH LoopIT( org_i, i, recurse_depth, descr )

AS (

      --    The anchor is the starting point for possible recursion

      SELECT org_i = i,

               i = i,

               recurse_depth = CAST( 0 as tinyint ),

               DESCR = CAST( 'anchor' as varchar(100) )

      FROM @tb

     

)

select org_i, i, recurse_depth, descr

from LoopIT

order by org_i, i

go

The query marked in italics above is our actual query involved in the CTE; the fluff around it – the WITH clause, the end SELECT on LoopIT are all the CTE framework.

The CTE is nothing but a derived table (think of it like a View) that is expanded into the main query, unlike a derived table with a CTE you can do self-joins and recursion.

Look at the query plan for the entire CTE structure above...

  |--Compute Scalar(DEFINE:([Expr1004]=(0), [Expr1005]='anchor'))
       |--Table Scan(OBJECT:(@tb))

Now compare it to just the query that we term the anchor:

SELECT org_i = i,

       i = i,

       recurse_depth = CAST( 0 as tinyint ),

       DESCR = CAST( 'anchor' as varchar(100) )

FROM @tb

 

  |--Compute Scalar(DEFINE:([Expr1004]=(0), [Expr1005]='anchor'))
       |--Table Scan(OBJECT:(@tb))

It’s important to remember that CTE’s offer no direct performance benefits in fact we will see later that they can impact performance.

Recursion Basics

The two main features of CTE’s are that you can self-join and also perform recursion.

Consider the query below which shows recursion and a subtle self-join (the self-join being the bottom query in the UNION ALL and the CTE (notice we “self-join” with LoopIT):

--  Recursion

declare @howmany int

set @howmany = 5

 

declare @tb table( i int not null )

declare @tb2 table( i int not null )

 

insert @tb values ( 1 )

insert @tb values ( 4 )

insert @tb values ( 9 )

 

insert @tb2 values ( 4 )

 

; WITH LoopIT( org_i, i, recurse_depth, descr )

AS (

      --    The anchor is the starting point for possible recursion

      --    The anchor can use UNION [ALL], INTERSECT, EXCEPT

      SELECT org_i = i,

               i = i,

               recurse_depth = CAST( 0 as tinyint ),

               DESCR = CAST( 'anchor' as varchar(100) )

      FROM @tb

 

      EXCEPT

 

      SELECT org_i = i,

               i = i,

               recurse_depth = CAST( 0 as tinyint ),

               DESCR = CAST( 'anchor' as varchar(100) )

      FROM @tb2

     

      --    Recursive Member; any rows returned are inserted into CTE

      --    which are also then available (seen) on the next recursion

      UNION ALL

 

      SELECT org_i = org_i,

               i = i + 1,

               recurse_depth = recurse_depth + cast( 1 as tinyint ),

               descr = CAST( 'recursed' as varchar(100) )

      FROM LoopIT

      WHERE i < @howmany

)

select org_i, i, recurse_depth, descr

from LoopIT

order by org_i, i

go

 

The starting point is the first query – the anchor; the anchor is iterated through row by row (think cursor), the first row is passed into the recursive member query, if there are any results that set is first saved to the final result set but also used as input to the next recursive execution of the recursive member query, this process happens until the recursive member query returns no more rows and at that point it moves onto the next row from the anchor query.

Below is an illustration in TSQL of just how the Common Table Expression recursion works logically:

declare @tb table( i int not null )

declare @tb2 table( i int not null )

 

insert @tb values ( 1 )

insert @tb values ( 4 )

insert @tb values ( 9 )

 

insert @tb2 values ( 4 )

 

 

--    Create the anchor

SELECT org_i = i,

       i = i,

       recurse_depth = CAST( 0 as tinyint ),

       DESCR = CAST( 'anchor' as varchar(100) )

INTO #anchor

FROM @tb

 

EXCEPT

 

SELECT org_i = i,

       i = i,

       recurse_depth = CAST( 0 as tinyint ),

       DESCR = CAST( 'anchor' as varchar(100) )

FROM @tb2

 

--    Create the interim recursed set

SELECT *

INTO #recurse_set

FROM #anchor

WHERE 1 = 0

 

--    Previous recurse set

SELECT *

INTO #recurse_set_previous

FROM #anchor

WHERE 1 = 0

 

--    Final result

SELECT *

INTO #result

FROM #anchor

WHERE 1 = 0

 

TRUNCATE TABLE #recurse_set

TRUNCATE TABLE #recurse_set_previous

TRUNCATE TABLE #result

 

DECLARE @howmany int

SET @howmany = 5

 

DECLARE @org_i int, @i int, @recurse_depth int, @descr varchar(100)

 

DECLARE anchor_members CURSOR FOR

      SELECT *

      FROM #anchor

 

OPEN anchor_members

 

FETCH NEXT FROM anchor_members

      INTO @org_i, @i, @recurse_depth, @descr

     

WHILE @@FETCH_STATUS = 0

BEGIN

      --    Save anchor to result

      INSERT #result ( org_i, i, recurse_depth, descr )

            SELECT @org_i, @i, @recurse_depth, @descr

 

      --    Prime recursion by creating the single row Anchor set.

      TRUNCATE TABLE #recurse_set

 

      INSERT #recurse_set ( org_i, i, recurse_depth, descr )

            SELECT @org_i, @i, @recurse_depth, @descr

 

      WHILE 1 = 1 -- Do forever (in the CTE it stops at 100 by default, changeable using MAXRECURSION)

      BEGIN

            --    Create the previous set

            TRUNCATE TABLE #recurse_set_previous

 

            INSERT #recurse_set_previous ( org_i, i, recurse_depth, descr )

                  SELECT org_i, i, recurse_depth, descr

                  FROM #recurse_set

 

            --    Recurse creating the new set; if no rows then recursion has finished,

            --    stop and do the next anchor row

            TRUNCATE TABLE #recurse_set

                       

            INSERT #recurse_set ( org_i, i, recurse_depth, descr )

                  SELECT org_i = org_i,

                         i = i + 1,

                         recurse_depth = recurse_depth + cast( 1 as tinyint ),

                         descr = CAST( 'recursed' as varchar(100) )

                  FROM #recurse_set_previous --LoopIT

                  WHERE i < @howmany  --  Adds in new rows to the CTE spool table (temporary table) (LoopIT)

 

            IF @@ROWCOUNT = 0

                  BREAK

 

            --    Save recurse result set into final result (UNION ALL) then recurse again

            INSERT #result ( org_i, i, recurse_depth, descr )

                  SELECT org_i, i, recurse_depth, descr

                  FROM #recurse_set

 

      END

 

      --    Get the next Anchor

      FETCH NEXT FROM anchor_members

            INTO @org_i, @i, @recurse_depth, @descr

 

END

DEALLOCATE anchor_members

 

SELECT *

FROM #result

 

DROP TABLE #recurse_set, #recurse_set_previous, #result, #anchor

 

Using Common Table Expression to traverse the adjacency List Model

It’s common practice to use either the adjacency List model (parent/child columns) or the materialised path (root.node.node.node) to store hierarchies.

The problem with the adjacency list approach is that you need to use many self-joins to iterate the hierarchy (a join per level) or use a cursor as above.

You can now use a CTE; see example below...

create table adjacency_list (

      parent_extractedword_id int   not null,

      child_extractedword_id  int not null,

            constraint pk_adjacency_list primary key clustered( parent_extractedword_id, child_extractedword_id )

      )

     

create table extractedword (

      id    int not null constraint pk_extractedword primary key clustered,

      word varchar(50) not null

      )

insert extractedword( id, word ) values( 11, 'demo' )

insert extractedword( id, word ) values( 12, 'on' )

insert extractedword( id, word ) values( 13, 'cte' )

insert extractedword( id, word ) values( 14, 'recursion' )

     

insert extractedword( id, word ) values( 21, 'does' )

insert extractedword( id, word ) values( 22, 'this' )

insert extractedword( id, word ) values( 23, 'rock' )

insert extractedword( id, word ) values( 24, 'ya' )

insert extractedword( id, word ) values( 25, 'boat?' )

 

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 11, 12 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 12, 13 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 13, 14 )

 

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 21, 22 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 22, 23 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 23, 24 )

insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 24, 25 )

 

; WITH WordStruct( org_parent_id, child_id, recurse_depth )

AS (

      --    The anchor is the starting point for possible recursion