11 May 2008 19:47
tonyrogerson
Common Table Expressions (CTE's) - How it works; How Recursion Works; Using with Adjacency List
Recursion in Common Table Expressions (CTE’s), how does it work? How can I use it with the adjacency list model (parent / child columns)? In this blog entry I show how recursion works, how to use it with the adjacency list model and talk about other aspects and uses for CTE’s; I also demonstrate some of the areas where CTE’s aren’t that good and how to help remedy that performance penalty suffered.
This weekend I’ve been at the Scottish Developers Conference in Glasgow – a great day, I gave my making the leap to advanced TSQL session; as ever like the last time I ran out of time even though I’d cut a lot of stuff out. Congrats to Colin MacKay, Martin Bell and John Thomson for a great and well organised event – see http://www.scottishdevelopers.com/ for more details include slides etc.
CTE Basics
declare @howmany int
set @howmany = 5
declare @tb table( i int not null )
insert @tb values ( 1 )
insert @tb values ( 4 )
insert @tb values ( 9 )
; WITH LoopIT( org_i, i, recurse_depth, descr )
AS (
-- The anchor is the starting point for possible recursion
SELECT org_i = i,
i = i,
recurse_depth = CAST( 0 as tinyint ),
DESCR = CAST( 'anchor' as varchar(100) )
FROM @tb
)
select org_i, i, recurse_depth, descr
from LoopIT
order by org_i, i
go
The query marked in italics above is our actual query involved in the CTE; the fluff around it – the WITH clause, the end SELECT on LoopIT are all the CTE framework.
The CTE is nothing but a derived table (think of it like a View) that is expanded into the main query, unlike a derived table with a CTE you can do self-joins and recursion.
Look at the query plan for the entire CTE structure above...
|--Compute Scalar(DEFINE:([Expr1004]=(0), [Expr1005]='anchor'))
|--Table Scan(OBJECT:(@tb))
Now compare it to just the query that we term the anchor:
SELECT org_i = i,
i = i,
recurse_depth = CAST( 0 as tinyint ),
DESCR = CAST( 'anchor' as varchar(100) )
FROM @tb
|--Compute Scalar(DEFINE:([Expr1004]=(0), [Expr1005]='anchor'))
|--Table Scan(OBJECT:(@tb))
It’s important to remember that CTE’s offer no direct performance benefits in fact we will see later that they can impact performance.
Recursion Basics
The two main features of CTE’s are that you can self-join and also perform recursion.
Consider the query below which shows recursion and a subtle self-join (the self-join being the bottom query in the UNION ALL and the CTE (notice we “self-join” with LoopIT):
-- Recursion
declare @howmany int
set @howmany = 5
declare @tb table( i int not null )
declare @tb2 table( i int not null )
insert @tb values ( 1 )
insert @tb values ( 4 )
insert @tb values ( 9 )
insert @tb2 values ( 4 )
; WITH LoopIT( org_i, i, recurse_depth, descr )
AS (
-- The anchor is the starting point for possible recursion
-- The anchor can use UNION [ALL], INTERSECT, EXCEPT
SELECT org_i = i,
i = i,
recurse_depth = CAST( 0 as tinyint ),
DESCR = CAST( 'anchor' as varchar(100) )
FROM @tb
EXCEPT
SELECT org_i = i,
i = i,
recurse_depth = CAST( 0 as tinyint ),
DESCR = CAST( 'anchor' as varchar(100) )
FROM @tb2
-- Recursive Member; any rows returned are inserted into CTE
-- which are also then available (seen) on the next recursion
UNION ALL
SELECT org_i = org_i,
i = i + 1,
recurse_depth = recurse_depth + cast( 1 as tinyint ),
descr = CAST( 'recursed' as varchar(100) )
FROM LoopIT
WHERE i < @howmany
)
select org_i, i, recurse_depth, descr
from LoopIT
order by org_i, i
go
The starting point is the first query – the anchor; the anchor is iterated through row by row (think cursor), the first row is passed into the recursive member query, if there are any results that set is first saved to the final result set but also used as input to the next recursive execution of the recursive member query, this process happens until the recursive member query returns no more rows and at that point it moves onto the next row from the anchor query.
Below is an illustration in TSQL of just how the Common Table Expression recursion works logically:
declare @tb table( i int not null )
declare @tb2 table( i int not null )
insert @tb values ( 1 )
insert @tb values ( 4 )
insert @tb values ( 9 )
insert @tb2 values ( 4 )
-- Create the anchor
SELECT org_i = i,
i = i,
recurse_depth = CAST( 0 as tinyint ),
DESCR = CAST( 'anchor' as varchar(100) )
INTO #anchor
FROM @tb
EXCEPT
SELECT org_i = i,
i = i,
recurse_depth = CAST( 0 as tinyint ),
DESCR = CAST( 'anchor' as varchar(100) )
FROM @tb2
-- Create the interim recursed set
SELECT *
INTO #recurse_set
FROM #anchor
WHERE 1 = 0
-- Previous recurse set
SELECT *
INTO #recurse_set_previous
FROM #anchor
WHERE 1 = 0
-- Final result
SELECT *
INTO #result
FROM #anchor
WHERE 1 = 0
TRUNCATE TABLE #recurse_set
TRUNCATE TABLE #recurse_set_previous
TRUNCATE TABLE #result
DECLARE @howmany int
SET @howmany = 5
DECLARE @org_i int, @i int, @recurse_depth int, @descr varchar(100)
DECLARE anchor_members CURSOR FOR
SELECT *
FROM #anchor
OPEN anchor_members
FETCH NEXT FROM anchor_members
INTO @org_i, @i, @recurse_depth, @descr
WHILE @@FETCH_STATUS = 0
BEGIN
-- Save anchor to result
INSERT #result ( org_i, i, recurse_depth, descr )
SELECT @org_i, @i, @recurse_depth, @descr
-- Prime recursion by creating the single row Anchor set.
TRUNCATE TABLE #recurse_set
INSERT #recurse_set ( org_i, i, recurse_depth, descr )
SELECT @org_i, @i, @recurse_depth, @descr
WHILE 1 = 1 -- Do forever
BEGIN
-- Create the previous set
TRUNCATE TABLE #recurse_set_previous
INSERT #recurse_set_previous ( org_i, i, recurse_depth, descr )
SELECT org_i, i, recurse_depth, descr
FROM #recurse_set
-- Recurse creating the new set; if no rows then recursion hsa finished,
-- stop and do the next anchor row
TRUNCATE TABLE #recurse_set
INSERT #recurse_set ( org_i, i, recurse_depth, descr )
SELECT org_i = org_i,
i = i + 1,
recurse_depth = recurse_depth + cast( 1 as tinyint ),
descr = CAST( 'recursed' as varchar(100) )
FROM #recurse_set_previous --LoopIT
WHERE i < @howmany -- Adds in new rows to the CTE spool table (temporary table) (LoopIT)
-- Save recurse result set into final result (UNION ALL)
INSERT #result ( org_i, i, recurse_depth, descr )
SELECT org_i, i, recurse_depth, descr
FROM #recurse_set
IF @@ROWCOUNT = 0
BREAK
END
-- Get the next Anchor
FETCH NEXT FROM anchor_members
INTO @org_i, @i, @recurse_depth, @descr
END
DEALLOCATE anchor_members
SELECT *
FROM #result
DROP TABLE #recurse_set, #recurse_set_previous, #result, #anchor
Using Common Table Expression to traverse the adjacency List Model
It’s common practice to use either the adjacency List model (parent/child columns) or the materialised path (root.node.node.node) to store hierarchies.
The problem with the adjacency list approach is that you need to use many self-joins to iterate the hierarchy (a join per level) or use a cursor as above.
You can now use a CTE; see example below...
create table adjacency_list (
parent_extractedword_id int not null,
child_extractedword_id int not null,
constraint pk_adjacency_list primary key clustered( parent_extractedword_id, child_extractedword_id )
)
create table extractedword (
id int not null constraint pk_extractedword primary key clustered,
word varchar(50) not null
)
insert extractedword( id, word ) values( 11, 'demo' )
insert extractedword( id, word ) values( 12, 'on' )
insert extractedword( id, word ) values( 13, 'cte' )
insert extractedword( id, word ) values( 14, 'recursion' )
insert extractedword( id, word ) values( 21, 'does' )
insert extractedword( id, word ) values( 22, 'this' )
insert extractedword( id, word ) values( 23, 'rock' )
insert extractedword( id, word ) values( 24, 'ya' )
insert extractedword( id, word ) values( 25, 'boat?' )
insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 11, 12 )
insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 12, 13 )
insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 13, 14 )
insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 21, 22 )
insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 22, 23 )
insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 23, 24 )
insert adjacency_list ( parent_extractedword_id, child_extractedword_id ) values( 24, 25 )
; WITH WordStruct( org_parent_id, child_id, recurse_depth )
AS (
-- The anchor is the starting point for possible recursion
SELECT org_parent_id = parent_extractedword_id,