17 May 2008 15:38
tonyrogerson
Non Recursive Common Table Expressions - Performance Sucks 1 - CTE Self-Join / CTE Sub Query inline expansion
Unless you are using recursion then the Common Table Expression sucks and you shouldn’t use it; the big problem is that the SQL in the anchor is repeated rather than spooled into a work table.
In the last article I went through the basics of the Common Table Expression and CTE Recursion and hinted that there may be performance problems, in this article I delve a bit deeper – specifically we see how self-join’s on a non-recursive CTE are very bad for performance.
Looking at our example query below we are using the Common Table Expression with a self-join and additionally a sub-query on the select; don’t look too much at what I’m doing in the query because it pretty much doesn’t make any sense – the important aspects to concentrate on are the self-join and sub-query.
Example CTE:
with ctePerf ( objname, colid, colname, max_length )
as (
select objname = o.name,
colid = c.column_id,
colname = c.name,
c.max_length
from #sysobjs o
inner join #syscols c on c.object_id=o.object_id
where o.type='U'
and o.name= 'accounts'
)
select p.objname, p.colname, p.colid,
size_sofar = SUM( p2.max_length ),
size_left = ( SELECT SUM( p3.max_length )
FROM ctePerf p3
WHERE p3.objname=p.objname
and p3.colid > p.colid
)
from ctePerf p
left outer join ctePerf p2 on p2.objname=p.objname
and p2.colid <=p.colid
group by p.objname, p.colname, p.colid
order by objname, colid
Let’s look at the query plan for the query we use as the CTE anchor:
CTE “anchor” query and plan:
select objname = o.name,
colid = c.column_id,
colname = c.name,
c.max_length
from #sysobjs o
inner join #syscols c on c.object_id=o.object_id
where o.type='U'
and o.name= 'accounts'
|--Hash Match(Inner Join, HASH:([o ].[object_id])=([c ].[object_id]))
|--Table Scan(OBJECT:([tempdb].[dbo].[#sysobjs] AS [o ]), WHERE:([tempdb].[dbo].[#sysobjs].[type] as [o ].[type]='U' AND [tempdb].[dbo].[#sysobjs].[name] as [o ].[name]=N'accounts'))
|--Table Scan(OBJECT:([tempdb].[dbo].[#syscols] AS [c ]))
Check the query plan after incorporating the query into the CTE (below), before looking at the query plan note we are referencing the CTE anchor 3 times (shown in bold – aliases p, p2 and p3):
Query and Execution Plan:
with ctePerf ( objname, colid, colname, max_length )
as (
select objname = o.name,
colid = c.column_id,
colname = c.name,
c.max_length
from #sysobjs o
inner join #syscols c on c.object_id=o.object_id
where o.type='U'
and o.name= 'accounts'
)
select p.objname, p.colname, p.colid,
size_sofar = SUM( p2.max_length ),
size_left = ( SELECT SUM( p3.max_length )
FROM ctePerf p3
WHERE p3.objname=p.objname
and p3.colid > p.colid
)
from ctePerf p
left outer join ctePerf p2 on p2.objname=p.objname
and p2.colid <=p.colid
group by p.objname, p.colname, p.colid
order by objname, colid
|--Compute Scalar(DEFINE:([Expr1022]=[Expr1022]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([o ].[name], [c ].[column_id]))
|--Compute Scalar(DEFINE:([Expr1015]=CASE WHEN [Expr1035]=(0) THEN NULL ELSE [Expr1036] END))
| |--Stream Aggregate(GROUP BY:([c ].[column_id], [c ].[name]) DEFINE:([Expr1035]=COUNT_BIG([tempdb].[dbo].[#syscols].[max_length] as [c ].[max_length]), [Expr1036]=SUM([tempdb].[dbo].[#syscols].[max_length] as [c ].[max_length]), [o ].[name]=ANY
| |--Nested Loops(Left Outer Join, WHERE:([tempdb].[dbo].[#syscols].[column_id] as [c ].[column_id]<=[tempdb].[dbo].[#syscols].[column_id] as [c ].[column_id]))
| |--Sort(ORDER BY:([c ].[column_id] ASC, [c ].[name] ASC))
| | |--Hash Match(Inner Join, HASH:([o ].[object_id])=([c ].[object_id]))
| | |--Table Scan(OBJECT:([tempdb].[dbo].[#sysobjs] AS [o ]), WHERE:([tempdb].[dbo].[#sysobjs].[type] as [o ].[type]='U' AND [tempdb].[dbo].[#sysobjs].[name] as [o ].[name]=N'accounts'))
| | |--Table Scan(OBJECT:([tempdb].[dbo].[#syscols] AS [c ]))
| |--Table Spool
| |--Hash Match(Inner Join, HASH:([o ].[object_id])=([c ].[object_id]))
| |--Table Scan(OBJECT:([tempdb].[dbo].[#sysobjs] AS [o ]), WHERE:([tempdb].[dbo].[#sysobjs].[type] as [o ].[type]='U' AND [tempdb].[dbo].[#sysobjs].[name] as [o ].[name]=N'accounts'))
| |--Compute Scalar(DEFINE:([c ].[max_length]=[tempdb].[dbo].[#syscols].[max_length] as [c ].[max_length]))
| |--Table Scan(OBJECT:([tempdb].[dbo].[#syscols] AS [c ]))
|--Index Spool(SEEK:([o ].[name]=[tempdb].[dbo].[#sysobjs].[name] as [o ].[name] AND [c ].[column_id]=[tempdb].[dbo].[#syscols].[column_id] as [c ].[column_id]))
|--Compute Scalar(DEFINE:([Expr1022]=CASE WHEN [Expr1037]=(0) THEN NULL ELSE [Expr1038] END))
|--Stream Aggregate(DEFINE:([Expr1037]=Count(*), [Expr1038]=SUM([tempdb].[dbo].[#syscols].[max_length] as [c ].[max_length])))
|--Hash Match(Inner Join, HASH:([o ].[object_id])=([c ].[object_id]))
|--Table Scan(OBJECT:([tempdb].[dbo].[#sysobjs] AS [o ]), WHERE:([tempdb].[dbo].[#sysobjs].[name] as [o ].[name]=[tempdb].[dbo].[#sysobjs].[name] as [o ].[name] AND [tempdb].[dbo].[#sysobjs].[type] as [o ].[type]='U' AND [tempdb
|--Table Scan(OBJECT:([tempdb].[dbo].[#syscols] AS [c ]), WHERE:([tempdb].[dbo].[#syscols].[column_id] as [c ].[column_id]>[tempdb].[dbo].[#syscols].[column_id] as [c ].[column_id]))
If you compare the above query plan to the first, the specific plan for just the anchor you can see the same ‘anchor’ query has pretty much been expanded inline into three parts of the query plan – take a look back at the query and note there are 3 references to the anchor!
The output below is a subset of the information available when you use SET STATISTICS PROFILE ON, I’ve not bothered posting the full STATS PROFILE instead I’ve truncated the output for readability – I’m more concerned with the Rows and Executes columns rather than the query tree.
Table 'Worktable'. Scan count 11, logical reads 56, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#syscols'. Scan count 7, logical reads 35, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#sysobjs'. Scan count 7, logical reads 21, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Rows Executions
5 1 with ctePerf ( objname, colid, olid
0 0 |--Compute Scalar(DEFINE:([Expr1022]=[Expr1022]))
5 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([o ].[name], [c ].[column_id]))
0 0 |--Compute Scalar(DEFINE:([Expr1015]=CASE WHEN [Expr1035]=(0) THEN NUL
5 1 | |--Stream Aggregate(GROUP BY:([c ].[column_id], [c ].[name]) DEFINE
15 1 | |--Nested Loops(Left Outer Join, WHERE:([tempdb].[dbo].[#sys
5 1 | |--Sort(ORDER BY:([c ].[column_id] ASC, [c ].[name] ASC))
5 1 | | |--Hash Match(Inner Join, HASH:([o ].[object_id])=(
1 1 | | |--Table Scan(OBJECT:([tempdb].[dbo].[#sysobjs] AS [o ])
453 1 | | |--Table Scan(OBJECT:([tempdb].[dbo].[#syscols] AS [c ]))
25 5 | |--Table Spool
5 1 | |--Hash Match(Inner Join, HASH:([o ].[object_id])=([c ].[object_id]))
1 1 | |--Table Scan(OBJECT:([tempdb].[dbo].[#sysobjs] AS [o ]),
0 0 | |--Compute Scalar(DEFINE:([c ].[max_length]=[tempdb].[dbo].
453 1 | |--Table Scan(OBJECT:([tempdb].[dbo].[#syscols] AS [c ]))
5 5 |--Index Spool(SEEK:([o ].[name]=[tempdb].[dbo].[#sysobjs].[name] as [o ].[name] AND
0 0 |--Compute Scalar(DEFINE:([Expr1022]=CASE WHEN [Expr1037]=(0) THEN NULL ELSE [Expr1038] END))
5 5 |--Stream Aggregate(DEFINE:([Expr1037]=Count(*), [Expr1038]=SUM([tempdb].[dbo].[#syscols]
10 5 |--Hash Match(Inner Join, HASH:([o ].[object_id])=([c ].[object_id]))
5 5 |--Table Scan(OBJECT:([tempdb].[dbo].[#sysobjs] AS [o ]), WHERE:([tempdb].[dbo].[#sysobjs]
1496 5 |--Table Scan(OBJECT:([tempdb].[dbo].[#syscols] AS [c ]), WHERE:([tempdb].[dbo].[#syscols]
Notice the number of rows returned for each of the highlighted steps – it’s running the complete query every time; the 1496 is higher because of what we are doing in the sub-query.
Let’s use a temporary table for the anchor instead; first pump the query into a temporary table then use that in the CTE...
select objname = o.name,
colid = c.column_id,
colname = c.name,
c.max_length
into #anchor
from #sysobjs o
inner join #syscols c on c.object_id=o.object_id
where o.type='U'
and o.name= 'accounts'
;with ctePerf ( objname, colid, colname, max_length )
as (
select objname, colid, colname, max_length
from #anchor
)
select p.objname, p.colname, p.colid,
size_sofar = SUM( p2.max_length ),
size_left = ( SELECT SUM( p3.max_length )
FROM ctePerf p3
WHERE p3.objname=p.objname
and p3.colid > p.colid
)
from ctePerf p
left outer join ctePerf p2 on p2.objname=p.objname
and p2.colid <=p.colid
group by p.objname, p.colname, p.colid
order by objname, colid
go
drop table #anchor
Execution plans...
select objname = o.name,
colid = c.column_id,
colname = c.name,
c.max_length
into #anchor
from #sysobjs o
inner join #syscols c on c.object_id=o.object_id
where o.type='U'
and o.name= 'accounts'
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#syscols'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#sysobjs'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Rows Executions
5 1 select objname = o.name, colid = c.column_id, colname = c.name,
5 1 |--Table Insert(OBJECT:([#anchor]), SET:([#anchor].[objname] = [tempdb].[d
5 1 |--Top(ROWCOUNT est 0)
5 1 |--Hash Match(Inner Join, HASH:([o ].[object_id])=([c ].[object_id]))
1 1 |--Table Scan(OBJECT:([tempdb].[dbo].[#sysobjs] AS [o ]), WHERE:([tempdb].
453 1 |--Table Scan(OBJECT:([tempdb].[dbo].[#syscols] AS [c ]))
;with ctePerf ( objname, colid, colname, max_length )
as (
select objname, colid, colname, max_length
from #anchor
)
select p.objname, p.colname, p.colid,
size_sofar = SUM( p2.max_length ),
size_left = ( SELECT SUM( p3.max_length )
FROM ctePerf p3
WHERE p3.objname=p.objname
and p3.colid > p.colid
)
from ctePerf p
left outer join ctePerf p2 on p2.objname=p.objname
and p2.colid <=p.colid
group by p.objname, p.colname, p.colid
order by objname, colid
Table '#anchor'. Scan count 7, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Rows Executions
5 1 with ctePerf ( objname, colid, colname, max_length
0 0 |--Compute Scalar(DEFINE:([Expr1016]=[Expr1016]))
5 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[#a
0 0 |--Compute Scalar(DEFINE:([Expr1011]=CASE WHEN [Expr1029]=(
5 1 | |--Stream Aggregate(GROUP BY:([tempdb].[dbo].[#anchor]
15 1 | |--Nested Loops(Left Outer Join, OUTER REFERENCES
5 1 | |--Sort(ORDER BY:([tempdb].[dbo].[#anchor].[
5 1 | | |--Table Scan(OBJECT:([tempdb].[dbo].[#
0 0 | |--Compute Scalar(DEFINE:([tempdb].[dbo].[#a
15 5 | |--Table Scan(OBJECT:([tempdb].[dbo].[#
0 0 |--Compute Scalar(DEFINE:([Expr1016]=CASE WHEN [Expr1031]=(
5 5 |--Stream Aggregate(DEFINE:([Expr1031]=Count(*), [Expr
10 5 |--Table Scan(OBJECT:([tempdb].[dbo].[#anchor]),
Comparing the results from the above you can clearly see that the IO is a lot less (19 logical reads using the # table against 112 just using the CTE) – that is also shown by the Rows column in the STATISTICS PROFILE output.
But looking at an example on more realistic volumes you can clearly see the difference. The example below is on a transaction table and gives running totals for a subset of accounts.
with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
as (
select row_no = id, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
from Transactions
where account_id between 1000 and 1020
)
select t1.*,
running_amount = coalesce(
( select sum( amount )
from Trans t2
where t2.row_no < t1.row_no
and t2.account_id = t1.account_id ), 0 )
+ amount
from Trans t1
order by account_id, row_no
Table 'transactions'. Scan count 1156, logical reads 54084, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Rows Executions
588 1 with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_i
0 0 |--Compute Scalar(DEFINE:([Expr1013]=CASE WHEN [Expr1006] IS NOT NULL THEN [Expr1011] ELS
588 1 |--Nested Loops(Inner Join, PASSTHRU:(IsFalseOrNull [Expr1006] IS NOT NULL), OUTER R
588 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([SQLBits20080301].[dbo].[transact
588 1 | |--Nested Loops(Inner Join, OUTER REFERENCES:([SQLBits20080301].[dbo].[tra
588 1 | | |--Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions].[ncidx_a
588 588 | | |--Clustered Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions
0 0 | |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1027]=(0) THEN NULL EL
588 588 | |--Stream Aggregate(DEFINE:([Expr1027]=Count(*), [Expr1028]=SUM([SQLB
8132 588 | |--Nested Loops(Inner Join, OUTER REFERENCES:([SQLBits20080301].
8132 588 | |--Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions
8132 8132 | |--Clustered Index Seek(OBJECT:([SQLBits20080301].[dbo].[tr
0 0 |--Compute Scalar(DEFINE:([Expr1011]=CASE WHEN [Expr1029]=(0) THEN NULL ELSE [E
567 567 |--Stream Aggregate(DEFINE:([Expr1029]=Count(*), [Expr1030]=SUM([SQLBits20
8132 567 |--Nested Loops(Inner Join, OUTER REFERENCES:([SQLBits20080301].[dbo]
8132 567 |--Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions].[nc
8132 8132 |--Clustered Index Seek(OBJECT:([SQLBits20080301].[dbo].[transact
Same query just the anchor is sourced from a temporary table...
select row_no = id, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
into #t
from Transactions
where account_id between 1000 and 1020;
create unique clustered index clidx on #t ( account_id, row_no )
Table 'transactions'. Scan count 1, logical reads 1813, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(588 row(s) affected)
Table '#t'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Rows Executions
588 1 select row_no = id, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
588 1 |--Table Insert(OBJECT:([#t]), SET:([#t].[row_no] = [SQLBits20080301].[dbo].[transactions].[id],[#t].
588 1 |--Top(ROWCOUNT est 0)
588 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([SQLBits20080301].[dbo].[transactions].[id], [Exp
588 1 |--Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions].[ncidx_account_id]), SEEK:([S
588 588 |--Clustered Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions].[pk_transactions]),
;with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
as (
select *
from #t
)
select t1.*,
running_amount = coalesce(
( select sum( amount )
from Trans t2
where t2.row_no < t1.row_no
and t2.account_id = t1.account_id ), 0 )
+ amount
from Trans t1
order by account_id, row_no
go
Table '#t'. Scan count 1156, logical reads 2400, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Rows Executions
588 1 with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_
0 0 |--Compute Scalar(DEFINE:([Expr1013]=CASE WHEN [Expr1006] IS NOT NULL THEN [Expr1011] ELSE (0.00) EN
588 1 |--Nested Loops(Inner Join, PASSTHRU:(IsFalseOrNull [Expr1006] IS NOT NULL), OUTER REFERENCES:(
588 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[#t].[row_no], [tempdb].[dbo]
588 1 | |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#t]), ORDERED FORWARD)
0 0 | |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1026]=(0) THEN NULL ELSE [Expr1027] END))
588 588 | |--Stream Aggregate(DEFINE:([Expr1026]=Count(*), [Expr1027]=SUM([tempdb].[dbo].[
8132 588 | |--Clustered Index Seek(OBJECT:([tempdb].[dbo].[#t]), SEEK:([tempdb].[dbo].
0 0 |--Compute Scalar(DEFINE:([Expr1011]=CASE WHEN [Expr1028]=(0) THEN NULL ELSE [Expr1029] END))
567 567 |--Stream Aggregate(DEFINE:([Expr1028]=Count(*), [Expr1029]=SUM([tempdb].[dbo].[#t].[
8132 567 |--Clustered Index Seek(OBJECT:([tempdb].[dbo].[#t]), SEEK:([tempdb].[dbo].[#t].
As you can see the pure CTE method costs 54,084 logical reads whereas using the # table method costs just 4,218 and look at the differences in the Rows and Executions between the CTE and the # with CTE method, I do actually use the ROW_NUMBER function in my real query example but that highlights yet another performance problem but I’ll save that for another post.
There is another problem here as well that will work against you in the performance stakes – statistics! The CTE is expanding the anchor SQL into the other bits of the query like an inline macro and as such SQL Server doesn’t have the statistics on the subset of rows the anchor produces. By using the temporary you’ve sucked the results out and have the ability to create an index(s) on that # table which the optimiser will then use when it recompiles the CTE statement.
I think my only advice here is – unless your application has a high concurrent number of users and you are suffering COMPILE locks because of recompiles then don’t use Common Table Expressions unless you need recursion; the problem described above does not exist on the CTE when you use recursion because of the way recursion works. If you are simply using the # table in the anchor as I describe above then don’t use the CTE at all – just use the # table in the first place!
Filed under: SQL Server, SQL Development, CTE