17 May 2008 16:45
tonyrogerson
Non Recursive Common Table Expressions - Performance Sucks 2 - ROW_NUMBER() is executed {number of CTE references} x {number of rows from the anchor}
Using the ROW_NUMBER() function in a non recursive CTE gives a very big performance degradation because the Sequence Generation is executed not once as you'd expect but for once for every row in the anchor starting results, example - if the anchor query returns 588 rows to the CTE then the Sequence Generator will be executed for every reference of the CTE multiplied by 588!
Check this query:
with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
as (
select row_no = row_number() over( order by account_id, tran_date, id ), id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
from Transactions
where account_id between 1000 and 1020
)
select t1.*,
running_amount = coalesce(
( select sum( amount )
from Trans t2
where t2.row_no < t1.row_no
and t2.account_id = t1.account_id ), 0 )
+ amount
from Trans t1
order by account_id, row_no
go
The results of SET STATISTICS PROFILE ON is as follows:
|
Rows |
Execs |
|
|
588 |
1 |
with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id ) as ( select row_no = row_number() over( order by account_id, tran_date, id ), id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id from Transactions where account_id between 1000 and 1020 ) select t1.*, running_amount = coalesce( ( select sum( amount ) from Trans t2 where t2.row_no < t1.row_no and t2.account_id = t1.account_id ), 0 ) + amount from Trans t1 order by account_id, row_no |
|
0 |
0 |
|--Compute Scalar(DEFINE:([Expr1016]=CASE WHEN [Expr1008] IS NOT NULL THEN [Expr1014] ELSE (0.00) END+[SQLBits20080301].[dbo].[transactions].[amount])) |
|
588 |
1 |
|--Nested Loops(Inner Join, PASSTHRU:(IsFalseOrNull [Expr1008] IS NOT NULL), OUTER REFERENCES:([SQLBits20080301].[dbo].[transactions].[account_id], [Expr1003])) |
|
588 |
1 |
|--Nested Loops(Inner Join, OUTER REFERENCES:([SQLBits20080301].[dbo].[transactions].[account_id], [Expr1003])) |
|
588 |
1 |
| |--Sequence Project(DEFINE:([Expr1003]=row_number)) |
|
0 |
0 |
| | |--Compute Scalar(DEFINE:([Expr1033]=(1))) |
|
588 |
1 |
| | |--Segment |
|
588 |
1 |
| | |--Sort(ORDER BY:([SQLBits20080301].[dbo].[transactions].[account_id] ASC, [SQLBits20080301].[dbo].[transactions].[tran_date] ASC, [SQLBits20080301].[dbo].[transactions].[id] ASC)) |
|
588 |
1 |
| | |--Nested Loops(Inner Join, OUTER REFERENCES:([SQLBits20080301].[dbo].[transactions].[id], [Expr1031]) WITH UNORDERED PREFETCH) |
|
588 |
1 |
| | |--Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions].[ncidx_account_id]), SEEK:([SQLBits20080301].[dbo].[transactions].[account_id] >= (1000) AND [SQLBits20080301].[dbo].[transactions].[account_id] <= (1020)) ORDERED FORWARD) |
|
588 |
588 |
| | |--Clustered Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions].[pk_transactions]), SEEK:([SQLBits20080301].[dbo].[transactions].[id]=[SQLBits20080301].[dbo].[transactions].[id]) LOOKUP ORDERED FORWARD) |
|
588 |
588 |
| |--Index Spool(SEEK:([SQLBits20080301].[dbo].[transactions].[account_id]=[SQLBits20080301].[dbo].[transactions].[account_id] AND [Expr1003]=[Expr1003])) |
|
0 |
0 |
| |--Compute Scalar(DEFINE:([Expr1008]=CASE WHEN [Expr1036]=(0) THEN NULL ELSE [Expr1037] END)) |
|
588 |
588 |
| |--Stream Aggregate(DEFINE:([Expr1036]=Count(*), [Expr1037]=SUM([SQLBits20080301].[dbo].[transactions].[amount]))) |
|
8132 |
588 |
| |--Filter(WHERE:([Expr1007]<[Expr1003] AND [SQLBits20080301].[dbo].[transactions].[account_id]=[SQLBits20080301].[dbo].[transactions].[account_id])) |
|
173166 |
588 |
| |--Top(TOP EXPRESSION:(CASE WHEN [Expr1003] IS NULL OR [Expr1003]<(0) THEN (0) ELSE [Expr1003] END)) |
|
173166 |
588 |
| |--Sequence Project(DEFINE:([Expr1007]=row_number)) |
|
0 |
0 |
| |--Compute Scalar(DEFINE:([Expr1035]=(1))) |
|
173166 |
588 |
| |--Segment |
|
173166 |
588 |
| |--Sort(ORDER BY:([SQLBits20080301].[dbo].[transactions].[account_id] ASC, [SQLBits20080301].[dbo].[transactions].[tran_date] ASC, [SQLBits20080301].[dbo].[transactions].[id] ASC)) |
|
345744 |
588 |
| |--Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions].[ncidx]), SEEK:([SQLBits20080301].[dbo].[transactions].[account_id] >= (1000) AND [SQLBits20080301].[dbo].[transactions].[account_id] <= (1020)) ORDERED FORWARD) |
|
567 |
567 |
|--Index Spool(SEEK:([SQLBits20080301].[dbo].[transactions].[account_id]=[SQLBits20080301].[dbo].[transactions].[account_id] AND [Expr1003]=[Expr1003])) |
|
0 |
0 |
|--Compute Scalar(DEFINE:([Expr1014]=CASE WHEN [Expr1040]=(0) THEN NULL ELSE [Expr1041] END)) |
|
567 |
567 |
|--Stream Aggregate(DEFINE:([Expr1040]=Count(*), [Expr1041]=SUM([SQLBits20080301].[dbo].[transactions].[amount]))) |
|
8132 |
567 |
|--Filter(WHERE:([Expr1013]<[Expr1003] AND [SQLBits20080301].[dbo].[transactions].[account_id]=[SQLBits20080301].[dbo].[transactions].[account_id])) |
|
167250 |
567 |
|--Top(TOP EXPRESSION:(CASE WHEN [Expr1003] IS NULL OR [Expr1003]<(0) THEN (0) ELSE [Expr1003] END)) |
|
167250 |
567 |
|--Sequence Project(DEFINE:([Expr1013]=row_number)) |
|
0 |
0 |
|--Compute Scalar(DEFINE:([Expr1039]=(1))) |
|
167250 |
567 |
|--Segment |
|
167250 |
567 |
|--Sort(ORDER BY:([SQLBits20080301].[dbo].[transactions].[account_id] ASC, [SQLBits20080301].[dbo].[transactions].[tran_date] ASC, [SQLBits20080301].[dbo].[transactions].[id] ASC)) |
|
333396 |
567 |
|--Index Seek(OBJECT:([SQLBits20080301].[dbo].[transactions].[ncidx]), SEEK:([SQLBits20080301].[dbo].[transactions].[account_id] >= (1000) AND [SQLBits20080301].[dbo].[transactions].[account_id] <= (1020)) ORDERED FORWARD) |
The CTE should return just 588 rows because that’s what where account_id between 1000 and 1020 yields.
However, look at the number of rows processed in the above STATISTICS PROFILE output; 333,396 rows – ouch! If you divide 333,396 by 567 and 345744 by 588 you will get the result 588 which is our target result; so – on every execution the ROW_NUMBER is being recalculated on the 588 rows instead of the Sequence Generator just being produced once and reused, this is also confirmed if you look at the “Sequence Project(DEFINE:” step above which is executed 588 times.
When it boils down to it the actual problem is the fact that the CTE anchor is just expanded inline into the main query which is bad as I’ve demonstrated in my previous entry on performance of CTE’s.
The problem does not occur when using a Recursive Member.
Recommendation: Using the # table route as I suggested before.
Filed under: SQL Server, SQL Development, CTE