SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL

September 2009 - Posts

Returning a chain of events with a recursive CTE
This is a question that cropped up SQLServerCentral.
Imagine you have a list of dates, for which you need to know the next day after a specified interval. 

So in this list you want to return those with an interval of 3Days (ie those in Red)
01Jan
02Jan
07jan
11Jan
12Jan
28jan
 
To add further complication there are also multiple ranges.  Heres some sample data
drop table #DateSteps
go
Create table
#DateSteps
(
Handler integer,
EventDate smalldatetime
)
go
Create unique index
idxDateSteps on #DateSteps(handler,EventDate)
go
insert into
#dateSteps values(1,'01jan1990')
insert into #dateSteps values(1,'07jan1990')
insert into #dateSteps values(1,'11jan1990')
insert into #dateSteps values(1,'12jan1990')
insert into #dateSteps values(1,'28jan1990')
insert into #dateSteps values(1,'08mar1990')
insert into #dateSteps values(2,'02jan1990')
insert into #dateSteps values(2,'04jan1990')
insert into #dateSteps values(2,'10jan1990')
insert into #dateSteps values(2,'12jan1990')
insert into #dateSteps values(2,'15mar1990')

The key to this routine is the use of row_number() within the derived table at the recursive element of the CTE.  This enables us to return only the next row that is three days or more after the input date.
with cteStartDates(Handler,EventDate)
as
(
Select Handler,
min(EventDate)
from #datesteps
group by Handler
),
CteRecursiveSkip(Handler,EventDate)
as
(
Select Handler,EventDate
from cteStartDates
union all
Select x.Handler,x.EventDate
from (
Select #datesteps.Handler,#datesteps.EventDate,
RowNum = row_number()
over (partition by #datesteps.Handler
order by #datesteps.EventDate)
from CteRecursiveSkip
join #datesteps
on #datesteps.Handler = CteRecursiveSkip.Handler
and #datesteps.Eventdate >= CteRecursiveSkip.EventDate+3
) as x
where x.RowNum = 1
)
Select *
from CteRecursiveSkip
order by Handler,EventDate
Yet another cursor resigned to the garbage can.  Performance should also be quite good across large datasets.
DateTime Lookups

Heres a little something that Simon Sabin mentioned at the London PASS user group last night that I wasnt aware of.

Ive lost count of the amount of queries I have that are some derivative of  >= Midnight of a day and < Midnight of day +1. In SQLServer 2008 the engine now does the work for you.

select *
from Sales.SalesOrderHeader
where CAST(OrderDate as DATE) ='20010701'

No great surprises so far, but internally the engine has transformed it into a BETWEEN query. So, if you add an index

create index idxSalesOrderHeaderDate 
on Sales.SalesOrderHeader(OrderDate)
include (SalesOrderId)

and then look at the query plan …

sohidx

Fabulous, an index seek. How simple is that ?

BUG - Use of ranking functions result in an inefficient query plan

In SQL2005 (9.00.4207.00), if you use a ranking function within a view or CTE, then an inefficient query plan can be produced.

First off in Adventure works create this index

Create Index idxLastName on Person.Contact(LastName)

and then consider these similar queries:

select ContactId,
LastName,
row_number() over(partition by LastName order by ContactId)
from [AdventureWorks].[Person].[Contact]
where LastName = 'Smith'
go
create view vwContacts
as
select
ContactId,
LastName,
rown = row_number() over(partition by LastName order by ContactId)
from [AdventureWorks].[Person].[Contact]
go
Select
* from vwContacts
where LastName = 'Smith'
go

You should now see two vastly different query plans


rankingqueryplan1


Notice in “Query 2” that Filter ?


image


Wow.. So it did calculate the entire query and then filter on the results.  At first i thought that i was being unfair, obviously there is potential for filtering on the ranking function, but when doing exactly the same on 2008 I get index seeks on both sides


rankingqueryplan2


CTE’s also suffer from the same problem, although they are easier to rewrite than a view to utilize a seek.

with cteLastNameRow(ContactId,LastName,Row_number)

as(
   
select ContactId,
          
LastName,
          
row_number() over(partition by LastName order by ContactId)
   
from [AdventureWorks].[Person].[Contact]
    )
select *
 
from cteLastNameRow
where LastName = 'Smith'
go

Another index scan.

So , be careful when using ranking functions and as ever,  investigate the query execution plan on EVERY query. I've raised a connect bug report here if you feel that it is something that Microsoft should invest some time in fixing.