September 2009 - Posts
This is a question that cropped up SQLServerCentral. Imagine you have a list of dates, for which you need to know the next day after a specified interval. |
So in this list you want to return those with an interval of 3Days (ie those in Red)
|
| 01Jan |
| 02Jan |
| 07jan |
| 11Jan |
| 12Jan |
| 28jan |
| |
| To add further complication there are also multiple ranges. Heres some sample data |
drop table #DateSteps
go
Create table #DateSteps
(
Handler integer,
EventDate smalldatetime
)
go
Create unique index idxDateSteps on #DateSteps(handler,EventDate)
go
insert into #dateSteps values(1,'01jan1990')
insert into #dateSteps values(1,'07jan1990')
insert into #dateSteps values(1,'11jan1990')
insert into #dateSteps values(1,'12jan1990')
insert into #dateSteps values(1,'28jan1990')
insert into #dateSteps values(1,'08mar1990')
insert into #dateSteps values(2,'02jan1990')
insert into #dateSteps values(2,'04jan1990')
insert into #dateSteps values(2,'10jan1990')
insert into #dateSteps values(2,'12jan1990')
insert into #dateSteps values(2,'15mar1990')
| The key to this routine is the use of row_number() within the derived table at the recursive element of the CTE. This enables us to return only the next row that is three days or more after the input date. |
with cteStartDates(Handler,EventDate)
as
(
Select Handler,
min(EventDate)
from #datesteps
group by Handler
),
CteRecursiveSkip(Handler,EventDate)
as
(
Select Handler,EventDate
from cteStartDates
union all
Select x.Handler,x.EventDate
from (
Select #datesteps.Handler,#datesteps.EventDate,
RowNum = row_number()
over (partition by #datesteps.Handler
order by #datesteps.EventDate)
from CteRecursiveSkip
join #datesteps
on #datesteps.Handler = CteRecursiveSkip.Handler
and #datesteps.Eventdate >= CteRecursiveSkip.EventDate+3
) as x
where x.RowNum = 1
)
Select *
from CteRecursiveSkip
order by Handler,EventDate
| Yet another cursor resigned to the garbage can. Performance should also be quite good across large datasets. |
Heres a little something that Simon Sabin mentioned at the London PASS user group last night that I wasnt aware of.
Ive lost count of the amount of queries I have that are some derivative of >= Midnight of a day and < Midnight of day +1. In SQLServer 2008 the engine now does the work for you.
select *
from Sales.SalesOrderHeader
where CAST(OrderDate as DATE) ='20010701'
No great surprises so far, but internally the engine has transformed it into a BETWEEN query. So, if you add an index
create index idxSalesOrderHeaderDate
on Sales.SalesOrderHeader(OrderDate)
include (SalesOrderId)
and then look at the query plan …
Fabulous, an index seek. How simple is that ?
In SQL2005 (9.00.4207.00), if you use a ranking function within a view or CTE, then an inefficient query plan can be produced.
First off in Adventure works create this index
Create Index idxLastName on Person.Contact(LastName)
and then consider these similar queries:
select ContactId,
LastName,
row_number() over(partition by LastName order by ContactId)
from [AdventureWorks].[Person].[Contact]
where LastName = 'Smith'
go
create view vwContacts
as
select ContactId,
LastName,
rown = row_number() over(partition by LastName order by ContactId)
from [AdventureWorks].[Person].[Contact]
go
Select * from vwContacts
where LastName = 'Smith'
go
You should now see two vastly different query plans

Notice in “Query 2” that Filter ?
Wow.. So it did calculate the entire query and then filter on the results. At first i thought that i was being unfair, obviously there is potential for filtering on the ranking function, but when doing exactly the same on 2008 I get index seeks on both sides
CTE’s also suffer from the same problem, although they are easier to rewrite than a view to utilize a seek.
with cteLastNameRow(ContactId,LastName,Row_number)
as(
select ContactId,
LastName,
row_number() over(partition by LastName order by ContactId)
from [AdventureWorks].[Person].[Contact]
)
select *
from cteLastNameRow
where LastName = 'Smith'
go
Another index scan.
So , be careful when using ranking functions and as ever, investigate the query execution plan on EVERY query. I've raised a connect bug report here if you feel that it is something that Microsoft should invest some time in fixing.