The optimiser should understand that year(datecolumn) is correlated to the datecolumn

If you have a query such as

select year(OrderDate) yr, count(1)

from AdventureWorks2008.Sales.SalesOrderHeader

group by year(OrderDate)

and you have an index on OrderDate you will find that even though the query optimiser could do a stream aggregate on the index because is sorted by date it doesn’t instead it does a hash aggregate.

image

What you would like is this,

image

Unfortunately to achieve this you have to have computed columns for year, month, day and possibly time and then index that (to provide total fidelity with the ideal solution).

alter table Sales.SalesOrderHeader add [OrderYear] as Year(OrderDate)

alter table Sales.SalesOrderHeader add [OrderMonth]as Month(OrderDate)

alter table Sales.SalesOrderHeader add [OrderDay]  as Day(OrderDate)

alter table Sales.SalesOrderHeader add [OrderTime] as cast(OrderDate as time)

This doesn’t perform because the index is much wider due to the year, month and day functions all returning integers, this means that your index key is 16 bytes compared to the only 8 for a normal datetime column. You might say well why not cast the columns to tinyint and smallint. Well while that reduces the size of the index, it then stops the optimiser from matching the index to the query.

This is not only about getting streaming aggregates its also about being able to filter based on years and years and months. i.e. a query such as

select OrderDate

from AdventureWorks2008.Sales.SalesOrderHeader

where Year(OrderDate) = 2001

should be able to have the appropriate seek predicate on the OrderDate index, rather than having to just have a predicate. Combining the two together one should be able to do

with summary as

  (

    select year(OrderDate) yr, count(1) cnt

      from AdventureWorks2008.Sales.SalesOrderHeader

  group by year(OrderDate)

  ) 

select *

  from summary

 where yr between  2001 and 2002

And have the optimiser perform a seek on the index and then a streaming aggregate.

I’ve created a suggestion for this. If you find you have queries like this that aggregate based on year or year and month then this suggestion will help you so please vote for it.

https://connect.microsoft.com/SQLServer/feedback/details/653206/the-optimizer-should-be-able-to-understand-the-invertability-of-built-in-intrinsics-on-dates-for-the-purposes-of-index-sargability

Published 24 March 2011 00:28 by simonsabin

Comments

24 March 2011 01:14 by Rob Farley

# Function Invertability for SARGability

My good friend Simon Sabin used the term ‘invertability’ on a Connect item he logged today. Essentially,

24 March 2011 01:32 by SimonS Blog on SQL Server Stuff

# The optimiser should understand that year(datecolumn) is correlated to the datecolumn

If you have a query such as select year ( OrderDate ) yr , count ( 1 ) from AdventureWorks2008 . Sales

# @simon_sabin posts The optimiser should understand that year(datecolumn) is correlated to the datecolumn | sqlmashup

Pingback from  @simon_sabin posts The optimiser should understand that year(datecolumn) is correlated to the datecolumn | sqlmashup