The underused OVER clause!

Published 23 January 11 01:33 PM | MartinBell

Seeing the Steve Hindmarsh’s blog post on the OUTPUT clause got me thinking about what other underused clauses may benefit from greater exposure!

You may already be familiar with the OVER clause because it is mandatory when using a ranking function i.e. ROW_NUMBER, RANK, DENSE_RANK and NTILE. The order I’ve listed these functions is probably the same order as their popularity.

The syntax for the OVER clause when used with a ranking function is as follows:

< OVER_CLAUSE > :: =
    OVER ( [ PARTITION BY value_expression , ... [ n ] ]
           <ORDER BY_Clause> )

The reason I think it is under-used is because you can also use an OVER clause with aggregate function. The format of the OVER function in this case the syntax for the OVER clause in this case is:

< OVER_CLAUSE > :: =
    OVER ( [ PARTITION BY value_expression , ... [ n ] ]
)

The example in books online shows an example use of what this can be used for:

SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN(43659,43664)
ORDER BY SalesOrderID, ProductID, OrderQty ;
GO

This is showing the aggregate value for each sales order against each sales order detail line.

If you were doing this in a version before SQL 2005 then you would need to use a sub-query.

SELECT M.SalesOrderID, M.ProductID, M.OrderQty
    ,( SELECT SUM(OrderQty) FROM Sales.SalesOrderDetail S WHERE S.SalesOrderID = M.SalesOrderID ) AS 'Total'
    ,( SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail S WHERE S.SalesOrderID = M.SalesOrderID ) AS 'Avg'
    ,( SELECT COUNT(OrderQty) FROM Sales.SalesOrderDetail S WHERE S.SalesOrderID = M.SalesOrderID ) AS 'Count'
    ,( SELECT MIN(OrderQty) FROM Sales.SalesOrderDetail S WHERE S.SalesOrderID = M.SalesOrderID ) AS 'Min'
    ,( SELECT MAX(OrderQty) FROM Sales.SalesOrderDetail S WHERE S.SalesOrderID = M.SalesOrderID ) AS 'Max'
FROM Sales.SalesOrderDetail M
WHERE M.SalesOrderID IN(43659,43664)
ORDER  BY M.SalesOrderID, M.ProductID, M.OrderQty ;
GO

A more efficient way to do this would be:

SELECT M.SalesOrderID, M.ProductID, M.OrderQty
    ,S.[Total], S.[Avg], S.[Count], S.[Min], S.[Max]
FROM Sales.SalesOrderDetail M
JOIN (
        SELECT SalesOrderID
                , SUM(OrderQty) AS 'Total'
                , AVG(OrderQty) AS 'Avg'
                , COUNT(OrderQty) AS 'Count'
                , MIN(OrderQty) AS 'Min'
                , MAX(OrderQty) AS 'Max'
        FROM Sales.SalesOrderDetail
        WHERE SalesOrderID IN(43659,43664)
        GROUP BY SalesOrderID
    ) S ON S.SalesOrderID = m.SalesOrderID
WHERE M.SalesOrderID IN(43659,43664)
ORDER  BY M.SalesOrderID, M.ProductID, M.OrderQty ;
GO

If you look at this with STATISTIC IO set the least number of scans and logical reads are produced using the OVER clause, but the number of logical reads for the derived table is only a few more and the number of scans is the same (and as they seem to have the same query execution plan, under-the-hood they must translate to the same thing!).

If you are using sub-queries then switching to using the OVER clause could be very beneficial.

If doing this sort of thing has proven to be a performance bottleneck, traditionally the solution has been to de-normalise and store the totals against the master record and a mechanism to maintain them has to be implemented. If this is done using triggers then the overhead of doing this could impact the time taken to complete a transaction. Although it is reasonably easy to maintain a total using this method it is more difficult to calculate the other aggregates. If you are in this situation revisiting the OVER statement may be worthwhile.

Itzik Ben-Gan and Sujata Mehta have suggested several enhancements to T-SQL including changes to the Over clause, this is documented on connect and at http://www.insidetsql.com/OVER_Clause_and_Ordered_Calculations.doc

Filed under: ,

Comments

# Dew Drop – January 24, 2011 | Alvin Ashcraft's Morning Dew said on January 24, 2011 02:05 PM:

Pingback from  Dew Drop &ndash; January 24, 2011 | Alvin Ashcraft&#039;s Morning Dew

This Blog

SQL Blogs

Syndication