SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL

June 2011 - Posts

Non use of persisted data – Part deux

In my last blog I showed how persisted data may not be used if you have used the base data on an include on an index.

That wasn't the only problem ive had that showed the same symptom.  Using the same code as before,  I was executing similar to the below :

select BillToAddressID,SOD.SalesOrderDetailID,SOH.CleanedGuid 
from sales.salesorderheader SOH
join Sales.SalesOrderDetail SOD
on SOH.SalesOrderID = SOD.SalesOrderID

But,  due to a distribution error in statistics i found it necessary to use a table hint.  In this case, I wanted to force a loop join

select BillToAddressID,SOD.SalesOrderDetailID,SOH.CleanedGuid 
from sales.salesorderheader SOH
inner loop join Sales.SalesOrderDetail SOD
on SOH.SalesOrderID = SOD.SalesOrderID
 
But, being the diligent  TSQL developer that I am ,looking at the execution plan I noticed that the ‘compute scalar’ operator was again calling the function.  Again,  profiler is a more graphic way to view this…..
 

image

All very odd,  just because ive forced a join , that has NOTHING, to do with my persisted data then something is causing the data to be re-evaluated.

Not sure if there is any easy fix you can do to the TSQL here, but again its a lesson learned (or rather reinforced) examine the execution plan of every query you write to ensure that it is operating as you thought it would.

Itzik Ben-Gan is in town

Not that you would know it from the page below,  but Itzik Ben-Gan is back in London to do a 5 day training course, start 03october. 

http://www.qa.com/training-courses/technical-it-training/microsoft/microsoft-sql-server/microsoft-sql-server-2008-and-r2/advanced-t-sql-querying,-programming-and-tuning-for-sql-server-2005--2008

Why QA are not screaming this from the rafters, I will never be able to fathom.  Its kind of like going for a physics course and finding that Steven Hawking is taking the class. Training budgets are tight at the moment and £2500+ is a fair amount to pay but ,as the saying goes,  but if you pay peanuts you get monkeys.

 Looks like you will need to be quick , the site is saying "Fewer than 5 places available".

Non use of persisted data

Working at a client site, that in itself is good to say, I ran into a set of circumstances that made me ponder, and appreciate, the optimizer engine a bit more.

Working on optimizing a stored procedure, I found a piece of code similar to :

select BillToAddressID,
Rowguid,
dbo.udfCleanGuid(rowguid)
from sales.salesorderheader
where BillToAddressID = 985

A lovely scalar UDF was being used,  in actuality it was used as part of the WHERE clause but simplified here.  Normally I would use an inline table valued function here, but in this case it wasn't a good option.

So this seemed like a pretty good case to use a persisted column to improve performance.

The supporting index was already defined as

create index idxBill 
on sales.salesorderheader(BillToAddressID)
include (rowguid)

and the function code is

Create Function udfCleanGuid(@GUID uniqueidentifier)
returns varchar(255)
with schemabinding
as
begin
Declare @RetStr varchar(255)
Select @RetStr=CAST(@Guid as varchar(255))
Select @RetStr=REPLACE(@Retstr,'-','')
return @RetStr
end

Executing the Select statement produced a plan of :

Seek1

Nothing surprising, a seek to find the data and compute scalar to execute the UDF.

Lets get optimizing and remove the UDF with a persisted column

Alter table sales.salesorderheader
add CleanedGuid as dbo.udfCleanGuid(rowguid)
PERSISTED

A subtle change to the SELECT statement…

select BillToAddressID,CleanedGuid 
from sales.salesorderheader
where BillToAddressID = 985

and our new optimized plan looks like…

Seek2

Not a lot different from before!  We are using persisted data on our table, where is the lookup to fetch it ?  It didnt happen,  it was recalculated.  Looking at the properties of the relevant Compute Scalar would confirm this ,  but a more graphic example would be shown in the profiler SP:StatementCompleted event.

image

Why did the lookup happen ? Remember the index definition,  it has included the original guid to avoid the lookup.  The optimizer knows this column will be passed into the UDF, run through its logic and decided that to recalculate is cheaper than the lookup.  That may or may not be the case in actuality,  the optimizer has no idea of the real cost of a scalar udf.  IMO the default cost of a scalar UDF should be seen as a lot higher than it is, since they are invariably higher.

Knowing this, how do we avoid the function call?  Dropping the guid from the index is not an option, there may be other code reliant on it.   We are left with only one real option,  add the persisted column into the index.

drop index Sales.SalesOrderHeader.idxBill
go
create index idxBill on sales.salesorderheader(BillToAddressID)
include (rowguid,cleanedguid)

Now if we repeat the statement
select BillToAddressID,CleanedGuid 
from sales.salesorderheader
where BillToAddressID = 985

image

We still have a compute scalar operator, but this time it wasnt used to recalculate the persisted data.  This can be confirmed with profiler again.

image

The takeaway here is,  just because you have persisted data dont automatically assumed that it is being used.

Column order can matter

Ordinarily, column order of a SQL statement does not matter.

Select a,b,c 
from table

will produce the same execution plan as
 
Select c,b,a
from table

However, sometimes it can make a difference.
 
Consider this statement (maxdop is used to make a simpler plan and has no impact to the main point):
 

select SalesOrderID,
CustomerID,
OrderDate,
ROW_NUMBER() over (Partition By CustomerId order by OrderDate asc) as RownAsc,
ROW_NUMBER() over (Partition By CustomerId order by OrderDate Desc) as RownDesc
from sales.SalesOrderHeader
order by CustomerID,OrderDate
option(maxdop 1)

If you look at the execution plan, you will see similar to this

3Sorts

That is three sorts.  One for RownAsc,  one for RownDesc and the final one for the ‘Order by’ clause.  Sorting is an expensive operation and one that should be avoided if possible.  So with this in mind, it may come as some surprise that the optimizer does not re-order operations to group them together when the incoming data is in a similar (if not exactly the same) sorted sequence. 

A simple change to swap the RownAsc and RownDesc columns to produce this statement :

select SalesOrderID,
CustomerID,
OrderDate,
ROW_NUMBER() over (Partition By CustomerId order by OrderDate Desc) as RownDesc ,
ROW_NUMBER() over (Partition By CustomerId order by OrderDate asc) as RownAsc
from Sales.SalesOrderHeader
order by CustomerID,OrderDate
option(maxdop 1)

Will result a different and more efficient query plan with one less sort.

twosort

The optimizer, although unable to automatically re-order operations, HAS taken advantage of the data ordering if it is as required.  This is well worth taking advantage of if you have different sorting requirements in one statement. Try grouping the functions that require the same order together and save yourself a few extra sorts.

Kent .Net/SqlServer User Group – Upcoming events

KentUG

At the Kent user group we have two upcoming events.  Both are to be held at F-Keys Training suite http://f-keys.co.uk/ in Rochester, Kent.

If you haven’t attended before please note the location here.


14-June

Is your code S.O.L.I.D ?

Nathan Gloyn

Everybody keeps on about SOLID principles but what are they? and why should you care?
This session is an introduction to SOLID and I'll aim to walk through each principle telling you about that principle and then show how a code base can be refactored using the principles to make your life easier,
Come the end of the session you should have a basic understanding of the principle, why to use it and how using it can improve your code.

Building composite applications with OpenRasta 3

Sebastien Lambla

A wave of change is coming to Web development on .NET. Packaging technologies are bringing dependency management to .NET for the first time, streamlining development workflow and creating new possibilities for deployment and administration. The sky's the limit, and in this session we'll explore how open frameworks can help us leverage composition for the web.

Register here for this event http://www.eventbrite.com/event/1643797643


05-July

Tony Rogerson

Achieving a throughput of 1.5Terabytes or over 92,000 8Kbyte of 100% random reads per second on kit costing less that 2.5K, and of course what to do with it!

The session will focus on commodity kit and how it can be used within business to provide massive performance benefits at little cost.

End to End Report Creation and Management using SQL Server Reporting Services  

Chris Testa-O'Neill

This session will walk through the authoring, management and delivery of reports with a focus on the new features of Reporting Services 2008 R2. At the end of this session you will understand how to create a report in the new report designer. Be aware of the Report management options available and the delivery mechanisms that can be used to deliver reports.

Register here for this event http://www.eventbrite.com/event/1643805667


Hope to see you at one or other ( or even both if you are that way inclined).