SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL
“Query cost (relative to the batch)” <> Query cost relative to batch

OK, so that is quite a contradictory title, but unfortunately it is a common misconception that the query with the highest percentage relative to the batch is the worst performing. Simply put, that is a lie, or more accurately, we don't understand what these figures mean.

Consider the two simple queries below:

SELECT * FROM Person.BusinessEntity
JOIN Person.BusinessEntityAddress
ON Person.BusinessEntity.BusinessEntityID = Person.BusinessEntityAddress.BusinessEntityID
go
SELECT * FROM Sales.SalesOrderDetail
JOIN Sales.SalesOrderHeader
ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID

After executing these and looking at the plans, I see this:

image

So, a 13% / 87% split, but 13% / 87% of WHAT? CPU? Duration? Reads? Writes? Or some magical weighted algorithm?

In a Profiler trace of the two we can find the metrics we are interested in.

image

CPU and duration are well out, but what about reads (210 and 1,935)? To save you doing the maths (though you are more than welcome to), that's a 9.8% / 90.2% split. Close, but no cigar.

Let's try a different tack. Looking at the execution plans, the “Estimated Subtree Cost” of query 1 is 0.29449 and for query 2 it is 1.96596. Again, to save you the maths, that works out to 13.03% and 86.97%; round those and those are the figures we are after. But what is the worrying word there? “Estimated”.
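Both splits are just each statement's share of the batch total. The arithmetic can be sketched in Python (the helper name `batch_split` is purely illustrative; nothing in SQL Server exposes it), using the Profiler reads and the estimated subtree costs quoted above:

```python
def batch_split(values):
    """Return each value as a percentage of the batch total."""
    total = sum(values)
    return [100.0 * value / total for value in values]

# Estimated subtree costs of query 1 and query 2, taken from the plans
estimated = batch_split([0.29449, 1.96596])
# Reads of query 1 and query 2, taken from the Profiler trace
reads = batch_split([210, 1935])

print([round(p, 2) for p in estimated])  # [13.03, 86.97]
print([round(p, 1) for p in reads])      # [9.8, 90.2]
```

Only the estimated-cost split reproduces the 13% / 87% that Management Studio displays.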

So these are not “actual” execution costs, but what's the problem with comparing estimated costs to decide which query is “most costly”? Well, for simple queries such as the above, probably not a lot. For more complicated queries, a fair bit.

By modifying the second query to also show the total number of lines on each order:

SELECT *,COUNT(*) OVER (PARTITION BY Sales.SalesOrderDetail.SalesOrderID)
 FROM Sales.SalesOrderDetail
JOIN Sales.SalesOrderHeader
ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID
The split in percentages is now 6% / 94% and the Profiler metrics are:
image

Even more of a discrepancy.

Estimates can differ from actuals for a whole host of reasons. Scalar UDFs are a particular bugbear of mine; in fact, the cost of a UDF call is entirely hidden inside the execution plan. It always estimates to 0 (well, a very small number).

Take, for instance, the following UDF:

Create Function dbo.udfSumSalesForCustomer(@CustomerId integer)
returns money
as
begin
   Declare @Sum money
   Select @Sum= SUM(SalesOrderHeader.TotalDue)
     from Sales.SalesOrderHeader
    where CustomerID = @CustomerId
   return @Sum
end
If we have two statements, one that fires the UDF and another that doesn't:
Select CustomerID
  from Sales.Customer
 order by CustomerID
go
Select CustomerID,dbo.udfSumSalesForCustomer(Customer.CustomerID)
  from Sales.Customer
 order by CustomerID
The cost relative to the batch is a 50/50 split, but there has to be an actual cost to firing the UDF. Indeed, Profiler shows us:
image

Nowhere even remotely near 50/50!!!!

Moving forward to the window framing functionality in SQL Server 2012, the optimizer sees ROWS and RANGE (see here for their functional differences) as the same ‘cost’ too:

SELECT SalesOrderDetailID,SalesOrderId,
       SUM(LineTotal) OVER(PARTITION BY salesorderid 
         ORDER BY Salesorderdetailid RANGE unbounded preceding)
from Sales.SalesOrderdetail
go
SELECT SalesOrderDetailID,SalesOrderId,
       SUM(LineTotal) OVER(PARTITION BY salesorderid 
       ORDER BY Salesorderdetailid Rows unbounded preceding)
from Sales.SalesOrderdetail
By now it won't come as a great surprise that the Profiler trace reads are a *tiny* bit different.
image
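The functional difference between the two frames can be sketched in Python for one already-sorted partition of (ORDER BY key, value) pairs; the function names are illustrative. ROWS UNBOUNDED PRECEDING stops the frame at the current physical row, whereas RANGE UNBOUNDED PRECEDING extends it to every peer row that shares the current ORDER BY value. Because Salesorderdetailid is unique, the two queries above return identical numbers, so the reads difference comes from how each frame is processed (RANGE is commonly reported to use an on-disk worktable for its window spool in SQL Server 2012), not from different results.

```python
from itertools import groupby

def running_sum_rows(pairs):
    """ROWS UNBOUNDED PRECEDING: the frame ends at the current physical row."""
    out, total = [], 0
    for _, value in pairs:
        total += value
        out.append(total)
    return out

def running_sum_range(pairs):
    """RANGE UNBOUNDED PRECEDING: the frame ends at the last peer with the same ORDER BY key."""
    out, total = [], 0
    for _, peers in groupby(pairs, key=lambda pair: pair[0]):
        peers = list(peers)
        total += sum(value for _, value in peers)
        out.extend([total] * len(peers))  # every peer sees the same running total
    return out

unique_keys = [(1, 10), (2, 20), (3, 30)]
tied_keys = [(1, 10), (2, 20), (2, 5), (3, 30)]
print(running_sum_rows(unique_keys), running_sum_range(unique_keys))  # [10, 30, 60] [10, 30, 60]
print(running_sum_rows(tied_keys), running_sum_range(tied_keys))      # [10, 30, 35, 65] [10, 35, 35, 65]
```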

So, the moral of the story: percentage relative to the batch can give a rough ‘finger in the air’ measurement, but don't rely on it as fact.

Offset without OFFSET

A while ago Robert Cary posted an article on SQL Server Central entitled “2005 Paging – The Holy Grail”, which is, as the title would suggest, about paging in SQL Server. This article provoked some really interesting chat around the subject and is well worth a read.

This is now a lot easier in SQL Server 2012 with the introduction of the OFFSET extension to the ORDER BY clause, but what is the optimal method if you are not using 2012?

Well, whilst playing around with the OFFSET portion of my “What’s new in SQL Server 2012 – TSQL” presentation, I hit on a different method that I’ve not seen published before.

Now, whilst finding which rows are on which page is a problem, it is only part of a much wider problem: the cost of the lookups to find other related data. For example, you have a list of people which you are paging through in LastName order, but you also wish to display FirstName. That is not in your index and so a key lookup occurs. OK, I could INCLUDE it in the index, but I'm just simplifying the problem.

So, to demonstrate this I need to create an index on Person.Person in AdventureWorks.

Create index idxLastName on Person.Person(LastName)
The query for the “holy grail” method would look something like this:
with ctePaging
as
(
Select LastName,FirstName,
       row_number() over (order by LastName,BusinessEntityID)-1  as RowN
 from  Person.Person
)
Select * from ctePaging
where RowN between 20 and 39 
order by RowN;

The issue here is that SQL Server has initiated an index scan (against a different index than the one we created) and had to process all the rows in the table and then sort them. 

image

We only want 20 rows returned so this is quite a lot of wasted effort on the engine's part.

OFFSET was introduced in 2012, and running the equivalent query

Select LastName,FirstName,BusinessEntityID
 from  Person.Person
 order by LastName,BusinessEntityID
 offset 20 rows fetch next 20 rows only;

Gives us the query plan:

image
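The two forms correspond directly: OFFSET n ROWS FETCH NEXT m ROWS ONLY returns the rows whose zero-based row number (hence the `- 1` in the CTE) lies between n and n + m - 1. A small Python model, with hypothetical (LastName, BusinessEntityID) pairs standing in for Person.Person:

```python
def holy_grail_page(people, first_rown, last_rown):
    """ROW_NUMBER() OVER (ORDER BY LastName, BusinessEntityID) - 1, filtered with BETWEEN."""
    ordered = sorted(people)  # tuples sort by LastName, then BusinessEntityID
    return [person for rown, person in enumerate(ordered) if first_rown <= rown <= last_rown]

def offset_fetch_page(people, offset, fetch):
    """ORDER BY LastName, BusinessEntityID OFFSET offset ROWS FETCH NEXT fetch ROWS ONLY."""
    ordered = sorted(people)
    return ordered[offset:offset + fetch]

# Hypothetical (LastName, BusinessEntityID) rows standing in for Person.Person
surnames = ["Adams", "Brown", "Smith", "Zhang"]
people = [(surnames[eid % 4], eid) for eid in range(1, 101)]

# OFFSET 20 ROWS FETCH NEXT 20 ROWS ONLY  <=>  RowN BETWEEN 20 AND 39
assert holy_grail_page(people, 20, 39) == offset_fetch_page(people, 20, 20)
```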

Even this is suboptimal, though, as the key lookup has occurred 40 times, even though we only needed the data (in this case FirstName) for 20 rows.

This can be resolved by doing the key lookup yourself.

with cteKeySeek
as
(
Select LastName,BusinessEntityID
 from  Person.Person
 order by LastName,BusinessEntityID
 offset 20 rows fetch next 20 rows only
)
Select cteKeySeek.LastName,
       FirstName,
       cteKeySeek.BusinessEntityID
 from  cteKeySeek
 inner join  Person.Person   
   on  cteKeySeek.BusinessEntityID =   Person.BusinessEntityID
order  by cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID;

Even though it's longer, wordier and involves a join, it is more efficient: the join has replaced the key lookup, and the lookup now only occurs for the 20 rows of data that we need.

image
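The saving can be modelled in Python by counting calls to a stand-in for the clustered index (all names and data here are hypothetical). In the plain OFFSET plan the key lookup sits below the TOP, so every row read up to OFFSET + FETCH is looked up; paging the narrow index first and joining back afterwards only looks up the rows on the page:

```python
class CountingLookup:
    """Hypothetical stand-in for the clustered index: BusinessEntityID -> FirstName, counting calls."""

    def __init__(self, first_names):
        self.first_names = first_names
        self.calls = 0

    def __call__(self, business_entity_id):
        self.calls += 1
        return self.first_names[business_entity_id]

def offset_with_key_lookup(index_rows, offset, fetch, lookup):
    """Plain OFFSET plan: the lookup runs for every row read up to offset + fetch."""
    read = [(last, eid, lookup(eid)) for last, eid in index_rows[:offset + fetch]]
    return read[offset:]

def offset_then_join(index_rows, offset, fetch, lookup):
    """cteKeySeek version: page the narrow index first, then look up only the page."""
    page = index_rows[offset:offset + fetch]
    return [(last, eid, lookup(eid)) for last, eid in page]

# Hypothetical data: the narrow index on (LastName, BusinessEntityID) and the FirstName column
index_rows = sorted((f"Surname{eid % 7}", eid) for eid in range(1, 101))
first_names = {eid: f"Forename{eid}" for eid in range(1, 101)}

inline, joined = CountingLookup(first_names), CountingLookup(first_names)
assert offset_with_key_lookup(index_rows, 20, 20, inline) == offset_then_join(index_rows, 20, 20, joined)
print(inline.calls, joined.calls)  # 40 20
```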

Quite neat, eh? When using OFFSET it is important to remember that no magic is happening: SQL Server still has to ‘count’ and scan through the rows that are skipped before it can decide which ones it does need.

A comparable query for previous versions, taking the lead from the holy grail method, would be:

with cteKeySeek
as
(
Select BusinessEntityID,LastName,
       row_number() over (order by LastName,BusinessEntityID)-1 
             as RowN
 from  Person.Person

)
Select cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID ,RowN
  from cteKeySeek
  inner loop join  Person.Person   
   on  cteKeySeek.BusinessEntityID =   Person.BusinessEntityID
where RowN >= 20 and RowN <= 39
order by LastName,BusinessEntityID;

which similarly filters the rows before doing the index lookup:

image

It does, however, still involve a scan of 19,972 rows, of which 19,932 are irrelevant to our final result set. You may have noticed in the OFFSET versions that the TOP operator is used to filter the data and ‘stop’ the scan once it has reached the last row that we are interested in. What if we could do something similar?

What about this?

with cteKeySeek
as
(
Select BusinessEntityID,LastName,
       row_number() over (order by LastName,BusinessEntityID)-1 
             as RowN
 from  Person.Person

)
Select top(20) cteKeySeek.LastName,FirstName,cteKeySeek.BusinessEntityID ,RowN
  from cteKeySeek
  inner loop join  Person.Person   
   on  cteKeySeek.BusinessEntityID =   Person.BusinessEntityID
where RowN >= 20 and RowN<=39
order by LastName,BusinessEntityID;

That does have the rather interesting effect of doing exactly that:

image
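Why the TOP helps can be modelled as a lazy pipeline in Python: the scan yields rows one at a time, the filter keeps RowN 20 to 39, and once TOP(20) has its rows the loop stops asking for more, much as the TOP operator stops pulling rows from the scan. The 19,972-row "index" is hypothetical data sized to match the plan above:

```python
def scan(index_rows, stats):
    """Yield rows in index order, counting how many the scan has read."""
    for row in index_rows:
        stats["rows_read"] += 1
        yield row

def page_by_row_number(index_rows, first_rown, last_rown, top=None):
    """Keep RowN BETWEEN first_rown AND last_rown; with top set, stop once TOP is satisfied."""
    stats = {"rows_read": 0}
    page = []
    for rown, row in enumerate(scan(index_rows, stats)):
        if first_rown <= rown <= last_rown:
            page.append((rown, row))
            if top is not None and len(page) == top:
                break  # TOP has its rows: stop pulling from the scan
    return page, stats["rows_read"]

# Hypothetical index of 19,972 keys, already in (LastName, BusinessEntityID) order
index_rows = list(range(19972))

page_all, read_all = page_by_row_number(index_rows, 20, 39)
page_top, read_top = page_by_row_number(index_rows, 20, 39, top=20)
print(read_all, read_top)  # 19972 40
```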

So, this is looking (at least in terms of row counts) very similar to the OFFSET functionality. If we look for a page of data further on (rows 200 to 219) and look at a Profiler trace, we can see how the three types of query compare.

image

So, as you can see, over a medium-sized(ish) dataset the fake and real OFFSET are comparable in terms of IO.

Hope this helps someone who needs to do paging.

Parsing T-SQL – The easy way

Every once in a while, I hit an issue that would require me to interrogate/parse some T-SQL code. Normally, I would shy away from this and attempt to solve the problem in some other way. I have written parsers in the past using LEX and YACC, and as much fun and awesomeness as that path is, I couldn't justify the time it would take.

However, this week I have been faced with just such an issue, and at the back of my mind I could remember reading through the SQL Server 2012 Feature Pack and seeing something called “Microsoft SQL Server 2012 Transact-SQL Language Service”. This is described there as:

“The SQL Server Transact-SQL Language Service is a component based on the .NET Framework which provides parsing validation and IntelliSense services for Transact-SQL for SQL Server 2012, SQL Server 2008 R2, and SQL Server 2008. “

Sounds just what I was after. Documentation is very scant on this, so don't take what follows as best practice or best use, just a practice and a use.

Knowing roughly what I was looking for, I found the relevant assembly in the GAC, the simply named ‘Microsoft.SqlServer.Management.SqlParser’.

Even knowing that, you won't find much in terms of documentation if you do a web search, but you will find the MSDN documentation that lists the members, methods etc.

The “Scanner” class sounded the most appropriate for my needs, as it is described as “Scans Transact-SQL searching for individual units of code or tokens.”

After a bit of poking around, the code I ended up with was something like this:

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.SqlParser") | Out-Null
$ParseOptions = New-Object Microsoft.SqlServer.Management.SqlParser.Parser.ParseOptions
$ParseOptions.BatchSeparator = 'GO'

$Parser = new-object Microsoft.SqlServer.Management.SqlParser.Parser.Scanner($ParseOptions)
$Sql = "Create Procedure MyProc as Select top(10) * from dbo.Table"
$Parser.SetSource($Sql,0)
$Token=[Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::TOKEN_SET
$Start =0
$End = 0
$State =0 
$IsEndOfBatch = $false
$IsMatched = $false
$IsExecAutoParamHelp = $false
while(($Token = $Parser.GetNext([ref]$State ,[ref]$Start, [ref]$End, [ref]$IsMatched, [ref]$IsExecAutoParamHelp ))-ne [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::EOF) {
    try{
        ($TokenPrs =[Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]$Token) | Out-Null
        $TokenPrs
        $Sql.Substring($Start,($end-$Start)+1)
    }catch{
        $TokenPrs = $null
    }    
}

As you can see, the $Sql variable holds the SQL to be parsed. That is pushed into the $Parser object using SetSource, and then we call GetNext until the EOF token is returned. Through its [ref] parameters, GetNext also returns the Start and End character positions of the parsed text within the source string.

This script’s output is:

TOKEN_CREATE
Create
TOKEN_PROCEDURE
Procedure
TOKEN_ID
MyProc
TOKEN_AS
as
TOKEN_SELECT
Select
TOKEN_TOP
top
TOKEN_INTEGER
10
TOKEN_FROM
from
TOKEN_ID
dbo
TOKEN_TABLE
Table

Note that the ‘(’, ‘)’ and ‘*’ characters (and, judging by the output, ‘.’) have returned a token type that is not present in the Microsoft.SqlServer.Management.SqlParser.Parser.Tokens enum; the cast has raised an error, which has been caught in the catch block.
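The GetNext loop follows a common scanner pattern: return a token type plus start and end offsets, where End is inclusive (hence the `+1` in the Substring call). To illustrate only that pattern, here is a toy regular-expression tokenizer in Python. It is not the Microsoft.SqlServer.Management.SqlParser API, and its token kinds and keyword list are made up for this example:

```python
import re

# Toy keyword list for this example only; the real Tokens enum is far larger.
KEYWORDS = {"CREATE", "PROCEDURE", "AS", "SELECT", "TOP", "FROM", "TABLE"}
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z_]\w*)|(?P<integer>\d+)|(?P<punct>[^\s\w])")

def scan(sql):
    """Yield (kind, start, end) with an inclusive end offset; whitespace is skipped."""
    for match in TOKEN_RE.finditer(sql):
        start, end = match.start(), match.end() - 1
        if match.lastgroup == "word":
            kind = "KEYWORD" if match.group().upper() in KEYWORDS else "ID"
        elif match.lastgroup == "integer":
            kind = "INTEGER"
        else:
            kind = "PUNCT"
        yield kind, start, end

sql = "Create Procedure MyProc as Select top(10) * from dbo.Table"
for kind, start, end in scan(sql):
    # Same arithmetic as $Sql.Substring($Start, ($End - $Start) + 1)
    print(kind, sql[start:end + 1])
```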

Fun, fun, fun: simple T-SQL parsing. Hope this helps someone in the same position; let me know how you get on.

SQLMidlands & SQLLunch

Many thanks to all those who turned out to see my “Cursors are Evil” presentation at SQLMidlands on Thursday (16th of Feb). The scripts I used are here:

https://skydrive.live.com/?cid=4004b6a3bc887e2c&id=4004B6A3BC887E2C%21216

You will need the AdventureWorks2008r2 release to run these, feel free to mail me (dave.ballantyne@live.co.uk) with any questions.  They are based upon a series of articles I wrote for SQLServerCentral which can be found here and here.

Also, I am starting, or at least having an attempt at, a new user group in London. This is SQLLunch, meeting downstairs at The Golden Fleece, EC4N 1SP, which is 2 minutes from Bank Tube. We will have a twice-monthly meeting (2nd and 4th Tuesdays) for an ‘All Stuff, No Fluff’ event. Put plainly, a quick hello followed by a 45-minute presentation, which will, optimistically, have you there and back to your desk within a lunch hour.

Registrations for the first series of dates are at sqlserverfaq.com

If you would like to speak, then please get in touch.

Hope to see you there. 

[BUG] Inserts to tables with an indexed view can fail

Unfortunately, some of the more troubling bugs can be very hard to reproduce succinctly. Here is one that has been troubling me for a little while:

The issue involves indexed views with a calculated column. Indexed views, despite their restrictions, are a very handy addition to SQL Server, and materializing views as hard data can certainly improve performance. To demonstrate the issue, we need to build a table and create a view on it.

create table myTable
(
Id integer not null,
InView char(1) not null,
SomeData varchar(255) not null
)
go
Create view vwIxView
with schemabinding
as
Select ID,Somedata,left(SomeData,CHARINDEX('x',SomeData)-1) as leftfromx
from dbo.myTable
Where InView ='Y'
 
As you can see, the view filters the data to rows where InView = ‘Y’ and adds a calculated column that manipulates the ‘SomeData’ column. This column, leftfromx, takes the characters before (but not including) the first ‘x’ in ‘SomeData’. Note that if there is no ‘x’, CHARINDEX returns 0, so LEFT is passed a length of -1.

If we insert some data into the table with:

insert into myTable(Id,InView,SomeData)
select 1,'N','a'

then, unsurprisingly, if we look at the view, there will be no data in it.

Now let's add an index to the view:

create unique clustered index pkvwIxView on vwIxView(Id)

The data is now persisted.

Let's now add some more data, the same data, in an ever so slightly different way:

declare @id integer,
@inview char(1),
@Somedata char(50)
select @id = 1, @inview = 'N',@Somedata = 'a'

insert into myTable(Id,InView,SomeData)
select @id,@inview,@Somedata

What is the result ?

image

Huh. Well, it's kind of obvious which “LEFT or SUBSTRING function” has errored, but as InView = ‘N’, why should that piece of code even have been executed? Looking at the estimated plan, we can more easily see the flow of events.

image

The ‘Compute Scalar’ operator is where the LEFT is being executed. That is happening before the filter, and as there is no ‘x’ in the ‘SomeData’ column, it is unsurprising that the function errors. I have tested this on both 2008 R2 and 2012 RC0.
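The effect of the operator order can be reproduced with a small Python model of the view's expression. `charindex` and `sql_left` are hypothetical stand-ins that mimic T-SQL's 1-based CHARINDEX and LEFT's refusal of a negative length; the only difference between the two plan functions is whether the InView filter runs before or after the calculation:

```python
def charindex(needle, haystack):
    """1-based position like T-SQL CHARINDEX; 0 when not found."""
    return haystack.find(needle) + 1

def sql_left(value, length):
    """Like T-SQL LEFT, a negative length is an error rather than an empty string."""
    if length < 0:
        raise ValueError("Invalid length parameter passed to the LEFT or SUBSTRING function.")
    return value[:length]

def leftfromx(some_data):
    return sql_left(some_data, charindex("x", some_data) - 1)

rows = [(1, "N", "a")]  # (Id, InView, SomeData) as inserted above

def filter_then_compute(rows):
    """Filter on InView first, so LEFT never sees the 'N' row."""
    return [(i, data, leftfromx(data)) for i, in_view, data in rows if in_view == "Y"]

def compute_then_filter(rows):
    """Compute Scalar before Filter, as in the plan: LEFT runs for every row."""
    computed = [(i, in_view, data, leftfromx(data)) for i, in_view, data in rows]
    return [(i, data, left) for i, in_view, data, left in computed if in_view == "Y"]
```

Calling `filter_then_compute(rows)` returns an empty list, while `compute_then_filter(rows)` raises the error for a row that the view would never have returned.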

I have raised a connect item here, if you want to upvote it.

Book review - SQL Server Secret Diary (Know the unknown secrets of SQL Server)

Like a lot of people within the SQL community, I can never read enough on the subject. Books, whitepapers, academic research and blogs can all be valuable sources of information, so whilst browsing Amazon I found this book as a free Kindle download. The preface makes some bold claims indeed:

“This book is for developers who already know SQL Server and want to gain more knowledge in SQL Server.  This book is not for starter who want to start from the beginning.

The problem-solution approach will help you to understand and solve the real-time problems easily.

This Book will teach you (their emphasis)

  • How to solve common real-time problems
  • How to improve performance
  • How to protect your data and code
  • How to reduce your code
  • How to use SQL Server efficiently
  • Advanced topics with simple examples
  • Tips and tricks with sample queries
  • And also teach how to do in the better way.

The last bullet point sets the tone for the quite appalling grammar (yes, yes, people in glass houses and all that...) throughout the entire book. I get that the authors may use English as a second (or third) language, but where are the proofreaders? That I can live with, though; it's the technical content I really have a problem with. Here is just a small selection:

Q 2) How to use GO statement in SQL Server ?

IMO, the most important concept to understand about GO is that it is not a SQL statement. It is processed on the client (SSMS, ISQL etc.) and splits the workload into separate batches. This is not mentioned here, though, to be fair, in Q3 (How to repeat the statements without using loops?) the author notes “GO is a client command and not a T-SQL command. So GO or GO <N> can only be used with Microsoft SQL Server client tools.”

Q 5) How to use ORDER BY clause in view ?

Here the author spends a great deal of time and effort working around “As per RDBMS rule ORDER BY clause is not allowed in view”. This section should be thrown away entirely: if you are depending on the view ordering (which is a contradiction in terms) for your result set ordering, you deserve all the lawsuits that are thrown at you.

Q 10) How to do case sensitive searches in SQL Server ?

The author's solution here is to cast a column as varbinary. OK, fair enough, it works. Personally, I would have used COLLATE, but let's not split hairs. The biggest issue I have here is that sargability is not mentioned: wrapping the column in an expression introduces the possibility of a scan.

Q 12) How to solve server time out problem ?

The scenario presented here is that session #1 has updated some data that session #2 needs to read. The author presents two solutions, NOLOCK and READPAST, and, to be fair, does make an attempt at highlighting the dirty reads. My issue here is that, once again, locking is seen as the enemy that must be worked around. We should embrace locks, understand why they are happening and how they are protecting us. The point is not raised that the fault here lies with the updating transaction not completing in a timely fashion, not with the reader that cannot complete because of it. The consequences of reading and processing dirty data are not explored thoroughly enough and, once again, NOLOCK is used as a “go faster” button.

Q 33 ) How to improve the performance of stored procedure ?

Here we have been given 11 bullet points by the authors, which I have copied verbatim below in quotes. My thoughts about each point follow it inline:

  • “Use SET NOCOUNT ON to avoid sending row count information for every statement.” So, this can help, but it will only have a measurable effect if you have many, many statements, and in that case you are coding SQL wrong anyway.
  • “Always use the owner name or schema name before the object name to prevent recompilation of stored procedure.” Does this mean that by not referencing the owner or schema (which one is it??), objects will always cause a recompile of the entire stored procedure? No. The statement, not necessarily the stored procedure, will recompile if the user has a different default schema from that of the existing compiled statement.
  • “Avoid using DISTINCT” Just DISTINCT? Anything else? Unnecessary ORDER BY?
  • “Minimize the number of columns in SELECT clause” So, Select Col1,Col2,Col3 is bad but Select Col1 + ' ' + Col2 + ' ' + Col3 is OK? Better wording here would be “Return only the data that is required by the application, nothing more, nothing less.”
  • “Use table variables instead temporary tables.” Seriously! What! Come again. As a sweeping general statement: wrong, wrong, wrong.
  • “Use the CTE (Common Table Expression) instead of derived tables and table variables as much as possible.” Again, a massive over-generalisation. Horses for courses. Also, didn't you just say that I should use table variables?
  • “Avoid using cursors” Why? And what should I do instead? I have to get the data out somehow; what alternatives are there?
  • “Don’t use duplicate codes, reuse the code by Views and UDF’s” This section is about performance, right? I would like to see one single instance where using a view (presumably unindexed) or a UDF (cough, splutter) improves performance.
  • “Begin and commit transactions immediately” Better wording would be “Keep transactions as short as possible; never leave a transaction open while waiting for user input.”
  • “Avoid exclusive locks” Confusing; in what context?
  • “Use table hints” BwaaHaa, this is really a Pandora’s box best left closed by the audience of this book.

And so it continues. I’m trying really hard not to be too scathing or nit-picky about this book; there is some good advice here, but SQL Server is full of caveats and confusing, contradictory best practices, and ultimately 90% of the time you can state that “It depends”.

Questions are presented with solutions that can work, but they are given as 100% solutions without any warning that this may not always be the case. Even as a free download it is way too expensive and, remembering the target audience, could ultimately do more harm than good.

Extended Events - inaccurate_cardinality_estimate

Extended events have been a bit of a personal “Elephant in the room” for me. I know they are there and I should really get on and start using them, but I never *quite* have a compelling enough reason.

So now I really do. After comparing the events in sys.dm_xe_objects between 2008 R2 and 2012, I found one that really piqued my interest: inaccurate_cardinality_estimate. This is described as “Occurs when an operator outputs significantly more rows than estimated by the Query Optimizer. Use this event to identify queries that may be using sub-optimal plans due to cardinality estimate inaccuracy. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.”

IMO, cardinality estimation errors are the number one cause of performance problems. If SQL Server deduces (or even guesses) an incorrect row estimate, then all bets are off; if you get anything approaching a decent plan, it's by luck, not judgement.

So, let's see if we can cause this event to fire. Firing up Management Studio, we have the new Extended Events manager,

image

which sounds like a fun tool to play with :) So, starting a new session and going to the events library

image

and filtering by ‘Card’

image

Oh, nothing found. It's simply not there; here is a Connect item for this issue.

So, we will have to do this a more ‘manual’ way:

CREATE EVENT SESSION inaccurate_cardinality_estimate ON SERVER
ADD EVENT sqlserver.inaccurate_cardinality_estimate
( ACTION (sqlserver.plan_handle, sqlserver.sql_text) )
ADD TARGET package0.asynchronous_file_target
( SET FILENAME = N'c:\temp\inaccurate_cardinality_estimate.xel',
metadatafile = N'c:\temp\inaccurate_cardinality_estimate.xem' );

OK, session defined; let's start it.

ALTER EVENT SESSION inaccurate_cardinality_estimate ON SERVER STATE = START

To demonstrate the actual event, we need to create and populate a temporary table:

drop table #newids
go
create table #NewIds
(
id char(36)
)
go
insert into #NewIds
select top(100)
cast(newid() as char(36))
from sys.all_columns a cross join sys.all_columns b
If we now execute
 
declare @v varchar(10)
Select @v='%XX%'
select COUNT(*) from #NewIds where id like @v

We will get an estimated row count of 5.37528.
 
I can control the exact actual row count by updating a number of rows to start with ‘XX’, and thereby create a cardinality estimation error.
 
For starters, let's update all the rows (I'm wrapping the updates in a transaction and rolling back; for the sake of brevity this is not shown):
 
update 
#NewIds
set id = 'XX'+left(id,20)

declare @v varchar(10)
Select @v='%XX%'
select COUNT(*) from #NewIds where id like @v
The plan for the SELECT shows a cardinality error, as expected:

image
 
100 actual, 5.37528 estimated.
 
Stop the extended events session
 
ALTER EVENT SESSION inaccurate_cardinality_estimate ON SERVER STATE = STOP

and, all being well, in the c:\temp folder you will see an extended event log file.
 
Open that in management studio
 
image

There is the event, nice. But hold on one cotton-picking minute, look at the row counts. Estimated = 5, actual = 26!?!

What happens if we repeat this operation, but double the rows in the temp table?

If we double the number of rows in our temp table to 200, the estimated rows in the plan show as 10.7506 and actual as 200. In the extended event we see:

image

So the estimated count is shown as floor(row estimate), and the event appears to fire once the actual row count reaches roughly five times the plan estimate (5 × 10.7506 ≈ 53.75), which is why actual is shown here as 53, not 200. Notice that we also have the plan_handle and the node_id if we wish to tie this back to an exact operator in our system.
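A quick arithmetic check of the two runs in Python. This is an inference from these two data points only, not documented behaviour: the estimate scales linearly with row count (both are about 5.375% of the rows), and the ‘actual’ figures reported by the event match floor(5 × the unrounded plan estimate), rather than 5 × the floored estimate shown in the event (which would give 25 and 50):

```python
import math

def reported_actual(plan_estimate, multiplier=5):
    """Hypothesis fitted to two observations: floor(multiplier * unrounded plan estimate)."""
    return math.floor(multiplier * plan_estimate)

# Plan estimate -> 'actual' shown in the inaccurate_cardinality_estimate event
observations = {5.37528: 26, 10.7506: 53}
for estimate, shown in observations.items():
    print(estimate, math.floor(estimate), reported_actual(estimate), shown)
```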

Quite why this is an extended event and not a plan warning, I really have no idea; still, it's nice to know it's there.

MythBusting–“Table variables have no statistics”

OK, as myths go, it's a pretty weak one. In fact, it is true; this whitepaper explicitly states that. But hand in hand with that statement goes another one, “Table variables will always estimate to one row”. This is most definitely false: if there are no statistics, then SQL Server can, at times, fall back to ‘guessing’ the distribution of data based upon row counts. This behaviour can further muddy the water of the old “Which is better, table variables or temp tables?” argument.

To demonstrate this, firstly we need to populate a numbers table

create table numbers
(
Num integer primary key
)
go
insert into numbers
Select top(1000) ROW_NUMBER() over (order by (select null))
from sys.columns a cross join sys.columns b

Now we execute the following code

Declare @TableVar Table
(
ID integer not null primary key,
Mod10 integer not null
)

insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

Select tv.Id,num
from @TableVar tv
join numbers
on tv.ID = num


and looking at the execution plan, we see:
image
 
1 row estimated and 20 rows actual, as you may well expect. Now add ‘OPTION(RECOMPILE)’ and the plan is different.
 
image
 
Look at that: an accurate row estimate. How about if we add a filter to the statement, say ‘Mod10=0’?
 

image
 
Another different, but wrong, estimate. This is because table variables don't have statistics, but we do have row counts. It is worth pointing out that these are the same numbers you would get if you performed these operations on a normal ‘permanent’ table with AUTO STATISTICS turned off.
 
Obviously, in a production environment you would only be using RECOMPILE in ‘special’ circumstances, right? So this isn't an issue: all your table variables will be estimating as one row. Wrong. I would be willing to bet that a surprisingly high number are estimating as something else. If you are so inclined, you can probably find quite a few in the plan cache via sys.dm_exec_query_plan. So, how does this happen? Well, in a way it's nothing to do with table variables per se: if you are joining to another table, then if (and when) that table has its stats updated, the statement will recompile and, surprise, surprise, you have a table variable with an estimate > 1.
 
OK… so let's step through that. Ignore the extra SELECT statement that counts from AdventureWorks; it's just there to create a more ‘complicated’ stored procedure so that we get multiple statements cached in the plan.
 
drop table IDs
go
create table IDs
(
Id integer primary key,padding char(255)
)
go
insert into IDs(Id,padding)
Select top(1) num,'xxx'
from numbers
order by num
go
drop procedure TableVarTest
go
create procedure TableVarTest
as
declare @TableVar Table
(
ID integer not null,
Mod10 integer not null
)

insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

select COUNT(*)
from AdventureWorks2008r2.dbo.Customer C
join AdventureWorks2008r2.dbo.CustomerOrders CO
on C.CustomerId = CO.CustomerId

Select tv.Id,IDs.id
from @TableVar tv
join IDs
on tv.ID = IDs.Id
where mod10 =0
go

On first execution the join of the table variable to IDs produces…
 
image
 
Now, let's add some more data to IDs and force a recompile, just for good measure:
 
insert into IDs(Id,padding)
Select top(1000) num,'xxx'
from numbers
where not exists(select id from IDs where id = num )
order by num
go
exec sp_recompile ids

and then re-execute the stored procedure
 
image
 
So, one myth busted and one proved, not bad for one blog.
Execution plan warnings–The final chapter

In my previous posts (here and here), I showed examples of some of the execution plan warnings that have been added to SQL Server 2012.  There is one other warning that is of interest to me : “Unmatched Indexes”.

Firstly, how do I know this is the final one? The plan is an XML document, right? So it can have an accompanying XSD. As an XSD is a schema definition, we can poke around inside it to find interesting things that *could* be in the final XML file.

The showplan schema is stored in the folder Microsoft SQL Server\110\Tools\Binn\schemas\sqlserver\2004\07\showplan and by comparing schemas over releases you can get a really good idea of any new functionality that has been added.

Here is the section of the SQL Server 2012 showplan schema that has been interesting me so far:

<xsd:complexType name="AffectingConvertWarningType">
<xsd:annotation>
<xsd:documentation>Warning information for plan-affecting type conversion</xsd:documentation>
</xsd:annotation>
<xsd:sequence>
<!-- Additional information may go here when available -->
</xsd:sequence>
<xsd:attribute name="ConvertIssue" use="required">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Cardinality Estimate" />
<xsd:enumeration value="Seek Plan" />
<!-- to be extended here -->
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attribute name="Expression" type ="xsd:string" use="required" />
</xsd:complexType>
<xsd:complexType name="WarningsType">
<xsd:annotation>
<xsd:documentation>List of all possible iterator or query specific warnings (e.g. hash spilling, no join predicate)</xsd:documentation>
</xsd:annotation>
<xsd:choice minOccurs="1" maxOccurs="unbounded">
<xsd:element name="ColumnsWithNoStatistics" type="shp:ColumnReferenceListType" minOccurs="0" maxOccurs="1" />
<xsd:element name="SpillToTempDb" type="shp:SpillToTempDbType" minOccurs="0" maxOccurs="unbounded" />
<xsd:element name="Wait" type="shp:WaitWarningType" minOccurs="0" maxOccurs="unbounded" />
<xsd:element name="PlanAffectingConvert" type="shp:AffectingConvertWarningType" minOccurs="0" maxOccurs="unbounded" />
</xsd:choice>
<xsd:attribute name="NoJoinPredicate" type="xsd:boolean" use="optional" />
<xsd:attribute name="SpatialGuess" type="xsd:boolean" use="optional" />
<xsd:attribute name="UnmatchedIndexes" type="xsd:boolean" use="optional" />
<xsd:attribute name="FullUpdateForOnlineIndexBuild" type="xsd:boolean" use="optional" />
</xsd:complexType>

I especially like the “to be extended here” comment; here's hoping we will see more of these in the future.
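Comparing schemas between releases can be scripted. Here is a minimal Python sketch using xml.etree.ElementTree that lists the warnings declared on WarningsType; the excerpt above is wrapped in an xsd:schema element so it parses on its own. Loading the real schema files from two installations (for example with ET.parse) is left as an assumption:

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# WarningsType from the SQL Server 2012 showplan schema quoted above, wrapped so it parses standalone
SHOWPLAN_2012_EXCERPT = f"""<xsd:schema xmlns:xsd="{XSD_NS}">
<xsd:complexType name="WarningsType">
<xsd:choice minOccurs="1" maxOccurs="unbounded">
<xsd:element name="ColumnsWithNoStatistics" type="shp:ColumnReferenceListType" minOccurs="0" maxOccurs="1" />
<xsd:element name="SpillToTempDb" type="shp:SpillToTempDbType" minOccurs="0" maxOccurs="unbounded" />
<xsd:element name="Wait" type="shp:WaitWarningType" minOccurs="0" maxOccurs="unbounded" />
<xsd:element name="PlanAffectingConvert" type="shp:AffectingConvertWarningType" minOccurs="0" maxOccurs="unbounded" />
</xsd:choice>
<xsd:attribute name="NoJoinPredicate" type="xsd:boolean" use="optional" />
<xsd:attribute name="SpatialGuess" type="xsd:boolean" use="optional" />
<xsd:attribute name="UnmatchedIndexes" type="xsd:boolean" use="optional" />
<xsd:attribute name="FullUpdateForOnlineIndexBuild" type="xsd:boolean" use="optional" />
</xsd:complexType>
</xsd:schema>"""

def warning_names(schema_text):
    """Names of the child elements and attributes declared on WarningsType."""
    ns = {"xsd": XSD_NS}
    root = ET.fromstring(schema_text)
    names = set()
    for warnings_type in root.iterfind("xsd:complexType[@name='WarningsType']", ns):
        names.update(e.get("name") for e in warnings_type.iterfind(".//xsd:element", ns))
        names.update(a.get("name") for a in warnings_type.iterfind("xsd:attribute", ns))
    return names

def new_warnings(older_schema_text, newer_schema_text):
    """Warnings declared in the newer schema but not in the older one."""
    return warning_names(newer_schema_text) - warning_names(older_schema_text)

print(sorted(warning_names(SHOWPLAN_2012_EXCERPT)))
```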
 
So “Unmatched Indexes” was a warning that I couldn’t get and many thanks must go to Fabiano Amorim (b|t) for showing me the way.
 
Filtered indexes were introduced in SQL Server 2008 and are really useful if you only need to index a portion of the data within a table. However, if your SQL code uses a variable as a predicate on the filtered data that matches the filtered condition, then the filtered index cannot be used because, naturally, the value in the variable may (and probably will) change, and the plan would therefore need to read data outside the index. As an aside, you could use OPTION(RECOMPILE) here, in which case the optimizer will build a plan specific to the variable values and use the filtered index, but that can bring about other problems.
 
To demonstrate this warning, we need to generate some test data:
 
DROP TABLE #TestTab1
GO
CREATE TABLE #TestTab1 (Col1 Int not null,
Col2 Char(7500) not null,
Quantity Int not null)
GO

INSERT INTO #TestTab1 VALUES (1,1,1),(1,2,5),(1,2,10),(1,3,20),
(2,1,101),(2,2,105),(2,2,110),(2,3,120)
GO

and then add a filtered index

CREATE INDEX ixFilter ON #TestTab1 (Col1)
WHERE Quantity = 122

Now if we execute

SELECT COUNT(*) FROM #TestTab1 WHERE Quantity = 122

We will see the filtered index being scanned

image

But if we parameterize the query

DECLARE @i INT = 122
SELECT COUNT(*) FROM #TestTab1 WHERE Quantity = @i

The plan is very different

image

A table scan, as the value of the variable used in the predicate can change at run time, and we also see the familiar warning triangle.

If we now look at the properties pane, we will see two pieces of information: “Warnings” and “UnmatchedIndexes”.

image

So, handily, we are being told which filtered index is not being used due to parameterization.
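Why a plan compiled for @i cannot safely use the filtered index can be shown with a small Python model of the #TestTab1 data. A cached plan must return correct results for whatever value @i holds at run time, and the filtered index only contains rows with Quantity = 122 (none, in this sample data). This is a simplified illustration, not how the optimizer actually reasons:

```python
# (Col1, Col2, Quantity) rows inserted into #TestTab1 above
rows = [(1, 1, 1), (1, 2, 5), (1, 2, 10), (1, 3, 20),
        (2, 1, 101), (2, 2, 105), (2, 2, 110), (2, 3, 120)]

FILTER_QUANTITY = 122
# The filtered index only holds rows satisfying WHERE Quantity = 122
filtered_index = [row for row in rows if row[2] == FILTER_QUANTITY]

def count_by_table_scan(quantity):
    """Always correct: every row is checked."""
    return sum(1 for row in rows if row[2] == quantity)

def count_by_filtered_index(quantity):
    """Only correct when quantity matches the index filter."""
    return sum(1 for row in filtered_index if row[2] == quantity)

for quantity in (122, 5, 101):
    print(quantity, count_by_table_scan(quantity), count_by_filtered_index(quantity))
```

For 122 both counts agree, but for 5 or 101 the filtered index would silently report 0, which is why a plan that must serve any value of @i falls back to the table scan.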

Blogging from 37,000ft

I'm currently on my way to SQL Rally Nordic and looking forward to a few days of full-on SQL geekery and “Unleashing my inner Viking”. I shall be speaking on Wednesday afternoon on one of my favourite subjects, “Cursors are Evil”. OK, so let's put it into perspective: “Evil” is a bit dramatic, but “Often used inappropriately and can cause serious performance bottlenecks” didn't have quite the same ring :)

If you are not going to be at SQL Rally, I'm going to be repeating it at the Leeds and Manchester user groups on the 23rd and 24th of November respectively. Presenting with me on these nights will be James Boother, so make it along to those if you can. I look forward to seeing you at one of these events.
