SQL and the like

Dave Ballantyne's blog. Freelance SQL Server database designer and developer at Clear Sky SQL
[BUG] Inserts to tables with an index view can fail

Unfortunately some of the more troubling bugs can be very hard to reproduce succinctly.  Here is one that has been troubling me for a little while :

The issue is using indexed views with a calculated column. Indexed views, despite their restrictions, are a very handy addition to SQL Server and materializing views to be hard data can certainly improve performance.  So to demonstrate my issue we will need to build a table and create a view on it. 

create table myTable
(
Id integer not null,
InView char(1) not null,
SomeData varchar(255) not null
)
go
Create view vwIxView
with schemabinding
as
Select ID,Somedata,left(SomeData,CHARINDEX('x',SomeData)-1) as leftfromx
from dbo.myTable
Where InView ='Y'
 
As you can see , the view is filtering the data for where InView =’Y’ and is adding a calculated column to do some manipulation of the column ‘SomeData’. This column ,leftfromx, is taking the characters up to and including the first ‘x’ from the ‘SomeData’ column.

If we insert some data into the view with

insert into myTable(Id,InView,SomeData)
select 1,'N','a'

unsurprisingly, if we look to the view then there will be no data in it.

Now lets add an index to the view

create unique clustered index pkvwIxView on vwIxView(Id)

The data is now persisted.

Lets now add some more data ,the same data, in a ever so slightly different way.

declare @id integer,
@inview char(1),
@Somedata char(50)
select @id = 1, @inview = 'N',@Somedata = 'a'

insert into myTable(Id,InView,SomeData)
select @id,@inview,@Somedata

What is the result ?

image

Huh , well its kind of obvious which “LEFT or SUBSTRING function” has errored, but as inview = ‘N’ why should that piece of code even been executed ?  Looking at the estimated plan we can more easily see the flow of events.

image

The ‘compute scalar’ operation is where the LEFT is being executed. ,That is happening before the filter and as there is no ‘x’ in the ‘SomeData’ column , it is unsurprising that the function is erroring.  I have tested this on both 2008r2 and 2012 rc0.

I have raised a connect item here, if you want to upvote it.

Book review - SQL Server Secret Diary (Know the unknown secrets of SQL Server)

Like a lot of people within the SQL community, I can never read enough on the subject.  Books, whitepapers, academic research and blogs can all be valuable source of information, so whilst browsing Amazon I found this book on a free kindle download.  The preface makes some bold claims indeed :

“This book is for developers who already know SQL Server and want to gain more knowledge in SQL Server.  This book is not for starter who want to start from the beginning.

The problem-solution approach will help you to understand and solve the real-time problems easily.

This Book will teach you (their emphasis)

  • How to solve common real-time problems
  • How to improve performance
  • How to protect your data and code
  • How to reduce your code
  • How to use SQL Server efficiently
  • Advanced topics with simple examples
  • Tips and tricks with sample queries
  • And also teach how to do in the better way.

The last bullet point, sets the tone of the quite appalling use of grammar (yes, yes , people in glass houses and all that.. ) contained throughout the entire book,  I get that the authors may use english as a second (or third) language,  but where are the proof readers ?  That i can live with though,  its the technical content i really have a problem with.  Here is just a small selection:

Q 2) How to use GO statement in SQL Server ?

IMO,  the most important concept to understand about GO is that it is not a SQL Statement.  It is processed on the client (SSMS, ISQL etc) and splits the workload into separate batches.  This is not mentioned here,  though to be fair in Q3 (How to repeat the statements without using loops ?) the author notes “GO is a client command and not a T-SQL command". So GO or GO <N> can only be used with Microsoft SQL Server client tools.”. 

Q 5) How to use ORDER BY clause in view ?

Here the author spends a great deal of time and effort working around “As per RDBMS rule ORDER BY clause is not allowed in view”.  This section should be thrown away entirely,  if you are depending on the view ordering ( which is a contradiction in terms)  for your result set ordering you deserve all the law suits that are thrown at you. 

Q 10) How to do case sensitive searches in SQL Server ?

The authors solution here is to cast a column as varbinary. OK, fair enough it works. Personally, i would have used COLLATE but lets not split hairs.  The biggest issue i have here is sargability is not mentioned,  we are introducing the possibility of a scan.

Q 12) How to solve server time out problem ?

The scenario presented here is that session #1 has updated some data that session #2 needs to read.  The author presents 2 solutions NOLOCK and READPAST and ,to be fair, does make an attempt at highlighting the dirty reads.  My issue here is that, once again, locking is seen as the enemy that must be worked around.  We should embrace locks, understand why they are happening and how they are protecting us.  The point is not raised that the fault here lies with the UPDATE’ing transaction not completing in a timely fashion, not that the reader cannot complete due to that. The consequences of reading and processing dirty data are not explored thoroughly enough and once again, NOLOCK is used as a “go faster” button.

Q 33 ) How to improve the performance of stored procedure ?

Here we have been given 11 bullet points by the authors , which I have copied verbatim below. My thoughts about each point are inlined in red:

  • Use SET NOCOUNT ON to avoid sending row count information for every statement. So, this can help, but will only have a measurable effect if you have many many statements,  but in that case you are coding sql wrong anyway.
  • Always use the owner name or schema name before the object name to prevent recompilation of stored procedure.  Does this mean that by not referencing the owner or schema (which one is it ?? ) objects will always cause a recompile of the entire stored procedure ? No.  The statement not necessarily the stored procedure, will recompile if the user has a different default schema to the existing compiled statement.
  • Avoid using DISTINCT Just distinct ? Any thing else ? Unnecessary ORDER BY ?
  • Minimize the number of columns in SELECT clause So, Select Col1,Col2,Col3 is bad but Select Col1 +’ ‘+ Col2 +’ ‘+Col3 is ok ? Better wording here would be “Return only the data that is required by the application, nothing more, nothing less.”
  • Use table variables instead temporary tables. Seriously ! What ! Come again.  As a sweeping general statement wrong wrong wrong.
  • Use the CTE ( Common Table BLOCKED EXPRESSION instead of derived tables and table variables as much as possible. Again, massive over generalisation.  Horses for courses.  Also, didn't you just say that i should use table variables.
  • Avoid using cursors Why ? and what should i do instead ?  I have to get the data out some how , what alternatives are there ?
  • Don’t use duplicate codes, reuse the code by Views and UDF’s  This section is about performance , right ? I would like to see one single instance where using a view ( presumably unindexed ) or a UDF (cough , splutter) improves performance.
  • Begin and commit transactions immediately Better wording would be “Keep transactions as short as possible, never leave a transaction open while waiting to user input.”
  • Avoid exclusive locks Confusing,  in what context ?
  • Use table hints BwaaHaa,  this is really a pandora’s box best left by the audience of this book.

And so it continues.  I’m trying really hard to not be to scathing or nit-picky about this book, there is some good advice here, but SQL Server is full of caveats , confusing and contradictory best practices and ultimately 90% of the time you can state that “It depends”. 

Questions are presented with solutions that can work but are given as 100% solutions not with any degree of warning that that may not always be the case.  Even as a free download, it is way too expensive, and, remembering the target audience, could ultimately do more harm than good.

Extended Events - inaccurate_cardinality_estimate

Extended events have been a bit of a personal “Elephant in the room” for me.  I know they are there and I should really get on a start using them but never *quite* have a compelling enough reason. 

So now i really do,  after comparing the events in sys.dm_xe_objects between 2008r2 and 2012 I found one that really peaked my interest,  inaccurate_cardinality_estimate.  This is described as “Occurs when an operator outputs significantly more rows than estimated by the Query Optimizer. Use this event to identify queries that may be using sub-optimal plans due to cardinality estimate inaccuracy. Using this event can have a significant performance overhead so it should only be used when troubleshooting or monitoring specific problems for brief periods of time.

IMO cardinality estimation errors are the number one cause of performance problems.  If sqlserver deduces ( or even guesses) an incorrect row estimation then all bets are off,  if you get anything approaching a decent plan , its by luck not judgement.

So, lets see what if we can cause this event to fire.  Firing up management studio, we have the new extended events manager,

image

which sounds like a fun tool to play with Smile So starting a new session and going to the events library

image

and filtering by ‘Card’

image

Oh , nothing found.  Its simply not there , here is a connect item for this issue.

So , we will have to do this a more ‘manual’ way

CREATE EVENT SESSION inaccurate_cardinality_estimate ON SERVER
ADD EVENT sqlserver.inaccurate_cardinality_estimate
( ACTION (sqlserver.plan_handle, sqlserver.sql_text) )
ADD TARGET package0.asynchronous_file_target
( SET FILENAME = N'c:\temp\inaccurate_cardinality_estimate.xel',
metadatafile = N'c:\temp\inaccurate_cardinality_estimate.xem' );

Ok , session defined , lets start it.

ALTER EVENT SESSION inaccurate_cardinality_estimate ON SERVER STATE = START

To demonstrate the actual event we need to create and populate a temporary table :

drop table #newids
go
create table #NewIds
(
id char(36)
)
go
insert into #NewIds
select top(100)
cast(newid() as char(36))
from sys.all_columns a cross join sys.all_columns b
If we now execute
 
declare @v varchar(10)
Select @v='%XX%'
select COUNT(*) from #NewIds where id like @v

We will get an estimated row count of 5.37528.
 
I can control the exact actual row count by updating a number of rows to start ‘XX’ and thereby create a cardinality estimation error.
 
For starters lets update all the rows ( Im wrapping the updates in a transaction and rolling back , for sake of brevity this is not shown)
 
update 
#NewIds
set id = 'XX'+left(id,20)

declare @v varchar(10)
Select @v='%XX%'
select COUNT(*) from #NewIds where id like @v
Then plan for the select shows a cardinality error, as expected

image
 
100 actual , 5.37528 expected.
 
Stop the extended events session
 
ALTER EVENT SESSION inaccurate_cardinality_estimate ON SERVER STATE = STOP

and all being well , in the c:\temp folder you will see an extended event log file.
 
Open that in management studio
 
image

There is the event, nice.  But hold on one cotton picking minute, look at the row counts.  Estimated = 5 , actual = 26 !?!

What happens if we repeat this operation but doubling the rows in the temp table

If we double the amount of rows in our temp table to 200,  our estimate rows in the plan will show as 10.7506 and actual as 200. In the extended event we see :

image

So the estimated count is shown as floor(row estimate) and the event is fired when the actual row count goes over 5*plan estimate, which is why actual is shown here as 53 not 200.  Notice that we also have the plan_handle and the node_id if we wish to tie this back to an exact operator in our system.

Quite why this is an extended event and not a plan warning , i really have no idea,  still its nice to know its there.

MythBusting–“Table variables have no statistics”

Ok, as myths go, its a pretty weak one.  In fact, it is true, this whitepaper explicitly states that.  But hand in hand with that statement goes another one, “Table variables will always estimate to one row”.  This is most definitely false,  if there are no statistics then sql server can, at times, default to its ‘guessing’ of distribution of data based upon row counts.  This behaviour can even further muddy the water of the old “Which is better, table variables or temp tables” argument.

To demonstrate this, firstly we need to populate a numbers table

create table numbers
(
Num integer primary key
)
go
insert into numbers
Select top(1000) ROW_NUMBER() over (order by (select null))
from sys.columns a cross join sys.columns b

Now we execute the following code

Declare @TableVar Table
(
ID integer not null primary key,
Mod10 integer not null
)

insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

Select tv.Id,num
from @TableVar tv
join numbers
on tv.ID = num


and looking at the execution plan, we see :
image
 
1 Row estimated and 20 rows actual, as you may well expect.  Now add ‘OPTION(RECOMPILE)’  the plan is now different.
 
image
 
Look at that an accurate row estimation.  How about if we are filter to the statement say ‘Mod10=0’
 

image
 
Another different but wrong estimation.  This is because table variables dont have statistics, but we do have row counts.  It is worth pointing out at this point that these are the same numbers you will get if you did these operations on a normal ‘permanent’ table, but had turned off AUTO STATISTICS.
 
Obviously in a production environment, you would only be using RECOMPILE in ‘special’ circumstances, right ?  So, this isn't an issue.  All your table variables will be estimating as one row.  Wrong,  I would be willing to bet that a surprisingly high number are estimating as something else.  If you are so inclined, so can probably find quite a few in the dmv sys.dm_exec_query_plan.  So, how does this happen ? Well,  in a way its nothing to do with table variables per se , but if you are joining to another table, then if (and when) that table has its stats updated then that will cause the statement to recompile and , surprise , surprise , you have a table variable with an estimate > 1.
 
OK…  So lets step through that.  Ignore the extra Select statement that counts from adventureworks,  its just there to create a more ‘complicated’ stored procedure and we get multiple statements cached in the plan.
 
drop table IDs
go
create table IDs
(
Id integer primary key,padding char(255)
)
go
insert into IDs(Id,padding)
Select top(1) num,'xxx'
from numbers
order by num
go
drop procedure TableVarTest
go
create procedure TableVarTest
as
declare @TableVar Table
(
ID integer not null,
Mod10 integer not null
)

insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

select COUNT(*)
from AdventureWorks2008r2.dbo.Customer C
join AdventureWorks2008r2.dbo.CustomerOrders CO
on C.CustomerId = CO.CustomerId

Select tv.Id,IDs.id
from @TableVar tv
join IDs
on tv.ID = IDs.Id
where mod10 =0
go

On first execution the join of the table variable to IDs produces…
 
image
 
Now, lets add some more data to ID’s and force a recompile just for good measure :
 
insert into IDs(Id,padding)
Select top(1000) num,'xxx'
from numbers
where not exists(select id from IDs where id = num )
order by num
go
exec sp_recompile ids

and then re-execute the stored procedure
 
image
 
So, one myth busted and one proved, not bad for one blog.
Execution plan warnings–The final chapter

In my previous posts (here and here), I showed examples of some of the execution plan warnings that have been added to SQL Server 2012.  There is one other warning that is of interest to me : “Unmatched Indexes”.

Firstly, how do I know this is the final one ?  The plan is an XML document, right ? So that means that it can have an accompanying XSD.  As an XSD is a schema definition, we can poke around inside it to find interesting things that *could* be in the final XML file.

The showplan schema is stored in the folder Microsoft SQL Server\110\Tools\Binn\schemas\sqlserver\2004\07\showplan and by comparing schemas over releases you can get a really good idea of any new functionality that has been added.

Here is the section of the Sql Server 2012 showplan schema that has been interesting me so far :

<xsd:complexType name="AffectingConvertWarningType">
<xsd:annotation>
<xsd:documentation>Warning information for plan-affecting type conversion</xsd:documentation>
</xsd:annotation>
<xsd:sequence>
<!-- Additional information may go here when available -->
</xsd:sequence>
<xsd:attribute name="ConvertIssue" use="required">
<xsd:simpleType>
<xsd:restriction base="xsd:string">
<xsd:enumeration value="Cardinality Estimate" />
<xsd:enumeration value="Seek Plan" />
<!-- to be extended here -->
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
<xsd:attribute name="Expression" type ="xsd:string" use="required" />
</xsd:complexType>
<xsd:complexType name="WarningsType">
<xsd:annotation>
<xsd:documentation>List of all possible iterator or query specific warnings (e.g. hash spilling, no join predicate)</xsd:documentation>
</xsd:annotation>
<xsd:choice minOccurs="1" maxOccurs="unbounded">
<xsd:element name="ColumnsWithNoStatistics" type="shp:ColumnReferenceListType" minOccurs="0" maxOccurs="1" />
<xsd:element name="SpillToTempDb" type="shp:SpillToTempDbType" minOccurs="0" maxOccurs="unbounded" />
<xsd:element name="Wait" type="shp:WaitWarningType" minOccurs="0" maxOccurs="unbounded" />
<xsd:element name="PlanAffectingConvert" type="shp:AffectingConvertWarningType" minOccurs="0" maxOccurs="unbounded" />
</xsd:choice>
<xsd:attribute name="NoJoinPredicate" type="xsd:boolean" use="optional" />
<xsd:attribute name="SpatialGuess" type="xsd:boolean" use="optional" />
<xsd:attribute name="UnmatchedIndexes" type="xsd:boolean" use="optional" />
<xsd:attribute name="FullUpdateForOnlineIndexBuild" type="xsd:boolean" use="optional" />
</xsd:complexType>

I especially like the “to be extended here” comment,  high hopes that we will see more of these in the future.
 
So “Unmatched Indexes” was a warning that I couldn’t get and many thanks must go to Fabiano Amorim (b|t) for showing me the way.
 
Filtered indexes were introduced in Sql Server 2008 and are really useful if you only need to index only a portion of the data within a table.  However,  if your SQL code uses a variable as a predicate on the filtered data that matches the filtered condition, then the filtered index cannot be used as, naturally,  the value in the variable may ( and probably will ) change and therefore will need to read data outside the index.  As an aside,  you could use option(recompile) here , in which case the optimizer will build a plan specific to the variable values and use the filtered index,  but that can bring about other problems.
 
To demonstrate this warning, we need to generate some test data :
 
DROP TABLE #TestTab1
GO
CREATE TABLE #TestTab1 (Col1 Int not null,
Col2 Char(7500) not null,
Quantity Int not null)
GO

INSERT INTO #TestTab1 VALUES (1,1,1),(1,2,5),(1,2,10),(1,3,20),
(2,1,101),(2,2,105),(2,2,110),(2,3,120)
GO

and then add a filtered index

CREATE INDEX ixFilter ON #TestTab1 (Col1)
WHERE Quantity = 122

Now if we execute

SELECT COUNT(*) FROM #TestTab1 WHERE Quantity = 122

We will see the filtered index being scanned

image

But if we parameterize the query

DECLARE @i INT = 122
SELECT COUNT(*) FROM #TestTab1 WHERE Quantity = @i

The plan is very different

image

a table scan, as the value of the variable used in the predicate can change at run time, and also we see the familiar warning triangle.

If we now look at the properties pane, we will see two pieces of information “Warnings” and “UnmatchedIndexes”.

image

So, handily, we are being told which filtered index is not being used due to parameterization.

Blogging from 37,000ft

Im currently on my way to Sql Rally nordic and looking forward to a few days of full on SQL geekery and “Unleashing my inner Viking”.  I shall be speaking on Wednesday afternoon on one of my favourite subjects “Cursors are Evil”.  Ok,  so lets put it into perspective, “Evil” is a bit dramatic , but “Often use inappropriately and can cause serious performance bottlenecks” didn't have quite the same ring Smile

If you are not going to be at SQL Rally,  im going to be repeating it at the Leeds and Manchester user groups on the 23rd and 24th of November respectively.  Presenting with me on these nights will be James Boother, so make it along to those if you can.  I look forward to seeing you at one of these events.

Execution plan warnings–All that glitters is not gold

In a previous post, I showed you the new execution plan warnings related to implicit and explicit warnings.  Pretty much as soon as i hit ’post’,  I noticed something rather odd happening.

This statement :

select top(10) 
SalesOrderHeader.SalesOrderID,
SalesOrderNumber
from Sales.SalesOrderHeader
join Sales.SalesOrderDetail
on SalesOrderHeader.SalesOrderID
= SalesOrderDetail.SalesOrderID
 
Throws the “Type conversion may affect cardinality estimation” warning.
 
image
 
Ive done no such conversion in my statement why would that be ?  Well, SalesOrderNumber is a computed column , “(isnull(N'SO'+CONVERT([nvarchar](23),[SalesOrderID],0),N'*** ERROR ***'))”,  so thats where the conversion is.
 
Wait!!! Am i saying that every type conversion will throw the warning ?  Thankfully, no.  It only appears for columns that are used in predicates ,even if the predicate / join condition is fine ,  and the column is indexed ( and/or , presumably has statistics). 
 
Hopefully , this wont lead to to many wild goose chases, but is definitely something to bear in mind.  If you want to see this fixed then upvote my connect item here.
More Denali Execution Plan Warning Goodies

In my last blog, I showed how the execution plan in denali has been enhanced by 2 new warnings ,conversion affecting cardinality and conversion affecting seek, which are shown when a data type conversion has happened either implicitly or explicitly.

That is not all though, there is more Smile.  Also added are two warnings when performance has been affected due to memory issues.

Memory spills to tempdb are a costly operation and happen when SqlServer is under memory pressure and needs to free some up. For a long time you have been able to see these as warnings in a profiler trace as a sort or hash warning event,  but now they are included right in the execution plan.  Not only that but also you can see which operator caused the spill , not just which statement.  Pretty damn handy.

image

Another cause of performance problems relating to memory are memory grant waits.  Here is an informative write up on them,  but simply speaking , SQLServer has to allocate a certain amount of memory for each statement. If it is unable to you get a “memory grant wait”.  Once again there are other methods of analyzing these,  but the plan now shows these too.

image

Don't worry that’s not real production code Smile

There is one other new warning that is of interest to me, “Unmatched Indexes”.  Once I find out the conditions under which that fires ill blog about it.

Implicit Conversions warning

After adding an index,  ensuring that an index is actually used is probably the biggest win you will get in terms of performance in SqlServer.  There are many ways that an query can be non-sargable, and therefore not using an available index,  a common one is implicit (or even explicit) type conversion.

Within Denali ( or now SqlServer 2012 to give it its correct name) two new warnings have been added to the execution plan to help highlight this issue.  “Type conversion in expression ColumnExpression may affect "CardinalityEstimate" in query plan choice” and “Type conversion in expression ColumnExpression may affect "SeekPlan" in query plan choice” are now shown if a type conversion has happened and it is affecting , or could be , affecting performance.

To demonstrate both I will need to populate a small table

drop table nums
go
create table nums
(
IntCol Char(10)
)
go

insert into nums(IntCol)
Select top(10) row_number() over(order by (select null)) from sys.columns


Notice that the IntCol column is defined as a char(10) type.  If i now execute the following query
 
select * from nums where IntCol = 8
 
In the execution plan viewer, I will see this
image
 
There is a Yellow Exclamation Mark warning on the query it self and the warning is that the type conversion may affect CardinalityEstimate.  Its pretty obvious now what our potential problem is.
 
Additionally, if an index is created:
 
create index Idx1 on nums(IntCol)

and the Select statement re-executed,  this is shown..
 
 image
Even better , In layman's terms “You have an index, but i cant use it”.
Rows or Range, What’s the difference ?

With the release of Denali CTP3 came an extension of the over clause to allow for a sliding window of data.  This allows us to , amongst other things, to efficiently and neatly calculate rolling balances.  This is a very common requirement for a database system and one which crops up time and time again on forums etc.

The basic syntax is documented in BOL here and Wayne Sheffield(blog|twitter) has written a great introduction here.

To help demonstrate, I need to generate a temporary table using some data from AdventureWorks.

use AdventureWorks2008R2
go
drop table #orders
go

Select SalesPersonID,
min(OrderDate) as OrderMonth,
sum(TotalDue ) as TotalDue
into #Orders
from Sales.SalesOrderHeader SOH
where SOH.SalesPersonID is not null
group by SalesPersonID,datediff(mm,0,OrderDate);

go
create clustered index idxOrder on #Orders(SalesPersonId,OrderMonth);
go

So, in Denali CTP3 to perform a rolling balance of the TotalDue accumulating for each OrderMonth we can perform:

select *,
sum(TotalDue) over(partition by SalesPersonId
order by OrderMonth)
as RollingBalance
from #Orders
order by SalesPersonID,OrderMonth;

and we will see a result set like this
 
image

Neat, a rolling balance, and as we as partitioning by SalesPersonID it will be re-initialised for each SalesPerson.  Problem solved.

But what about the ‘BY RANGE’ and ‘BY ROWS’ option of the clause , what’s their significance ?

Let us duplicate the data in the table

insert into #Orders
select * from #Orders

and retry the RollingBalance query (  remember that the default is RANGE so this is exactly the same query as before )

select *,
sum(TotalDue) over(partition by SalesPersonId
order by OrderMonth
Range UNBOUNDED PRECEDING)
as RollingBalance
from #Orders
order by SalesPersonID,OrderMonth;

image

Notice how the RollingBalance is now summing together all the totaldue values for each SalesPersonId / ordermonth.

Where-as by using ROWS

select *,
sum(TotalDue) over(partition by SalesPersonId
order by OrderMonth
Rows UNBOUNDED PRECEDING)
as RollingBalance
from #Orders
order by SalesPersonID,OrderMonth;

image

We now have the value incremented over each row.

Other than the output , ( which i grant you is quite important Smile ) there is another fundamental difference to be looked at.  How much work has SQLServer done to service both of these queries ?

Lets create more data..

insert into #Orders
Select SalesPersonID,OrderMonth,TotalDue
from #Orders
go 8
and re-execute the queries
 
image
 
The columns are (CPU,Reads,Writes and TotalDuration)

Wow , quite a stark difference.  But, a very big but, if range is what you require then that is what you have to do.

Now consider a different set of data.

drop table #orders
go

Select SalesPersonID,
min(OrderDate) as OrderMonth,
sum(TotalDue ) as TotalDue
into #Orders
from Sales.SalesOrderHeader SOH
where SOH.SalesPersonID is not null
group by SalesPersonID,datediff(mm,0,OrderDate);

go
create unique clustered index idxOrder on #Orders(SalesPersonId,OrderMonth);
go


declare @MaxSalesPersonId integer
Select @MaxSalesPersonId = max(SalesPersonId)
from #Orders

insert into #Orders
select SalesPersonID+@MaxSalesPersonId,OrderMonth,TotalDue
from #Orders
go 8

The same amount of rows overall but because there are now many more combinations of SalesPersonID and OrderMonth the difference in timings is now even more stark.

image

Also,  the outputs are now exactly the same.  In-fact notice how I have a unique index on SalesPersonID and OrderMonth so by definition the outputs are guaranteed to be the same, every ‘range’ will only ever have one row. 

From experience I would say that ROWS will be used in 99% of scenario’s and TBH i think that

A) BY ROWS should of been the default

B) The optimizer could be enhanced to ensure that if unique , then BY ROWS is used. 

More Posts Next page »