SQL and the like

Output = MAXDOP 1

It is widely know that data modifications on table variables do not support parallelism, Peter Larsson has a good example of that here .  Whilst tracking down a performance issue,  I saw that using the OUTPUT clause also causes parallelism to not be used.

By way of example,  first lets create two tables with a simple parent and child (one to one) relationship, and then populate them with 100,000 rows.

Drop table Parent
Drop table Child
go
create table
Parent
(
id integer identity Primary Key,
data1 char(255)
)

Create Table Child
(
id integer Primary Key
)
go
insert into
Parent(data1)
Select top 1000000 NULL
from sys.columns a cross join sys.columns b

insert into Child
Select id from Parent
go

If we then execute

update Parent 
set data1 =''
from Parent
join Child on Parent.Id = Child.Id
where Parent.Id %100 =1
and Child.id %100 =1

We should see an execution plan using parallelism such as

Parrelel1

 

However,  if the OUTPUT clause is now used

update Parent 
set data1 =''
output inserted.id
from Parent
join Child on Parent.Id = Child.Id
where Parent.Id %100 =1
and Child.id %100 =1

 

The execution plan shows that Parallelism was not used

Parrelel2

Make of that what you will, but i thought that this was a pretty unexpected outcome.

 

Update : Laurence Hoff has mailed me to note that when the OUTPUT results are captured to a temporary table using the INTO clause,  then parallelism is used.  Naturally if you use a table variable then there is still no parallelism  

Cleaning up sys.dm_exec_cached_plans

Following on from my previous post in which I noticed that SQLServer itself was artificially bloating sys.dm_exec_cached_plans, the next obvious question is “How can i clear this rubbish out ?”.  On 2008 its simple enough, Microsoft have kindly enhanced DBCC FREEPROCCACHE to accept a plan_handle.  On 2005 things are a bit more tricky.  You could use DBCC FREEPROCCACHE , but that would clear everything out, certainly not a good thing to be happening in a live environment.  It has been blogged about before that sp_create_plan_guide can be used to purge a specific statement.  That works fine on single statement batches , but multi-statement  batches are not supported as easily.  The solution is in multi-statement batches to create a separate plan for each statement.  Here’s my rough-and-ready routine to clear down sys.dm_exec_cached_plans for msdb statements.

declare @Text nvarchar(max)
Declare @Plan_handle varbinary(64)

declare purgecur cursor for
 SELECT text,plan_handle
   FROM sys.dm_exec_cached_plans 
  CROSS APPLY sys.dm_exec_sql_text(plan_handle) 
where Objtype = 'Adhoc'
   and text like '%msdb.%'
--   and plan_handle = 0x06000400E3854A1E40030F8E010000000000000000000000
open purgeCur
while(0=0) begin 
   Fetch Next from PurgeCur into @Text,@plan_handle
   
   if(@@Fetch_Status <> 0) break

   declare @CurCount integer
   Select @CurCount =0
   declare @StmtText nvarchar(max)
   declare purgestmt cursor
       for SELECT substring(text,(qs.statement_start_offset+2)/2,
                  (((case when statement_end_offset=-1 then 999998 else statement_end_offset end)-statement_start_offset)+2)/2)
             FROM sys.dm_exec_query_stats AS qs
                  CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st
            where qs.plan_handle = @Plan_handle
   open purgestmt
   
   while(0=0) begin 
       fetch next from purgestmt into @StmtText
       
       if(@@Fetch_status<>0) break
       select @CurCount =@Curcount+1
       begin try
       exec sp_create_plan_guide 
                   @name = N'PlanGuidePurge', 
                   @stmt = @StmtText,
                   @type = N'SQL', 
                   @module_or_batch = @Text, 
                   @params = NULL, 
                   @hints = N'OPTION (MaxDop 1)' 
       end try
       begin catch
       end catch           
       if exists(Select * from sys.plan_guides where name = N'PlanGuidePurge') begin    
          exec sp_control_plan_guide N'DROP', N'PlanGuidePurge'
       end
   end
   close purgestmt
   deallocate purgestmt
   if(@CurCount =0) begin 
       begin try
       exec sp_create_plan_guide 
                   @name = N'PlanGuidePurge', 
                   @stmt = @Text,
                   @type = N'SQL', 
                   @module_or_batch = @Text, 
                   @params = NULL, 
                   @hints = N'OPTION (MaxDop 1)' 
       end try
       begin catch
       end catch           
       if exists(Select * from sys.plan_guides where name = N'PlanGuidePurge') begin    
          exec sp_control_plan_guide N'DROP', N'PlanGuidePurge'
       end
    end
   
end
Close PurgeCur
Deallocate PurgeCur
Microsoft – Follow best practices – Part 2

In addition to my previous post, another best practice is to not use NOLOCK and READ UNCOMMITTED transaction isolation level.

Here’s an excerpt from a profiler trace

BEGIN TRAN UpdateMediaTables

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

DECLARE @BackupSetId int, @MediaSetId int, @LogDevName varchar(512)

SELECT @MediaSetId = media_set_id
FROM msdb..backupmediafamily AS bmf WITH (NOLOCK)
WHERE substring(bmf.physical_device_name,5,36) = '80A2E6DE-3E95-4645-B476-09E37306FF8C'

SELECT @BackupSetId = backup_set_id
FROM msdb..backupset WITH (NOLOCK)
WHERE media_set_id = @MediaSetId

So, not only do we have no consistency, but also a non–SARGable lookup.

I’ve updated my previous connect item to reflect this also.

Microsoft – Follow best practices!

Best practice is to use parametrized queries to enable plan reuse.  Will someone please tell Microsoft this. 

Presently dm_exec_cached_plans on our live server has been bloated by 550mb of adhoc queries by SqlAgent (possibly a few other services)

Here’s how im calculating the total bloat value

SELECT sum(size_in_bytes)
FROM sys.dm_exec_cached_plans 
CROSS APPLY sys.dm_exec_sql_text(plan_handle) 
where Objtype = 'Adhoc'
  and text like '%msdb.%'

 

Which can be broken down to a query by query basis of

SELECT sum(size_in_bytes),count(*),substring(text,1,100)
FROM sys.dm_exec_cached_plans 
CROSS APPLY sys.dm_exec_sql_text(plan_handle) 
where Objtype = 'Adhoc'
  and text like '%msdb.%'
group by substring(text,1,100) order by 1 desc

Connect item here if you feel like voting.

Age calculation with SQL Server

There seem to be many different methods being suggested to calculate an age in SQLServer.  Some are quite complex but most are simply wrong.  This is by far the simplest and accurate method that I know.

 

Declare @Date1 datetime
Declare
@Date2 datetime


Select
@Date1 = '15Feb1971'
Select @Date2 = '08Dec2009'
select CASE
WHEN
dateadd(year, datediff (year, @Date1, @Date2), @Date1) > @Date2
THEN datediff (year, @Date1, @Date2) - 1
ELSE datediff (year, @Date1, @Date2)
END as Age

This even copes with the tricky situation of 29th feb,  although I cant say correctly as according to Wikipedia birthdays may be on the 28th of Feb or 1st March.

This post has been part of T-SQL Tuesday, hosted this month by Adam Machanic

Phil Factor Speed Phreak SQL Challenge Number 3

The latest Phil Factor challenge is now active here. The prize has now been increased to $100

My first useful powershell script – capturing performance counters

After playing around with powershell a bit, I managed to do something quite useful.  There are a multitude of ways to capture performance counters but I think this will have the advantage of being able to be fired by SqlAgent (or another timer process) every X many seconds.  All you now need to do is process the data within Excel (or power pivot as shown by David Castro  here ).  For every server in servers.txt it will collect the counters in counters.txt

$Servers = get-content c:\servers.txt
$CounterList = Get-Content c:\counters.txt


$sw = new-object system.IO.StreamWriter("c:\perf.res",1)
$Counters = $CounterList | Get-Counter -computer $Servers
foreach($counter in $counters){
$counter.ToString()
foreach($sampleset in $counter.CounterSamples){
$sw.writeline($sampleset.Timestamp.ToString()+','+$sampleset.Path + ',' +$sampleset.CookedValue )
}
}

$sw.close()

 

Example Counters.Txt

\Memory\Available MBytes
\Paging File(_total)\% Usage
\PhysicalDisk(_total)\% Disk Time
\PhysicalDisk(_total)\Avg. Disk Bytes/Read
\PhysicalDisk(_total)\Avg. Disk Bytes/Write
\PhysicalDisk(_total)\Disk Reads/Sec
\PhysicalDisk(_total)\Disk Writes/Sec
\SqlServer:Buffer Manager\Buffer cache hit ratio
\SqlServer:Buffer Manager\Page life expectancy
\SqlServer:General Statistics\User Connections
\SqlServer:Memory Manager\Memory Grants Pending
\System\Processor Queue Length

Example Servers.Txt

Server1
Server2

Enjoy

The Observer Effect In Action

I’ve put my hand up to to a quick 10 - 15 minute slot at the London user group, so I was getting my scripts together to do a presentation based on my UDF Overhead blog entry.  Naturally this being my first time talking, I wanted to make sure that I was accurate in terms of the statement timings and how I was interpreting the results.  After running the examples a few times I started to notice a discrepancy that I was wasn't expecting.

Executing the UDF with ‘SET STATISTICS TIME ON’ was visibly slower that without. In fact in profiler the duration time with stats time on was 2056ms, without it duration time was 233ms. This was the only difference. Testing on 2008 showed the same effect but to a lesser degree.

Heres the script im testing with.

Drop Function GetSalesCommission
go
Create Function GetSalesCommission(@SalesAmount money)
returns money
as
begin 
    Declare @CommissionAmount money
    Select @CommissionAmount = (@SalesAmount/100.0) * 5
    return @CommissionAmount
end
go
SET STATISTICS TIME ON
select sum(dbo.GetSalesCommission(SubTotal))  as Commission
  from Sales.SalesOrderHeader
go
SET STATISTICS TIME OFF
select sum(dbo.GetSalesCommission(SubTotal))  as Commission
  from Sales.SalesOrderHeader
go

And a screen shot of the profiler output

statstime

With this in mind,I’m certainly going to re-asses a few performance evaluation practices.  It doesn't completely negate my previous post on UDF Overheads but a large portion of the timings would seem to be related to this.

And the winner is….

Me.  Ok, so there wasn't a massive field of runners and riders, in the second Phil Factor challenge.  But I'm still feeling pretty chuffed.

PhilFactorSQLSpeedPhreakAward

Another T-SQL Challenge

Phil Factor has posted up another SQL challenge.  Its a goodie and a $60 amazon voucher is up for grabs. 

Give it a try here

UDF Overhead – A simple example

When Microsoft first announced that in SQLServer 2000 they were introducing user defined functions,  I thought “Excellent, that will really help my system”.  However,  it soon became apparent that there is a big overhead in calling them.  Here’s a really simple example of how to waste some system resources. 

Here is a simple udf

Create function Sales.CalcCommission(@Price Money)
returns money
as
begin
Declare
@Commission money
Select
@Commission = (@Price/100.000)*5
return @Commission
end

Nothing special there, it just calculates a sale persons commission at a rate of 5%.  So lets apply that to the whole of the SalesOrderDetail table.

select UnitPrice,Sales.calcCommission(UnitPrice)  From Sales.SalesOrderDetail;

Job done,  report written, push the code live and even better you have a function that can be re-used in different pieces of code many times over.  But what cost has that come as ?  Using ‘SET STATISTICS TIME’ is pretty graphic.

select UnitPrice,(UnitPrice/100.000)*5  From Sales.SalesOrderDetail
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 784 ms.
select UnitPrice,Sales.calcCommission(UnitPrice)  From Sales.SalesOrderDetail;
SQL Server Execution Times:
CPU time = 1625 ms, elapsed time = 1795 ms.

Yes, its takes over twice as long using the udf as not,  and just look at the CPU time.  So what are your alternatives ?  An inline function ?

Create Function Sales.InlineCalcCommission(@Price Money)
returns table
as
return select
(@Price/100.000)*5 as Commission;
select UnitPrice,Commission.Commission
From Sales.SalesOrderDetail cross apply Sales.InlineCalcCommission(UnitPrice) as Commission
SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 984 ms.

Which is much better,  or my own personal favourite (call me old fashioned)

Create View SalesWithCommission
as
select
UnitPrice,(UnitPrice/100.000)*5 From Sales.SalesOrderDetail
Add this to the fact that the execution costs of functions are hidden in an execution plan,  this is why they are my penultimate port of call.  The last being a cursor.

Update 24Nov2009 : Please see my follow up blog post at http://sqlblogcasts.com/blogs/sqlandthelike/archive/2009/11/24/the-observer-effect-in-action.aspx

Returning a chain of events with a recursive CTE
This is a question that cropped up SQLServerCentral.
Imagine you have a list of dates, for which you need to know the next day after a specified interval. 

So in this list you want to return those with an interval of 3Days (ie those in Red)
01Jan
02Jan
07jan
11Jan
12Jan
28jan
 
To add further complication there are also multiple ranges.  Heres some sample data
drop table #DateSteps
go
Create table
#DateSteps
(
Handler integer,
EventDate smalldatetime
)
go
Create unique index
idxDateSteps on #DateSteps(handler,EventDate)
go
insert into
#dateSteps values(1,'01jan1990')
insert into #dateSteps values(1,'07jan1990')
insert into #dateSteps values(1,'11jan1990')
insert into #dateSteps values(1,'12jan1990')
insert into #dateSteps values(1,'28jan1990')
insert into #dateSteps values(1,'08mar1990')
insert into #dateSteps values(2,'02jan1990')
insert into #dateSteps values(2,'04jan1990')
insert into #dateSteps values(2,'10jan1990')
insert into #dateSteps values(2,'12jan1990')
insert into #dateSteps values(2,'15mar1990')

The key to this routine is the use of row_number() within the derived table at the recursive element of the CTE.  This enables us to return only the next row that is three days or more after the input date.
with cteStartDates(Handler,EventDate)
as
(
Select Handler,
min(EventDate)
from #datesteps
group by Handler
),
CteRecursiveSkip(Handler,EventDate)
as
(
Select Handler,EventDate
from cteStartDates
union all
Select x.Handler,x.EventDate
from (
Select #datesteps.Handler,#datesteps.EventDate,
RowNum = row_number()
over (partition by #datesteps.Handler
order by #datesteps.EventDate)
from CteRecursiveSkip
join #datesteps
on #datesteps.Handler = CteRecursiveSkip.Handler
and #datesteps.Eventdate >= CteRecursiveSkip.EventDate+3
) as x
where x.RowNum = 1
)
Select *
from CteRecursiveSkip
order by Handler,EventDate
Yet another cursor resigned to the garbage can.  Performance should also be quite good across large datasets.
DateTime Lookups

Heres a little something that Simon Sabin mentioned at the London PASS user group last night that I wasnt aware of.

Ive lost count of the amount of queries I have that are some derivative of  >= Midnight of a day and < Midnight of day +1. In SQLServer 2008 the engine now does the work for you.

select *
from Sales.SalesOrderHeader
where CAST(OrderDate as DATE) ='20010701'

No great surprises so far, but internally the engine has transformed it into a BETWEEN query. So, if you add an index

create index idxSalesOrderHeaderDate 
on Sales.SalesOrderHeader(OrderDate)
include (SalesOrderId)

and then look at the query plan …

sohidx

Fabulous, an index seek. How simple is that ?

BUG - Use of ranking functions result in an inefficient query plan

In SQL2005 (9.00.4207.00), if you use a ranking function within a view or CTE, then an inefficient query plan can be produced.

First off in Adventure works create this index

Create Index idxLastName on Person.Contact(LastName)

and then consider these similar queries:

select ContactId,
LastName,
row_number() over(partition by LastName order by ContactId)
from [AdventureWorks].[Person].[Contact]
where LastName = 'Smith'
go
create view vwContacts
as
select
ContactId,
LastName,
rown = row_number() over(partition by LastName order by ContactId)
from [AdventureWorks].[Person].[Contact]
go
Select
* from vwContacts
where LastName = 'Smith'
go

You should now see two vastly different query plans


rankingqueryplan1


Notice in “Query 2” that Filter ?


image


Wow.. So it did calculate the entire query and then filter on the results.  At first i thought that i was being unfair, obviously there is potential for filtering on the ranking function, but when doing exactly the same on 2008 I get index seeks on both sides


rankingqueryplan2


CTE’s also suffer from the same problem, although they are easier to rewrite than a view to utilize a seek.

with cteLastNameRow(ContactId,LastName,Row_number)

as(
   
select ContactId,
          
LastName,
          
row_number() over(partition by LastName order by ContactId)
   
from [AdventureWorks].[Person].[Contact]
    )
select *
 
from cteLastNameRow
where LastName = 'Smith'
go

Another index scan.

So , be careful when using ranking functions and as ever,  investigate the query execution plan on EVERY query. I've raised a connect bug report here if you feel that it is something that Microsoft should invest some time in fixing.

Direct EMailing of SSRS Reports via SQLCLR
This is something that I was quite surprised that wasn't supported out of the tin with SSRS. When you schedule an email delivery, the report is always sent as an attachment. What would be great, if when you specified the render format as "MHTML", you had the option that the report forms the body of the email. Taking that one step on, how about calling that directly from SQLServer ?

Some of which is to follow has been shamelessly ripped from here

Source Code here
Useage :
exec SSRSMail @retval output,   --1 On Success , 99 on Fail
@ExceptionString output, -- ErrorMessage
@SSRSReport, -- Report name including forward slashes '/Marketing/SalesFigures'
@Params, -- Comma Delimited list of parameters to pass to the report
-- Case Sensitive ie 'SalesMonth = 3 , SalesYear = 2009 , SalesManager = John'
@RecipientList -- Comma Delimited list of emailRecipients ,
@SenderEmail, -- The Sender address. Supports a friendly name ie '<Company Reports>Reports@YourServer.com'
@Subject , -- Email Subject
@CCList, -- Comma Delimited CCList
@BCCList -- Comma Delimited BCCList




To use this create a new "SQL Server Project" within visual studio , and add a web reference to your SSRS Server http://yourserver/ReportServer/ReportExecution2005.asmx?wsdl naming it "ReportExecution".

Inside the source code rename "MailServer" to your mailserver. Please note that as i dont logon to that server that it needs SMTP-relaying enabled. The reason i dont use the system.net.mail namespace is that this routine was originally used within SSIS, but within that you cant specify the sender address. If you dissaprove, feel free to change to using that.

So the routine can consume the webservice you need to serialize the dll using the "sgen.exe" utility. More info on that here

We've been running live with this now for a few weeks , sending out a good few hundred emails a day to our clients , so stability seems good :). Ive had no complaints either about incorrect formatting. If you get any problems with that then let me know.
More Posts Next page »