TSQL Scalar functions are evil.
I’ve been
working with a number of clients recently who all have suffered at the hands of
TSQL Scalar functions. Scalar functions were introduced in SQL 2000 as a means
to wrap logic so we benefit from code reuse and simplify our queries. Who would
be daft enough not to think this was a good idea. I for one jumped on this
initially thinking it was a great thing to do.
However as
you might have gathered from the title scalar functions aren’t the nice friend
you may think they are.
If you are
running queries across large tables then this may explain why you are getting
poor performance.
In this post
we will look at a simple padding function, we will be creating large volumes to
emphasize the issue with scalar udfs.
create
function PadLeft(@val varchar(100), @len int, @char char(1))
returns
varchar(100)
as
begin
return
right(replicate(@char,@len) + @val, @len)
end
go
Interpreted
Scalar
functions are interpreted code that means EVERY call to the function results in
your code being interpreted. That means overhead for processing your function is
proportional to the number of rows.
Running this
code you will see that the native system calls take considerable less time than
the UDF calls. On my machine it takes 2614 ms for the system calls and 38758ms
for the UDF. Thats a 19x increase.
set
statistics time on
go
select
max(right(replicate('0',100) + o.name + c.name, 100))
from
msdb.sys.columns
o
cross
join msdb.sys.columns c
select
max(dbo.PadLeft(o.name + c.name, 100,'0'))
from
msdb.sys.columns
o
cross
join msdb.sys.columns
c
If you run
the last one again but with half the rows i.e. as follows the time is halved.
This highlights how linear the performance is in relation to the number of
rows
select
max(dbo.PadLeft(o.name + c.name, 100,'0')), count(1)
from
(select
top 50 percent
* from
msdb.sys.columns )o
cross
join msdb.sys.columns
c
Parallel
Scalar
functions are calculated on a single thread. This means that if you move to a
multi core machine your performance will not be increased. This is shown by
running against a query that results in parallel operators.
Create a very
large table and populate it.
create
table veryLargeTable
(col1 bigint, col2 bigint)--varchar(100))
go
declare
@i int
set
@i = 0
while
@i < 20
begin
insert into
veryLargeTable
select object_id, object_id
from sys.columns
set @i = @i + 1
end
Then they
perform the following queries (You
need to do this on a multi core machine),
set
statistics time on
go
select
max(right(replicate('0',100) + cast(v1.col1+ v2.col2 as varchar(100)), 100))
from
veryLargeTable v1
join
veryLargeTable v2 on v2.col1 = v1.col2
go
select
max(dbo.PadLeft(v1.col1+v2.col2, 100,'0'))
from
veryLargeTable v1
join
veryLargeTable v2 on v2.col2 = v1.col1
You will see
in the duration that the system function version takes x amount of time, but it
uses ~2X time in CPU. On my machine I get elapsed time of 3247ms and CPU time of
6094 ms. That highlights that the query was able to work in parallel. Whereas
the scalar UDF results in an elapsed time of 27041ms and an CPU time of 26000ms,
showing no parallelism.
Reduction of CTE and
views
Common table
expressions are largely syntactic constructs that are merged into the main
query, i.e. they perform like a view and not like a physical set of data.
However the use of scalar functions in CTEs can cause undesired behaviour. In
the worst case, a query might only return a few rows and you may believe that
the CTE is only evaluated a few times and thus your function only called a few
times, however due the optimiser might choose a query plan which results in the
CTE being evaluated for every row in the source dataset resulting in your
function being called many more times than you expect. This in conjunction with
the issues above results in very very poor performance.
Profiler
If you’ve
used scalar functions and tried to perform statement level profiling you will
have hit this last problem. Because each line in a scalar function is considered
a statement then EVERY time the function is evaluated, EACH line in the function
is recorded in profiler.
This firstly
means you can’t generally find the statements you are really concerned with, but
of more a concern is that the performance hit of capturing ALL these statements
if very very large.
To test this
set up profiler to record SP:StmtCompleted and run the samples above, you will
soon be overwhelmed with data.
There is a
workaround for this which reduces the impact but it only reduces the impact. The
workaround is to filter out statements where the statement is for an object of
type “Scalar function”
ObjectType
<> 20038
Options
So what are
your options,
1.
You either need to put your code in
line, but then you lose the benefits of code reuse.
2.
Write a CLR function
3.
Write a table valued function and use a
subquery.
|
Option |
Code
Reuse |
Performance |
Extensible |
Parallelism |
|
System
functions |
No |
Very High |
Limited (single line) |
Yes |
|
CLR
function |
Yes |
High |
Yes |
Yes |
|
Table Valued
function |
Yes |
High |
Limited (single line) |
Yes |
|
Scalar
function |
Yes |
Low |
Yes |
No |
The Table
Valued Function option is the one that is not commonly known, but is the best
option if you can convert your function into a single line of system function
calls and don’t want to go to CLR. What you do is write a Table Valued function.
So how does
that actually work. You create your table valued function to return 1 row. In
your query you then use a subquery to get the value from the table valued
function.
create
function PadLeftTVF(@val varchar(100), @len int, @char char(1))
returns
table
as
return
(select
right(replicate(@char,@len) + @val, @len) val)
go
select
max(val)
from
(select (select
* from
PadLeftTVF(o.name + c.name, 100,'0')) val
from msdb.sys.columns
o
cross
join msdb.sys.columns c) d
select
max(val)
from
(select
(select
val from PadLeftTVF(v1.col1+ v2.col2 , 100,'0'))
val
from
veryLargeTable v1
join
veryLargeTable v2 on v2.col1 = v1.col2 )
d
Note: The
derived table is used because you can’t use a subquery in an aggregate. But if
you do it via a derived table it works. Why I have no idea and it seems odd that
the optimiser can’t do this for me. Anyway the Max is so we only return 1 row
from the query and so our timing isn’t affected by the time it takes to return
and render the data.
Whilst this
looks very complicated this benefits from the fact that the optimiser can
effectively consume the system calls in line, because it can figure out there is
only going to be own row and column returned. You can see this by looking at the
properties of the streaming aggregate. The table valued function will have an
output of something like where the function has been totally reduced to system
calls.
MAX(right('0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'+CONVERT_IMPLICIT(varchar(100),[msdb].[sys].[syscolpars].[name]+[msdb].[sys].[syscolpars].[name],0),(100)))
Where as the
scalar one has the function call to PadLeft.
MAX([tempdb].[dbo].[PadLeft](CONVERT_IMPLICIT(varchar(100),[msdb].[sys].[syscolpars].[name]+[msdb].[sys].[syscolpars].[name],0),(100),'0')))
Use of Table
Valued functions means that using this method you can benefit from parallelism
because the function has been reduced to system function calls and so your query
will scale better when you go to a multi core machine. This is highlighted if
you run the last query above on a multi core machine.
Summary
If you aren’t
into writing C# code and you want ultimate performance and code reuse then write
your scalar udfs as table
valued functions and use a subquery
Other
References
http://sqlblog.com/blogs/adam_machanic/archive/2006/08/04/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx
-