TSQL Scalar functions are evil.

I’ve been working with a number of clients recently who all have suffered at the hands of TSQL Scalar functions. Scalar functions were introduced in SQL 2000 as a means to wrap logic so we benefit from code reuse and simplify our queries. Who would be daft enough not to think this was a good idea. I for one jumped on this initially thinking it was a great thing to do.

However as you might have gathered from the title scalar functions aren’t the nice friend you may think they are.

If you are running queries across large tables then this may explain why you are getting poor performance.

In this post we will look at a simple padding function, we will be creating large volumes to emphasize the issue with scalar udfs.

create function PadLeft(@val varchar(100), @len int, @char char(1))

returns varchar(100)



  return right(replicate(@char,@len) + @val, @len)




Scalar functions are interpreted code that means EVERY call to the function results in your code being interpreted. That means overhead for processing your function is proportional to the number of rows.

Running this code you will see that the native system calls take considerable less time than the UDF calls. On my machine it takes 2614 ms for the system calls and 38758ms for the UDF. Thats a 19x increase.

set statistics time on


select max(right(replicate('0',100) + o.name + c.name, 100))

from msdb.sys.columns o

cross join msdb.sys.columns c


select max(dbo.PadLeft(o.name + c.name, 100,'0'))

from msdb.sys.columns o

 cross join msdb.sys.columns c

If you run the last one again but with half the rows i.e. as follows the time is halved. This highlights how linear the performance is in relation to the number of rows

select max(dbo.PadLeft(o.name + c.name, 100,'0')), count(1)

from (select top 50 percent * from msdb.sys.columns )o

 cross join msdb.sys.columns c


Scalar functions are calculated on a single thread. This means that if you move to a multi core machine your performance will not be increased. This is shown by running against a query that results in parallel operators.

Create a very large table and populate it.

create table veryLargeTable (col1 bigint,  col2 bigint)--varchar(100))


declare @i int

set @i = 0

while @i < 20


    insert into veryLargeTable

    select object_id, object_id

    from sys.columns

    set @i = @i + 1



Then they perform the following queries  (You need to do this on a multi core machine),

set statistics time on


select max(right(replicate('0',100) + cast(v1.col1+ v2.col2 as varchar(100)), 100))

from veryLargeTable v1

join veryLargeTable v2 on v2.col1 = v1.col2


select max(dbo.PadLeft(v1.col1+v2.col2, 100,'0'))

from veryLargeTable v1

join veryLargeTable v2 on v2.col2 = v1.col1


You will see in the duration that the system function version takes x amount of time, but it uses ~2X time in CPU. On my machine I get elapsed time of 3247ms and CPU time of 6094 ms. That highlights that the query was able to work in parallel. Whereas the scalar UDF results in an elapsed time of 27041ms and an CPU time of 26000ms, showing no parallelism.

Reduction of CTE and views

Common table expressions are largely syntactic constructs that are merged into the main query, i.e. they perform like a view and not like a physical set of data. However the use of scalar functions in CTEs can cause undesired behaviour. In the worst case, a query might only return a few rows and you may believe that the CTE is only evaluated a few times and thus your function only called a few times, however due the optimiser might choose a query plan which results in the CTE being evaluated for every row in the source dataset resulting in your function being called many more times than you expect. This in conjunction with the issues above results in very very poor performance.


If you’ve used scalar functions and tried to perform statement level profiling you will have hit this last problem. Because each line in a scalar function is considered a statement then EVERY time the function is evaluated, EACH line in the function is recorded in profiler.

This firstly means you can’t generally find the statements you are really concerned with, but of more a concern is that the performance hit of capturing ALL these statements if very very large.

To test this set up profiler to record SP:StmtCompleted and run the samples above, you will soon be overwhelmed with data.

There is a workaround for this which reduces the impact but it only reduces the impact. The workaround is to filter out statements where the statement is for an object of type “Scalar function”

ObjectType <> 20038


So what are your options,

1.       You either need to put your code in line, but then you lose the benefits of code reuse.

2.       Write a CLR function

3.       Write a table valued function and use a subquery.



Code Reuse




System functions


Very High

Limited (single line)


CLR function





Table Valued function



Limited (single line)


Scalar function






The Table Valued Function option is the one that is not commonly known, but is the best option if you can convert your function into a single line of system function calls and don’t want to go to CLR. What you do is write a Table Valued function.

So how does that actually work. You create your table valued function to return 1 row. In your query you then use a subquery to get the value from the table valued function.

create function PadLeftTVF(@val varchar(100), @len int, @char char(1))

returns table


  return (select right(replicate(@char,@len) + @val, @len) val)


select max(val)

from (select  (select * from PadLeftTVF(o.name + c.name, 100,'0')) val

        from msdb.sys.columns o

  cross join msdb.sys.columns c) d

select max(val)

from (select (select val from PadLeftTVF(v1.col1+ v2.col2 , 100,'0')) val

from veryLargeTable v1

join veryLargeTable v2 on v2.col1 = v1.col2 ) d


Note: The derived table is used because you can’t use a subquery in an aggregate. But if you do it via a derived table it works. Why I have no idea and it seems odd that the optimiser can’t do this for me. Anyway the Max is so we only return 1 row from the query and so our timing isn’t affected by the time it takes to return and render the data.

Whilst this looks very complicated this benefits from the fact that the optimiser can effectively consume the system calls in line, because it can figure out there is only going to be own row and column returned. You can see this by looking at the properties of the streaming aggregate. The table valued function will have an output of something like where the function has been totally reduced to system calls.


Where as the scalar one has the function call to PadLeft.


Use of Table Valued functions means that using this method you can benefit from parallelism because the function has been reduced to system function calls and so your query will scale better when you go to a multi core machine. This is highlighted if you run the last query above on a multi core machine.


If you aren’t into writing C# code and you want ultimate performance and code reuse then write your scalar udfs as table valued functions and use a subquery

Other References



Published Monday, November 3, 2008 9:48 PM by simonsabin


Wednesday, November 5, 2008 5:12 PM by Andy Irving

# re: TSQL Scalar functions are evil.

This only works for *inline* TVFs (perhaps not obviously to some, but Adam mentions it in his blog that you've linked to).

So unfortunately if your scalar UDF contains any (awful) top 1... order by comedy, you're out of luck

Wednesday, November 5, 2008 5:32 PM by simonsabin

# re: TSQL Scalar functions are evil.

Sort of true. You would still be better to write that as an inline table valued function so that the optimiser can merge it with the rest of the query.

The issue is if you have to do multiple statements i.e. an iterative calculation. In this case you should in my view resort to a CLR function

Sunday, November 9, 2008 4:42 PM by facility9 » Links for the week

# facility9 &raquo; Links for the week

Pingback from  facility9 &raquo; Links for the week

Monday, November 10, 2008 12:10 AM by Weekly Link Post 67 « Rhonda Tipton’s WebLog

# Weekly Link Post 67 &laquo; Rhonda Tipton&#8217;s WebLog

Pingback from  Weekly Link Post 67 &laquo; Rhonda Tipton&#8217;s WebLog

Thursday, March 26, 2009 4:01 PM by sstange

# re: TSQL Scalar functions are evil.

Scalar functions are truely evil. We had a scalar function that simply returns the year. So, for example, we might have a select statement like:

select sum(units) as FYSales from sales where FY = dbo.GetCY();

Everything worked fine in SQL Server 2005, but as soon as we upgraded to SQL Server 2008, performance was awful!!!! This same query in 2005 would take < 1 sec, but in 2008 we're running at 13 minutes.

Friday, August 7, 2009 1:20 PM by SimonS Blog on SQL Server Stuff

# TSQL Challenge - counting non null columns

I&#39;m working on a project where I need to cycle a flag amongst a set of columns. To achieve this I

Friday, August 7, 2009 5:31 PM by SimonS Blog on SQL Server Stuff

# TSQL Challenge - counting non zero columns

I&#39;m working on a project where I need to cycle a flag amongst a set of columns. To achieve this I am storing a position value in each column which allows me to cycle them. In one process I need to know how many columns are non zero. So the challenge

Tuesday, November 16, 2010 1:25 PM by Storing Single Key/Value Settings

# Storing Single Key/Value Settings

Pingback from  Storing Single Key/Value Settings

# Spread the Word! SQL User-Defined Scalar Functions are Evil! &laquo; Conficient Blog

Pingback from  Spread the Word! SQL User-Defined Scalar Functions are Evil! &laquo; Conficient Blog

# Inline Functions &#8211; Are they poor coding practice? - Programmers Goodies

Pingback from  Inline Functions &#8211; Are they poor coding practice? - Programmers Goodies