Be careful with table variables

I just looked into a poorly performing stored procedure and brought it down from over 4 minutes to under 4 seconds by making one simple change: converting a table variable into a temporary table. What's interesting is that there were only about 20,000 rows of about 1,100 bytes each in the table variable, which would be nothing in a permanent table. So what happened? I got stuck in and this is what I found.

The stored proc is for a sales report where the user can choose from a list of things to group the results by. The relevant sales are selected into a table variable with an empty column Group1, the approriate column is copied into Group1, and then there's a SELECT with a whole lot of aggregations GROUPed BY Group1. The query plan showed 99% of the cost was in the main insert into the table variable. Further reading revealed that no statistics are created on a table variable and so to the Query Optimizer it always has a cardinality of 1, whereas statistics are created on temporary tables. Was this the problem?

I changed it to use a temporary table instead, ran that, and compared the query plans and the output from the Profiler, including the events Auto Stats (under Performance) and SP:StmtStarting & SP:StmtCompleted (under Stored Procedures). The query plans for populating the temporary table and table variable were identical but the temporary table approach showed statistics being created on the temporary table before the final SELECT, and the SELECT itself was parallelized. The SP:StmtStarting and SP:StmtCompleted events in the Profiler confirmed that it was the final SELECT that took nearly all the time with a table variable.

Interestingly the query plan showed the final SELECT was 1% of the total cost with a table variable and 23% with a temporary table. It just shows, you have to be careful about looking at the steps in the plan and zooming in on the step that has the largest share of the total cost. I could have gone blind looking at the first step without ever improving it.

Comments

# Dave Ballantyne said:

As i understand it, the cost of batch percentage is an estimated cost.  These estimates are based on statistics.  Table variables have no statistics,  therefore it is unsurprising that the cost percentage of batch is wrong.  

14 July 2010 14:18
# DavidWimbush said:

Ah! Thanks, Dave. That would explain what I saw here. If it assumes the table variable will only have 1 row, it's a tiny part of the job to select it. With statistics showing 20,000 rows 23% looks more realistic.

14 July 2010 15:53
# Dave Ballantyne said:

Which is why table variables are really bad!  In your example the optimizer (probably) picked a nested loop as it more efficient to do that with 1 row. With 20,000 a hash or merge would be better, which (probably) happened with the temp table.  Maybe i should post a blog of my own on this.

14 July 2010 16:10
# DavidWimbush said:

Go for it! When I was looking into this I didn't think it was covered very well. The general impression was that table variables are better up to a certain size because they are 'lighter' somehow, although the details were vague.

My favourite was the claim that a table variable is better because it's in memory while a temporary table is on disk on tempdb, and disk is slower. In reality, neither would get anywhere near a disk unless memory pressure forced them out of the data cache, in which case both would go to disk in tempdb.

This statistics thing is key and, if it's a big table, so is the ability to add indexes to a temporary table. It seems to me a table variable is fine if it's a really small dataset but then so is a temporary table. So is there any case for using table variables at all?

15 July 2010 07:49
# Dave Ballantyne said:

Hmm , the only REAL positive is that they are non-logged.  As we are only talking about a small amount of rows , the overhead on the log itself will be minimal, but what this does mean is that the data persists after a ROLLBACK which can be advantageous.  Additionally, you cant use temp tables in functions BUT functions are the wrong thing in 99.9999% of circumstances.   Now on my blog todo list.

15 July 2010 12:30
# GrumpyOldDBA said:

I'd never use a table var for anything other than a small result set, as stated above they don't maintain stats - they will not generate parallel plans either, they are bad when joined to regular tables, they are bad when updated .. but it's always been that way and standard tuning for me is to always look at the volume of data in a table variable. The gain you achieved is not uncommon. The other myth around table vars is that they are always created in memory - this isn't true, but you used to gain from not having your proc recompile - this isn't such an issue in 2008.

16 July 2010 07:57
# DavidWimbush said:

We've nearly got a book chapter here!

16 July 2010 09:25
# Dave Ballantyne said:

Rather than blog, ive done a cuppa corner

www.screencast.com/.../72b48209-5715-4352-8a59-445824377e83

21 July 2010 09:34