September 2011 - Posts

Table Variables and Parallel Plans

I’m sure somewhere there must be, carved into the side of a mountain, those immortal words: “Well, it seemed like a good idea at the time!” (with due homage to HHGTTG).

Hopefully there’s nothing in this post that you, the reader, don’t already know. However, I thought it would be interesting to recount the steps and findings from tuning a small stored procedure in one of our production systems, as it illustrates how simple changes can improve performance and how two aspects of tuning can interact to degrade performance. As you will probably have guessed from the title, it’s table variables and (the lack of) parallelism.

I’ve been pretty sharp over the years with anyone who has blogged about turning off parallelism; apart from a couple of isolated applications I have never turned it off (even with OLTP-type applications). It’s quite difficult, I think, to place every application into either an OLTP or DSS silo, and as I’ve frequently said, there’s no point having 80 cores on your server if you’re going to use it single threaded.

I can’t publish the code for this procedure, and I don’t think it would make much sense anyway after I’d tried to obscure what it does, but I will describe what it did. (All the stats are from Profiler output; the tests were run on a 16-core server running SQL 2008 Enterprise, and all results are from in-cache objects and data.)

Step 1. Insert into a table variable three integer values from a three-table join, with two NOT EXISTS in the WHERE clause which join back to the source table(s).

Step 2. In a loop, insert into the table variable based upon a join to another table (60 iterations only).

Step 3. Select from the table variable joined to another table, grouped by one column value.

The output is about 80 rows of 2 integer values.
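Stripped of the real logic, the shape of the procedure was roughly as follows. This is a sketch only; every table and column name here is invented for illustration, as the real code can’t be shown.

```sql
-- Hypothetical reconstruction of the procedure's shape; names are invented.
DECLARE @results TABLE (Id1 int, Id2 int, Id3 int);

-- Step 1: seed the table variable from a three-table join,
-- with two NOT EXISTS back to the source tables.
INSERT INTO @results (Id1, Id2, Id3)
SELECT a.Id1, b.Id2, c.Id3
FROM dbo.TableA a
JOIN dbo.TableB b ON b.Id1 = a.Id1
JOIN dbo.TableC c ON c.Id2 = b.Id2
WHERE NOT EXISTS (SELECT 1 FROM dbo.ExcludeX x WHERE x.Id1 = a.Id1)
  AND NOT EXISTS (SELECT 1 FROM dbo.ExcludeY y WHERE y.Id2 = b.Id2);

-- Step 2: loop (60 iterations), inserting via a join to another table.
DECLARE @i int = 1;
WHILE @i <= 60
BEGIN
    INSERT INTO @results (Id1, Id2, Id3)
    SELECT d.Id1, d.Id2, d.Id3
    FROM dbo.TableD d
    JOIN @results r ON r.Id1 = d.Id1
    WHERE d.Iteration = @i;
    SET @i += 1;
END

-- Step 3: final select, joined to another table and grouped on one column.
SELECT r.Id1, COUNT(*) AS Cnt
FROM @results r
JOIN dbo.TableE e ON e.Id1 = r.Id1
GROUP BY r.Id1;
```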

The procedure is called from another procedure as part of a bigger process. I was drawn to the calling procedure because it was slow running, up to 25 seconds.

The main procedure actually calls several other stored procedures, including this one, so I set up a Profiler trace to capture the duration and IO of each statement. This showed that in my test, calls to this procedure were responsible for almost all of the duration, but only 30% of the IO.

After extracting some sets of typical parameters for this call I set about looking to see what the problem was.

My starting call (a typical data set) showed 9,673 ms of CPU, 266,000 page reads and a duration of just over 10 seconds.

The underlying tables are not vast: 4.5 and 17 million rows for the largest. There were no table scans and on the whole the queries were not too bad; these tables are quite wide and selections were generally on keys and foreign keys, although I did note a couple of bookmark lookups.

One point to note here is that SSMS did not suggest any missing indexes which might improve performance.

Now, table variables can be a pain. True, they avoid issues of recompilation (not so much of an issue in SQL 2008), but just to recap: they don’t carry statistics and they will not get a parallel plan. It’s a common myth that table variables are created in memory whilst #tables are created on disk; it’s also a myth that putting a primary key on a table variable makes things better. It doesn’t: it will enforce uniqueness, but there is only one operation on a table variable and it’s a full table scan.

My view is that you never want to join table variables to permanent tables, as the plans will most likely be sub-optimal; I also don’t have much time for updates to table variables which use joins.
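The statistics problem is easy to demonstrate for yourself. The sketch below uses sys.objects purely as a convenient row source; any reasonably sized table will do. Compare the estimated row counts in the two plans.

```sql
-- Sketch: a table variable carries no statistics, so the optimizer
-- typically estimates 1 row for it, which skews join plans against
-- permanent tables; the statement is also not eligible for parallelism.
DECLARE @t TABLE (id int);
INSERT INTO @t SELECT object_id FROM sys.objects;

SELECT *
FROM @t t
JOIN sys.objects o ON o.object_id = t.id;   -- estimated rows for @t: 1

-- A #table carries statistics and can take part in a parallel plan.
CREATE TABLE #t (id int);
INSERT INTO #t SELECT object_id FROM sys.objects;

SELECT *
FROM #t t
JOIN sys.objects o ON o.object_id = t.id;   -- estimate reflects actual rows

DROP TABLE #t;
```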

So, first step: ditch the table variable and put in a #table. Here’s the result.


                  cpu      Page reads      Duration (secs)
Table Variable
#Table

As you can see, just replacing the table variable brings significant performance gains. I should point out that the results shown are a typical, representative set; they are neither the worst nor the best case.

At this point you might think, well, that’s fixed, what next? Aha, not quite. Freed from the table variable, I now had a parallel plan and a missing index suggested by SSMS. I checked out the suggestion and it was correct, so I added it and went back to the plan; another index was suggested, so I added this too. Results now for the two versions of the proc:

With first missing index

                  cpu      Page reads      Duration (secs)
Table Variable
#Table

With second missing index

                  cpu      Page reads      Duration (secs)
Table Variable
#Table

Checking the query plan, I noted that the looping step was performing a bookmark lookup; removing this only required an index on one column with an INCLUDE on a second column, both integers. Adding this index gave me this result:

                  cpu      Page reads      Duration (secs)
Table Variable
#Table

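The covering index that removed the bookmark lookup would have looked something like this; the table and column names are invented, but the shape (one integer key column, one integer INCLUDE column) matches what was described above.

```sql
-- Hypothetical: single-column key with one included column covers the
-- looping query, so the bookmark (key) lookup disappears from the plan.
CREATE NONCLUSTERED INDEX ix_TableD_Id1_incl_Id2
ON dbo.TableD (Id1)
INCLUDE (Id2);
```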
The initial population of the #table was generating a parallel plan. I have max degree of parallelism set to 8 on the server (16 cores), so the next step was to see if I could get any more performance by changing the level of parallelism.

Maxdop    Duration (secs)

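Testing each setting is a matter of hinting the statement rather than changing the server-wide configuration. The query below is purely illustrative; the hint is the point.

```sql
-- Hypothetical: OPTION (MAXDOP n) overrides the server-wide
-- 'max degree of parallelism' setting for this statement only.
SELECT o.object_id, COUNT(*) AS column_count
FROM sys.objects o
JOIN sys.columns c ON c.object_id = o.object_id
GROUP BY o.object_id
OPTION (MAXDOP 8);
```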
I had “show execution plan” switched on during testing; it’s interesting to note that capturing the plan does indeed add an overhead, which is clear in these results.


Maxdop    Duration (secs)

However, the answer in this case is that forcing a definitive setting of 8 brings the biggest benefit.

NB. By “auto” I mean the default server configuration setting of 8.

I should point out that I generally try to avoid using direct hints as a tuning tool: they add an extra level of complexity that requires checking with every service pack, and the maxdop doesn’t map to our dev and test environments, which don’t have as many cores, so in general it could lead to problems later. That said, this procedure is key to a particular performance issue.

Final Note: I don’t mean by this that all table variables should be replaced with #tables. However, in a number of cases where table variables feature in, say, four-table joins with permanent tables, making the table variable a #table has given me significant performance gains in application tuning. Ultimately, if I can shave 10 or 20 seconds from an execution time then to me that’s a significant gain.

Posted by GrumpyOldDBA with no comments

Posts of Interest

I don't make a habit of just posting links, as I think it's a form of cheating; however, sometimes there are things I want to keep track of, and I know that if I put the link on my blog I'll find it again!

So: a short one here. We've just bought some new servers with 10-core processors, or 20 cores if you turn on hyperthreading. That's 80 cores/threads, which is sort of a waste if you're not going to make use of parallel plans; it does make a backup fast, though! I saw this post from the CSS team and thought, hmmm, I must remember this. Btw, I think he means cores, not cpus.

Now for an interesting one indeed. I've always been one to want to designate indexes/keys descending when tables have incremental counters as a key, or when the key on a secondary index is a date and I know we usually pick the latest dates, as an example. My concern has always been that if I want the latest rows from a table I should have descending indexes; I seem to recollect from somewhere in my distant past that this was actually important on an RDBMS (not SQL Server). So it's something I've been doing on and off for many years, the problem being that, if asked to justify why I want to make an index descending, could I prove that it made a difference?

Well, earlier this year I attempted to prove that it did in fact make a difference (SQL 2008 and SQL 2008 R2), but my attempts failed: other than it just making a select top xxx always return the latest rows, I wasn't able to show a difference. Well, today I found out how poor my attempts to show this were <grin> and I bow down to Fabiano Amorim, who has an article on simple-talk which illustrates what I was trying to achieve.
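For what it's worth, declaring the key descending is just the DESC keyword on the index column; the table and column here are hypothetical.

```sql
-- Hypothetical: an ever-increasing date key indexed descending, so
-- "latest rows" queries read from the front of the index without a sort.
CREATE NONCLUSTERED INDEX ix_Orders_OrderDate_desc
ON dbo.Orders (OrderDate DESC);

-- e.g. SELECT TOP (100) OrderId, OrderDate
--      FROM dbo.Orders ORDER BY OrderDate DESC;
```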

Things you overhear !!

"You can't get better than 90% can you?"

"we need to disaggregate down the data"