Table Variables and Parallel Plans

I’m sure somewhere there must be carved into the side of a mountain those immortal words “Well, it seemed a good idea at the time!” (with due homage to HHGTTG).

Hopefully there’s nothing in this post that you, the reader, don’t already know. However, I thought it would be interesting to recount the steps and findings from tuning a small stored procedure in one of our production systems, as it illustrates how simple changes can improve performance, and how two aspects of tuning can interact to degrade it. As you will probably have guessed from the title, it’s table variables and (the lack of) parallelism.

I’ve been pretty sharp over the years with anyone who has blogged about turning off parallelism; apart from a couple of isolated applications I have never turned off parallelism, even with OLTP-type applications. It’s quite difficult, I think, to place every application into either an OLTP or DSS silo, and as I’ve frequently said, there’s no point having 80 cores on your server if you’re going to use it single-threaded.

I can’t publish the code for this procedure, and I don’t think it would make much sense anyway after I’d tried to obscure what it does, but I will describe what it did. (All the stats are from Profiler output; the tests were run on a 16-core server running SQL Server 2008 Enterprise, and all results are from objects and data already in cache.)

Step 1. Insert three integer values into a table variable from a three-table join, with two NOT EXISTS clauses in the WHERE that join back to the source table(s).

Step 2. In a loop (60 iterations only), insert into the table variable based upon a join to another table.

Step 3. Select from the table variable joined to another table, grouped by one column value.

The output is about 80 rows of two integer values.
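The shape of the procedure was roughly as follows. This is a minimal sketch only; all table and column names here are invented, since the real code can’t be published.

```sql
-- Hypothetical sketch of the procedure's shape; all object names are invented.
DECLARE @work TABLE ( KeyA int, KeyB int, KeyC int );

-- Step 1: seed from a three-table join with two NOT EXISTS predicates
INSERT INTO @work ( KeyA, KeyB, KeyC )
SELECT  t1.KeyA, t2.KeyB, t3.KeyC
FROM    dbo.Source1 t1
JOIN    dbo.Source2 t2 ON t2.KeyA = t1.KeyA
JOIN    dbo.Source3 t3 ON t3.KeyB = t2.KeyB
WHERE   NOT EXISTS ( SELECT * FROM dbo.Excl1 e1 WHERE e1.KeyA = t1.KeyA )
AND     NOT EXISTS ( SELECT * FROM dbo.Excl2 e2 WHERE e2.KeyB = t2.KeyB );

-- Step 2: loop ( 60 iterations only ), each pass inserting via a join
DECLARE @i int = 1;
WHILE @i <= 60
BEGIN
    INSERT INTO @work ( KeyA, KeyB, KeyC )
    SELECT  w.KeyA, l.KeyB, l.KeyC
    FROM    @work w
    JOIN    dbo.Lookup l ON l.KeyA = w.KeyA AND l.Iteration = @i;
    SET @i += 1;
END

-- Step 3: final select, joined to another table and grouped on one column
SELECT  o.GroupCol, SUM(w.KeyC) AS Total
FROM    @work w
JOIN    dbo.Other o ON o.KeyB = w.KeyB
GROUP BY o.GroupCol;
```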

The procedure is called from another procedure as part of a bigger process. I was drawn to the calling procedure because it was slow running, taking up to 25 seconds.

The main procedure actually calls several other stored procedures, including this one, so I set up a Profiler trace to capture the duration and IO of each statement. This showed that, in my test, calls to this procedure were responsible for almost all of the duration, but only 30% of the IO.

After extracting some sets of typical parameters for this call, I set about looking for the problem.

My starting call (a typical data set) showed 9,673 ms of CPU, 266,000 page reads and a duration of just over 10 seconds.

The underlying tables are not vast: 4.5 million and 17 million rows for the two largest. There were no table scans and, on the whole, the queries were not too bad; these tables are quite wide, and selections were generally on keys and foreign keys. I did note a couple of bookmark lookups.

One point to note here is that SSMS did not suggest any missing indexes that might improve performance.
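The suggestions SSMS surfaces come from the missing-index DMVs, which you can also query directly; a minimal sketch (the DMV names are real, the ordering choice is mine):

```sql
-- Pull the optimizer's missing-index suggestions straight from the DMVs,
-- highest estimated impact first.
SELECT  d.statement           AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.avg_user_impact     -- estimated % improvement if the index existed
FROM    sys.dm_db_missing_index_details d
JOIN    sys.dm_db_missing_index_groups g
        ON g.index_handle = d.index_handle
JOIN    sys.dm_db_missing_index_group_stats s
        ON s.group_handle = g.index_group_handle
ORDER BY s.avg_user_impact DESC;
```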

Now, table variables can be a pain. True, they avoid issues of recompilation (less of a problem in SQL 2008), but just to recap: they don’t carry statistics and they will not get a parallel plan. A common myth is that table variables are created in memory whilst #tables are created on disk; it’s also a myth that putting a primary key on a table variable makes things better. It doesn’t: it will enforce uniqueness, but there is only one operation on a table variable, and it’s a full table scan.
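The lack of statistics is easy to demonstrate: populate both kinds of table and compare the estimated row counts in the actual execution plan. A minimal sketch, assuming nothing beyond system catalogs to generate rows:

```sql
-- Table variable: the optimizer estimates 1 row for the scan,
-- however many rows it actually holds.
DECLARE @t TABLE ( id int PRIMARY KEY );
INSERT INTO @t ( id )
SELECT TOP (100000) ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) )
FROM   sys.all_columns a CROSS JOIN sys.all_columns b;

SELECT COUNT(*) FROM @t;   -- check estimated vs actual rows in the plan

-- #table: statistics are created, so the estimate tracks the real count,
-- and the plan is eligible for parallelism.
CREATE TABLE #t ( id int PRIMARY KEY );
INSERT INTO #t ( id ) SELECT id FROM @t;

SELECT COUNT(*) FROM #t;
DROP TABLE #t;
```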

My view is that you never want to join table variables to permanent tables, as the plans will most likely be sub-optimal; I also don’t have much time for updates to table variables which use joins.

So the first step was to ditch the table variable and put in a #table. Here’s the result.
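The change itself is mechanical; continuing with the hypothetical names from the sketch above:

```sql
-- Before:
-- DECLARE @work TABLE ( KeyA int, KeyB int, KeyC int );

-- After: a #table carries statistics and can get a parallel plan.
CREATE TABLE #work ( KeyA int, KeyB int, KeyC int );

-- ... same INSERT / loop / SELECT logic, referencing #work instead of @work ...

DROP TABLE #work;   -- or let it be cleaned up when the proc's session scope ends
```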

 

                   CPU (ms)   Page reads   Duration (secs)
  Table variable      9,673      266,036            10.096
  #Table              5,645      106,165             1.325

As you can see, just replacing the table variable brings significant performance gains. I should point out that the results are from a typical, representative data set, not a best or worst case.

At this point you might think, well, that’s fixed, what next? Not quite. Now freed from the table variable, I had a parallel plan and a missing index suggested by SSMS. I checked out the suggestion, found it was correct, added the index and went back to the plan; another index was suggested, so I added that too. Results for the two versions of the proc:

With the first missing index:

                   CPU (ms)   Page reads   Duration (secs)
  Table variable      1,747      207,163             2.267
  #Table              2,541       47,311             1.293

With the second missing index:

                   CPU (ms)   Page reads   Duration (secs)
  Table variable      2,761       39,011             3.231
  #Table              2,384       38,528             1.070

Checking the query plan, I noted that the looping step was performing a bookmark lookup. Fixing this only required an index on one column with an INCLUDE on a second column, both integers. Adding this index gave me this result:
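An index of that shape looks like this; the table and column names are hypothetical stand-ins for the real ones:

```sql
-- Removes the bookmark (key) lookup by covering the query:
-- seek on KeyA, with Iteration carried at the leaf level via INCLUDE.
CREATE NONCLUSTERED INDEX IX_Lookup_KeyA
ON dbo.Lookup ( KeyA )
INCLUDE ( Iteration );
```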

                   CPU (ms)   Page reads   Duration (secs)
  Table variable      2,668       36,068             3.020
  #Table              2,261       35,708             0.857

The initial population of the #table was generating a parallel plan. I have max degree of parallelism set to 8 on the server (16 cores), so the next step was to see if I could get any more performance by changing the level of parallelism.

  MAXDOP   Duration (secs)
  16                 1.322
  8                  0.796
  4                  1.004
  1                  3.007
  auto               1.001

Now, I had the show execution plan option checked in SSMS during the tests above; it’s interesting to note that displaying the plan does indeed add an overhead, which is clear when the same timings are taken without it:

 

  MAXDOP   Duration (secs)
  16                 0.413
  8                  0.299
  4                  0.558
  1                  2.511
  auto               0.455

However, the answer in this case is that a definitive forced setting of 8 brings the biggest benefit.

NB: by “auto” I mean leaving the query to pick up the default server configuration setting of 8.
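Forcing the setting at the statement level, rather than changing the server configuration, is done with a query hint; a sketch against the hypothetical tables from earlier:

```sql
-- Force a degree of parallelism for this statement only;
-- the server-wide 'max degree of parallelism' setting is untouched.
INSERT INTO #work ( KeyA, KeyB, KeyC )
SELECT  t1.KeyA, t2.KeyB, t3.KeyC
FROM    dbo.Source1 t1
JOIN    dbo.Source2 t2 ON t2.KeyA = t1.KeyA
JOIN    dbo.Source3 t3 ON t3.KeyB = t2.KeyB
OPTION  ( MAXDOP 8 );
```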

I should point out that I generally attempt to avoid using direct hints as a tuning tool: they add an extra level of complexity that requires checking with every service pack, and the MAXDOP value doesn’t map to our dev and test environments, which don’t have as many cores, so in general hints could lead to problems later. That said, this procedure is key to a particular performance issue.

Final note: I don’t mean by this that all table variables should be replaced with #tables. However, in a number of cases where table variables featured in, say, four-table joins with permanent tables, making the table variable a #table has given me significant performance gains in application tuning. Ultimately, if I can shave 10 or 20 seconds from an execution time, then to me that’s a significant gain.

Published Monday, September 12, 2011 8:52 PM by GrumpyOldDBA
