Performance tuning with subqueries and parameters

Patrick Wright has recently posted on a performance issue he was asked to look into. The issue in question involved an SP that was taking a huge amount of time, compared to what it should have been. This is one of those real life scenarios where its not a simple you need an index there are lots of factors influencing the issue.

Interestingly one of Patricks first comments is that its not how he would have done it (and looking at the code I agree). This I think is part of the problem, the reason I write code the way I do is because code written that way is more readable, provides the most consistent results, reduces complexity and the possibulity of optimiser issues. I will confess some code is written in hast just to do a job but thats a different point. 

When you are then faced with code that isn't written how you would do it what do you do, try and work with it, re-write it? The choice here has to come down to impact analysis and testing but experience suggests that if you need to resolve the issue, the re-write is probably what you need to do.

Looking at the code in Patricks situation there are a number of things to point out,

  1. a derived table with no effective filter clause (I suspect most records have values for orderno so the NOT NULL check and check involving a function have little filtering effect)
  2. This is joined to a "Random Sample" of records that has no defined order (of use) filtered by the date parameters.
  3. There are 4 subqueries, 3 in the SELECT clause and 1 in the where clause.
  4. There's a function in the SELECT clause

The optimizer really has not much chance here.

  1. Sub queries are fine with small result sets, but as result sets gets larger they become a perf hit (unless the optimiser can resolve it to a join)
  2. Functions in a SELECT clause can result in a set based query turning into a row by row query.
  3. The use of date parameters (not altered in the SP) will mean the query will be optimised for the values passed in the first time the query is executed.
  4. The sub query in the WHERE clause needs to be evaluated for the result set. There is no guarantee this will be done on the final result set. It may be combined with the first derived table.
  5. The optimiser can't effeciently join to the "random sample"

The bottom line is that the optimiser may be buildin an query plan that results in a performant run for one set of values, but when run for another set of values the query plan is sub optimal. The use of internal date variables forces the optimiser to use generic values for the date parameters and thus probably finding a middle groundas Patrick found. The SP was quicker but not as quick as if he had done the query in QA.

This is a common problem because the Query is trying to answer all questions in one go which may not be the best solution, and will result in very different results depending on the box and IO susbsystem that it is run on.

The areas that I would address (re-write) are,

  1. Combine the first derived table with the WHERE clause with a JOIN or a WHERE EXISTS (depending on uniqueness of Documentid on DocumentCheck)
  2. Combine the Random and first derived tables together or extract the random table to insert into a temporary table (has statistics rather than a table variable than doesn't)
  3. Consider whether the function can be replaced
  4. Resolve the sub queries into joins

Or do as Patricks friend suggested, rewrite as dynamic SQL.

My final comment is that the SP also suggests that a user/the business has requested something and a solution has been found that does exactly what has been asked for, without looking at how the DB is structured and coming up with something that provides the business with what they want but also performs well with the business, i.e. in this case can the top 1000 be replaced by top 100 orders. I can understand why many of the decisions have been made in this query, I just question whether they should have been made that way.

 

-
Published Friday, January 27, 2006 6:00 AM by simonsabin
Filed under:

Comments

No Comments