Building complex data sets

If you've got a complex data set to produce how should you do it. Adam Machanic has posted about an Anti Pattern (bad practice) for such a situation. http://sqlblog.com/blogs/adam_machanic/archive/2007/12/15/anti-patterns-and-malpractices-volume-1-tumbling-data.aspx

He talks baout the approach of building temp tables and then using the them to produce more temp tables and finally the result. I agree in many situations this is a bad practice however as with database normalisation there is a trade off which you need to decide upon.

I personally find that building individual statements can be good for a few reasons,

  • Supporting the code.
    Derived tables, pivots, case statements and sub queries are advanced SQL and so using lots of them means the person supporting the code has to be able to understand it.

  • Debuging code
    If you have an issue with the output having the code in one bunch generally means you have to disect the query to be able to debug it. Having it already broken down eases this issue and allows for assertions to be made in the code to ensure that the data is behaving as expected.

  • Performance
    Putting everything in one query will greatly increase the options for the optimiser. I have seen this result in very bad plans, especially with parallism involved.

  • Manageability
    Doing things in one means you have a single black box, it starts and finishes. Its very difficult to see what going on. Breaking it up into smaller chunks provides better options for seeing and managing progress.

  • Memory usage
    Doing it all in one go can result in you needing a large amount memory for the whole query. Splitting it up reduces the single memory requirements.

Now I have to say that these comments are generally when building large complex data sets for reports or exporting data. In addition there is also the ETL scenario. In that case you have a decision on the granularity of your processing whether you want row by row or set based. The last generally offers best performance but less control over errors.

So whilst I agree with Adam there is an element of it depends on the situation.



-
Published 01 April 2008 23:29 by simonsabin

Comments

No Comments