January 2006 - Posts
Following on from my previous post on performance issues with a stored procedure, I thought it would be good to revisit the SP with a view to what SQL Server 2005 can do to assist. There are quite a few options:
- Parameter values based on values at first runtime
- SQL Server 2005 allows you to specify the values the optimiser should use to produce its plan (the OPTIMIZE FOR query hint)
- If the SP were split into more statements, individual statements could be recompiled (statement-level recompilation is new in 2005).
- Use of a function in the SELECT clause
- CLR functions will likely perform better than the equivalent TSQL version when used in a SELECT statement.
- A function is often used to get round a deficiency in TSQL; a Common Table Expression may do the job instead.
- Dynamic SQL
- If the dynamic SQL option is chosen, in SQL 2000 you had to give the calling user access to the tables; in SQL 2005 you can use the EXECUTE AS clause to avoid granting these permissions.
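As a rough sketch of a couple of the 2005 options above — all object names, parameters and literal dates here are invented for illustration:

```sql
-- Hypothetical SP: names and dates are made up, not from Patrick's code
CREATE PROCEDURE dbo.GetOrders
    @StartDate datetime,
    @EndDate   datetime
WITH EXECUTE AS OWNER  -- code inside runs as the owner, so callers
                       -- don't need direct SELECT rights on the tables
AS
BEGIN
    SELECT o.OrderNo, o.OrderDate
    FROM   dbo.Orders AS o
    WHERE  o.OrderDate BETWEEN @StartDate AND @EndDate
    -- Tell the optimiser which values to build the plan for,
    -- instead of sniffing whatever was passed on the first run
    OPTION (OPTIMIZE FOR (@StartDate = '20060101', @EndDate = '20060131'));
END
```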
Patrick Wright has recently posted on a performance issue he was asked to look into. The issue involved an SP that was taking a huge amount of time compared to what it should have been. This is one of those real-life scenarios where it's not a simple "you need an index" fix; there are lots of factors influencing the issue.
Interestingly, one of Patrick's first comments is that it's not how he would have done it (and looking at the code, I agree). This, I think, is part of the problem. The reason I write code the way I do is that code written that way is more readable, provides the most consistent results, and reduces complexity and the possibility of optimiser issues. I will confess some code is written in haste just to do a job, but that's a different point.
When you are faced with code that isn't written how you would do it, what do you do: try and work with it, or re-write it? The choice here has to come down to impact analysis and testing, but experience suggests that if you need to resolve the issue, the re-write is probably what you need to do.
Looking at the code in Patrick's situation, there are a number of things to point out:
- a derived table with no effective filter clause (I suspect most records have values for orderno, so the NOT NULL check and the check involving a function have little filtering effect)
- this is joined to a "random sample" of records that has no defined (useful) order, filtered by the date parameters
- there are 4 subqueries, 3 in the SELECT clause and 1 in the WHERE clause
- there's a function in the SELECT clause
The optimiser really doesn't have much chance here:
- Subqueries are fine with small result sets, but as result sets get larger they become a performance hit (unless the optimiser can resolve them to a join).
- Functions in a SELECT clause can turn a set-based query into a row-by-row query.
- The use of date parameters (not altered in the SP) means the query will be optimised for the values passed the first time it is executed.
- The subquery in the WHERE clause needs to be evaluated for the result set. There is no guarantee this will be done on the final result set; it may be combined with the first derived table.
- The optimiser can't efficiently join to the "random sample".
The bottom line is that the optimiser may build a query plan that runs well for one set of values, but when run for another set of values the plan is sub-optimal. The use of internal date variables forces the optimiser to use generic estimates for the date parameters, probably finding a middle ground, as Patrick found: the SP was quicker, but not as quick as when he ran the query in QA.
This is a common problem, because the query is trying to answer all questions in one go, which may not be the best solution and will result in very different performance depending on the box and IO subsystem it is run on.
The areas that I would address (re-write) are:
- Combine the first derived table with the WHERE clause, using a JOIN or a WHERE EXISTS (depending on the uniqueness of Documentid on DocumentCheck)
- Combine the random and first derived tables, or extract the random sample into a temporary table (which has statistics, unlike a table variable)
- Consider whether the function can be replaced
- Resolve the subqueries into joins
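A sketch of the first two re-writes — the table and column names here are invented, as the real schema is Patrick's, not mine:

```sql
-- 1. Pull the "random sample" into a temp table so the optimiser
--    has statistics to join against (a table variable wouldn't have any)
SELECT TOP 1000 d.DocumentId, d.OrderNo
INTO   #Sample
FROM   dbo.Documents AS d
WHERE  d.DocDate BETWEEN @StartDate AND @EndDate
ORDER  BY NEWID();

-- 2. Resolve a SELECT-clause subquery into a join, e.g. replace
--    (SELECT c.Status FROM dbo.DocumentCheck c
--     WHERE c.DocumentId = s.DocumentId)
--    with:
SELECT s.DocumentId, s.OrderNo, c.Status
FROM   #Sample AS s
JOIN   dbo.DocumentCheck AS c
  ON   c.DocumentId = s.DocumentId;
```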
Or do as Patrick's friend suggested and rewrite it as dynamic SQL.
My final comment is that the SP also suggests a user/the business has requested something, and a solution has been found that does exactly what was asked for, without looking at how the DB is structured and coming up with something that gives the business what they want but also performs well — i.e. in this case, can the TOP 1000 be replaced by the top 100 orders? I can understand why many of the decisions have been made in this query; I just question whether they should have been made that way.
How many of you have the same database in Dev, Test and Live, but maybe slightly different column orders in your tables? This isn't an issue, is it, because we always use column names in SELECT and INSERT statements, don't we!!! No one uses SELECT *, do they?
Well, SSIS does, sort of, if you select the table option in the Lookup.
What it does is get the metadata for your table and store it. Now, if you've worked with SSIS or been to any SSIS talk, you should know that at the runtime level SSIS deals in buffers, and the data in these is manipulated by the ordinal position of a field, not its name. Well, it seems that the Lookup does the same. It builds its internal structures based on the ordinal positions of the columns; at validation time it then checks that the column names (and data types, I think) of the columns it is using are the same as they were at design time. I think it also validates that the table has the same number of columns.
This of course isn't an issue until someone adds a column in Dev that hasn't been deployed to Live. You build your package on Dev, try to deploy, and it fails because the columns don't exist in Test (even if they are not being used).
So the answer is: use a SQL query and only specify the columns you want. That way you can guarantee that as long as those columns exist in your table(s) you will be fine, and there will be no problems if anyone changes that table.
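For example, rather than picking the table in the Lookup, give it a query like this (the table and column names are just illustrative):

```sql
SELECT CustomerId, CustomerName
FROM   dbo.Customers
```

Now columns added to the table later can't shift the ordinal positions of the columns the Lookup actually uses.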
Personally I think this is a bug, but I think it's best practice anyway.
(PS sorry Jamie if you've already posted this :)
This hasn't been greatly advertised but the SQL and VS 2005 Certification details can be found here.
I guess the reason it hasn't been greatly advertised is that it's not complete yet. I did mention the free training available last year; well, it still is, so get it while you can.
On Thursday I went along to the London .NET User Group at the Microsoft office in Soho, aka The Swimming Pool.
Ian Griffiths was presenting on C# 3.0 Enhancements and LINQ.
I was fortunate enough to go to the PDC, where the only sessions I went to around this were the LINQ and DLinq seminars. It all now makes much more sense: Ian was great in going through each of the new features in a seamless manner and amazingly bringing it all together with how the new features enable LINQ.
Just like a good book, Ian's presentation had a good start, middle and end.
Ian's blog can be found here http://www.interact-sw.co.uk/iangblog/ and the London .NET User Group can be found here http://www.dnug.org.uk/
I hadn't tried writing a report based on an SSIS package on the RTM build until tonight. I couldn't understand why the SSIS options were not in the data source builder. After a bit of investigation I found the following article in BOL: "Configuring Reporting Services to Use Integration Services Package Data".
It seems that a couple of the data providers for reports, namely SSIS and SAP, aren't enabled by default. It's fairly simple to enable them by uncommenting the relevant lines in the config files.
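From memory, the entry to uncomment sits in the <Data> section of RSReportDesigner.config (with an equivalent in RSReportServer.config) and looks something like the following — check the exact line in your own file, as I'm quoting the shape rather than the precise type string:

```xml
<Data>
    <!-- remove the comment markers that ship around this entry -->
    <Extension Name="SSIS"
               Type="Microsoft.SqlServer.Dts.DtsClient.DtsConnection, Microsoft.SqlServer.Dts.DtsClient" />
</Data>
```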
I am doing the UK SQL Server UserGroup launch event in Feb (register here: http://sqlserverfaq.com/?eid=65) and want to make sure I cover what you want.
Having done the official launch events and tried to cram everything into 45 minutes, it's clear that there is a lot of stuff in SQL Server 2005. I am looking for feedback on what you would like me to cover; please use the contact link on the left to let me know.
We will be giving away full copies of SQL Server and Visual Studio, exam vouchers, resource DVDs, and hands-on-lab DVDs for SQL Server, Visual Studio, BizTalk and Team System. These mean you can try out the features of these products without the hassle of installing them; this is especially true for the high availability DVD, which has 3 if not 4 SQL Server instances on it.
Plus there will be free pizza!
Following on from my previous post on CVs I thought I would see if I could get as much response from another post on CVs.
Make your CV stand out. When reading CVs, I am looking for ones that stand out; these are generally the ones that, rather than just stating what the job role is, go into detail about the difference the person made. For example, previous experience that looks something like this doesn't do anything but tell me that you can copy topics from BOL:
May 2001 - Current Some Big Company
Senior DBA responsible for 100 database servers. Tasks include
- Backup and restores
- DR plan
- Index maintenance
- Maintenance Plans
- SQL Agent Jobs
- Batch jobs in TSQL using cursors
- Performance monitoring using Profiler and Perfmon
Whereas the following indicates a bit more aptitude; the difference is that it shows the results of your actions, and one can infer all of the above from it:
May 2001 - Current Some Big Company
Senior DBA responsible for 100 database servers with an availability of 99.999% for 24x7. Work undertaken include
- Implemented monitoring solution using NetIQ
- Automated operational checks reducing proactive checking to 5 mins per hour from a full time job
- Improved failover time for DR from 2 hrs to 10 minutes.
- Proactively monitored servers reducing incidents from 20/month before I joined to 5/month.
- Reduced the runtime of a number of batch processes to a third of what it was.
The bottom line is that I'm looking for a CV that makes me want to employ you and not the other person that's done exactly the same thing.
It's all about getting to the next stage, where you can explain yourself more and prove yourself.
Have you spent your hard-earned money on tools like Redgate's SQL Data Compare to enable you to compare data? Or just despaired when your publisher and subscriber get out of sync?
Well in SQL Server 2005 there is a new utility to compare the data in two tables, the data can be in different tables and on different servers.
The utility will allow for the results of the comparison to be stored in a table.
OK, so it doesn't provide the details of what's different, just whether a record is mismatched, only on the source, or only on the destination.
So what's the utility? It's tablediff.
It's located in the Program Files\Microsoft SQL Server\90\COM folder.
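A sample command line, with the server, database and table names invented for illustration — check the switch names against tablediff -?, as I'm going from memory:

```shell
"C:\Program Files\Microsoft SQL Server\90\COM\tablediff.exe" ^
    -sourceserver SERVER1 -sourcedatabase SalesDB -sourcetable Orders ^
    -destinationserver SERVER2 -destinationdatabase SalesDB -destinationtable Orders ^
    -et OrdersDiff
```

The -et switch is what stores the row-level results in a table (here called OrdersDiff) rather than just writing them to the console.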
I'm recruiting again at the moment and having to sift through loads of CVs. Two things that amuse/annoy me are:
1. Putting MSDOS and Office as skills. Why?
You're applying for a job in IT; if you feel you need to pad out your list of skills with outdated and general ones, one has to start wondering about your calibre. If you can't use Word, where have you been for the past decade, Mars?!
2. Putting too much history.
I don't really care that you wrote a program on a ZX Spectrum in the 80s to repeat Hello World all over the screen.
What I want to know is: what have you done that is going to make you a good fit for the role I'm recruiting for?
To this end I am all for tailoring CVs for the job. We've all done lots of stuff that's really cool, clever, saved lots of money, etc., but much of the time only a small proportion of those things are relevant for the job you're applying for.
Keep it focused.