In Star Trek it was pretty obvious when the Enterprise had problems; it was always something to do with the Warp-Drive anyway.
On production systems, its normally SQL Server Enterprise having problems, but when your ship is going down, what should you check? Heres a list, we've cobbled together.
- Fire up Task Manager immediately. Then come back to this after it's built up a history; when you've checked over the other usual suspects, come back to here and check out CPU Usage, if its too low, this could indicate a problem too as well as being very high. Check out the memory usage. Check out other jobs running on the server sorting by CPU and CPUTime; add this counter in.
- Are there any running SQL Server jobs hogging resource, or causing blocking, sort by status. Run msdb..sp_help_jobhistory. If there are a number of jobs, this could well be the culprit.
- Check the space in the databases, maybe run DBCC UPDATEUSAGE if none reported or you dont regularly run this, check the free space of the data files as well as the transaction logs. Is the Log too big? Has backing up of the log stopped for some reason, this does affect performance.
- Check the space on the physical drives; are any unreasonably low? run master..xp_fixeddrives
- Check the blocking? Find the leadblocker? Check sysprocesses or EM, is there anything suspicious in there, look for status like runnable and check for non-runnables too. Look at the application name. Are there any ad-hoc users connecting doing inordinate amount of work, check out cpu use and disk use.
- Run Profiler using a template or standard - duration? how long are the queries taking? Are there too many or not enough?
- Check the SQL Server configuration, Is SQL using the correct amount of memory? Check parallelism, priority boost and max worker threads. Run sp_configure, has someone been changing configuration options, do they look normal?
- Are the database settings correct? Is Auto-Shrink enabled? If so turn this off.
- Check the statistics are up-to-date, is it being re-indexed as well, check the jobs that do this, are they running now causing problems?
- If clustered server check cluadmin . Are all the instances on the same node, any problems reported? Is the live db on the secondary node?
- Check batch transactions per second in System Monitor, how busy is the SQL Server?
- Check all the Services running, are there rogue processes on the server?
- Check EventVwr; anything suspicious recently matching the time of the performance issues.
- Check SQL Server Error Logs
- Fire up System Monitor or Perfmon, check CPU, Memory, DISK and Network. Save a standared set of perfmon counters. Check against your baseline.
- Can you run a query on the server?
- Can you query the server from a client? Are the ODBC or OLE-DB connections working?
- Is the Service Pack level up-to-date, OS and SQL?
- When was the server last rebooted? in cmd, run Net statistics server. Does the server need re-booting?
- Is the server set to Named Pipes, instead of TCP/IP?
- Are there any internal audit tables holding performance stats or application functions chosen, query these.
- Has someone recently done an upgrade to OS/SQL or application?
Any more for any more? I'll add your name if you have a counter argument to any of the above or a new one to check. I guess I've only just scratched the surface.
Contributors;
- John Parker
- Richard Smith
Had a funny problem at work to do with clustering, we're not entirely sure if there was a problem with the cluster because we removed Builtin\Administrators. I was the third person brought in to resolve this problem so I'm not quite sure what fixed it. I find clustering kinda scary but also really interesting. Anyways, two links if you're interested
SQL Server 2000 Failover Clustering
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/failclus.mspx
SQL Server 2005 Failover Clustering White Paper ( for download )
http://www.microsoft.com/downloads/details.aspx?FamilyID=818234dc-a17b-4f09-b282-c6830fead499&DisplayLang=en
The problem we had was related to this...
BUG: IsAlive check does not run under the context of the BUILTIN\Administrators account in SQL Server 2000 Enterprise Edition
http://support.microsoft.com/?id=291255
If you could somehow link all the SQL Servers in the world together and then capture a Profiler trace on it, it would be extremely interesting to analyse the results. I bet table scans and therefore missing indexes would be quite high on this list. I wonder if recompilation of stored procedures called sp_ would be in the top ten culprits, eating up precious CPU cycles and causing wait time? I've been using SQL Server since 1999 as a Developer/DBA and it's easy to be cynical about developers and the way they code queries. I have lost count of the number of times I've seen developers create user stored procedures called sp_ not knowing that this forces recompilation ever time the query's run. I started thinking about why they did this, I mean, it happens a lot.
1. Microsoft call their own system stored procedures sp_, so that they recompile when they're executed. Developers see this and think it's a standard and trying to be good developers, replicate this Microsoft standard.
2. Developers like to adhere to naming conventions to make coding easier. I expect they think that sp_ means stored procedure but I wonder if it actually stands for system procedure? Surely though they should be called SSP (System Stored Procedures?)
3. They just don't know, and no-one has ever told them. It's eays to criticise other people's lack of knowledge, but I've seen this countless times. How can they all be uninformed? Maybe they don't have a DBA or SQL developer on site.
Thinking about this, I think Microsoft are slightly at fault here because of the reasons above. People who develop in SQL Server may not have the time to pick up a decent programming book and start learning the nuances of the product, they have to get a solution from SQL within the whole project they're working on and also to a deadline. Is it any surprise they don't pick up a DBA or SQL Server programming book? Perhaps if they did then I would be out of a job. Microsoft have done a very good job at creating a self-tuning RDBMS, however, I can't help but think the easiest solution to this problem is when someone creates a stored procedure called sp_, the informational message should be 'Stored Procedures called sp_ will be recompiled every time they are run and this may affect performance'. Surely this would force even the most reluctant developer to investigate this?
I'm off to the SQL Server Connect website and logging this recommendation. If Microsoft added this in a future Service Pack, I wonder if that World-Wide Trace would get any smaller?