December 2010 - Posts

SQL 2005 Transaction Rollback Hung–unresolved deadlock

Encountered an interesting issue recently with a SQL 2005 sp3 Enterprise Edition system.

Every weekend, a full database reindex was being run on this system – normally this took around one and a half hours.


Then, one weekend, the job ran for over 17 hours  - and had yet to complete...

At this point, DBA cancelled the job. Job status is now cancelled – issue over…

 

However, cancelling the job had not killed the reindex transaction – DBCC OPENTRAN was still showing the transaction being open.

The oldest open transaction in the database was now over 17 hours old.  Consequently, transaction log % used growing dramatically and locks still being held in the database...

Further attempts to kill the transaction did nothing. ie we had a transaction which could not be killed.

In sysprocesses, it was apparent the SPID was in rollback status, but the spid was not accumulating CPU or IO. Was the SPID stuck ?

On examination of the SQL errorlog – shortly after the reindex had started, a whole bunch of deadlock output had been produced by trace flag 1222. Then this :-

spid5s      ***Stack Dump being sent to   xxxxxxx\SQLDump0042.txt
spid5s      * *******************************************************************************
spid5s      *
spid5s      * BEGIN STACK DUMP:
spid5s      *   12/05/10 01:04:47 spid 5
spid5s      *
spid5s      * Unresolved deadlock
spid5s      *
spid5s      *  
spid5s      * *******************************************************************************
spid5s      * -------------------------------------------------------------------------------
spid5s      * Short Stack Dump
spid5s      Stack Signature for the dump is 0x000001D7
spid5s      External dump process return code 0x20000001.

Unresolved deadlock – don’t think I’ve ever seen one of these before….

A quick call to Microsoft support confirmed the following bug had been hit :- http://support.microsoft.com/kb/961479

So, only option to get rid of the hung spid – to restart SQL Server…

Fortunately SQL Server restarted without any issues. I was pleasantly surprised to see that recovery on this particular database was fast. However, restarting SQL Server to fix an issue is not something I would normally rush to do...

Short term fix – the reindex was changed to use MAXDOP of 1. Longer term fix will be to apply the correct CU, or wait for SQL 2005 sp 4 ?? This should be released any day soon I hope..

Posted by steveh99999 | with no comments

Outlying DBAs

Read an interesting book recently, ‘Outliers – the story of success’ by Malcolm Gladwell.

image

There’s a good synopsis of the book here on wikipedia.

I don’t want to write in detailed review of the book, but it’s well worth a read. There were a couple of sections which I thought were possibly relevant to IT professionals and DBAs in particular.

Firstly, ‘the 10,000 hour rule’, in this section Gladwell asserts that to be a real ‘elite performer’ takes 10,000 hours of practice.

‘Practice isn’t the thing you do once you’re good, it’s the thing you do that makes you good’.  He gives many interesting examples – the Beatles, Bill Gates etc – but I was wondering could this be applied to DBAs ?

If it takes 10,000 hours to be a really elite DBA – how long does that really take ? 8 hours a day makes 1250 days. If we assume that most DBAs work around 230 days a year – then it takes around 5 and a half years to become an elite DBA.  

But how much time per day does a DBA spend actually doing DBA work ? Certainly it’s my experience that the more experienced I get as a DBA, the less time I seem to spend actually doing DBA work – ie meetings, change-control meetings, project planning, liasing with other teams, appraisals etc.  Is it more accurate to assume that a DBA spends half their time actually doing ‘real’ DBA work – or is that just my bad luck ?  

So, in reality, I’d argue it can take at least 5 1/2 and more likely closer to 10 years to become an elite DBA. Why do I keep receiving CVs for senior DBAs with 2-4 years actual DBA experience ?

In the second section I found particularly interesting, Gladwell writes about analysis of plane crashes and the importance of in-cockpit communications. He describes a couple of crashes involving Korean Airlines – where co-pilots were often deferrential to pilots, and unwilling to openly criticise their more senior colleagues or point out errors when things were going badly wrong…

image

There’s a better summary of Gladwell’s concepts on mitigation  here – but to apply this to a DBA role… If you are a DBA and you do not agree with  a decision of one of your superiors, then it’s your duty as a DBA to say what you think is wrong, before it’s too late… 

Obviously there’s a fine line between constructive criticism and moaning, but a good senior DBA or manager should be able to take well-researched criticism\debate from a more junior DBA.   Is this really possible ?

Posted by steveh99999 | with no comments