Not thinking it through

I’m looking into performance problems on a box which is in a managed data centre, so there is the question of ownership of problems etc. One current issue has been an index rebuild job which has been regularly failing. The job is a simple cursor loop that applies a dbcc dbreindex to each table in turn, the database is in full recovery mode and tran logs get backed up hourly, it’s effectively a 7 x 24 system so options of changing database mode are not applicable. The server is also, in my view, short of disk space, we can only store one backup of the main database which is about 20gb in size, and the space available for transaction logs is also limited.

There were a number of server configuration changes to the out of the box settings, one which still exists is a setting to the Query Governor. Upon investigation it was discovered that the query governor setting was responsible for the failure of the dbcc dbreindex job.

Once this was resolved the reindex of the tables ran the transaction log drive out of space! I’m not going to relate how that was resolved, let’s just say the production database now has a different dbid.

The solution to this issue was to add a second loop inside the dbreindex loop to perform a transaction log backup every 10 or 20 tables, sounds fair, however, the log backup command is with an init and the time stamp for the file name is of the usual format  dbname_log_yyymmddhhmm.bak, problem is if the dbreindex  hits a batch of small tables the loop executes in less than a minute, this means the next log file overwrites the previous file and we have lost the continuity of the transaction logs and we can’t recover the database.

This was “tested” a number of times on the production server, including generating the logs without actually applying the reindex, 20 transaction log backups all with the same name in under a minute. As there were no before and after backups of the database, not enough disk space, the routine destroyed the recovery point until the next full backup – which didn’t occur until 36 hours later, putting us at risk of loss of a day’s business.

The Query Governor has been increased enough to presumably allow the job to run, although as we haven't had a complete run of the job data changes since may of course cause the job to fail next time it runs.

So the moral of this story is to beware changing server settings, you might not realise what is affected and think through clearly the logic of coding changes.

Published 30 November 2006 10:11 by GrumpyOldDBA
Filed under:

Comments

# re: Not thinking it through

30 November 2006 11:20 by DamianMulvena

One suggestion for the "10-20 tables" problem might be to estimate the index sizes.  You could count either table rows or index pages, and use a bigger batch size for the smaller tables.

# re: Not thinking it through

30 November 2006 18:40 by GrumpyOldDBA

I have a couple of routines, again nothing very original but I'll perhaps post them. Apparently there used to be a seperate job for each letter of the alphabet with a suitable timing.. ah well! In reality re-indexing of almost always available databases is quite tricky .. I tried index defrag but that had other problems.

# Not Thinking it through - the saga continues

04 January 2007 16:49 by Grumpy Old DBA

The saga continues ……….. Here below are the previous posts which relate to the management of a production