Scheduled Jobs on strike ??

I had an interesting situation where SQL Agent jobs stopped running without reporting any failure.
I don't want to attempt to describe our infrastructure, but suffice it to say we had a few issues.
None of my SQL Servers went down, but there were problems, and when the system picked up I had some interesting situations on a number of SQL 2008 and SQL 2005 servers.
Essentially, jobs which ran at short time intervals were just not running, whilst jobs with intervals of over an hour were all running fine.
As you might guess, t-log backups tend to run at short intervals, but I also have a number of monitoring jobs which collect data every 5 minutes; these store data locally on the servers involved.
If we consider a t-log backup job as an example:-

  • This had stopped some hours previously, but without an error.
  • Other jobs were still running however.
  • The Server had not failed over.
  • The Server was still running.
  • Users were happily back in the application and accessing the server.
  • I have a query (which I need to examine further) that lists any scheduled job which has not run when it should; this listed nothing!
  • The jobs were not disabled.
  • The schedules were not disabled.
  • I ran the job manually and it completed fine, but the schedule still did not pick up.
  • In the end I took the agent service offline and then brought it back online.
  • In every case, cluster or standalone, this resolved the problem.
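
For illustration only (this is not the query mentioned above): a minimal Python sketch of a "has this job run when it should?" check. It compares each job's last recorded run against its schedule interval instead of trusting whatever the scheduler thinks the next run time is. The `JobStatus` type, the `overdue_jobs` function, the job names and the `grace` factor are all hypothetical; in practice the last-run times would presumably come from the msdb job history (for example `msdb.dbo.sysjobhistory`).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobStatus:
    """Hypothetical snapshot of one scheduled job."""
    name: str
    interval: timedelta   # how often the schedule is meant to fire
    last_run: datetime    # last recorded start time
    enabled: bool = True


def overdue_jobs(jobs, now, grace=2.0):
    """Return the names of enabled jobs whose last run is older than
    grace * interval, i.e. jobs that have missed at least one run."""
    return [
        job.name
        for job in jobs
        if job.enabled and (now - job.last_run) > job.interval * grace
    ]


if __name__ == "__main__":
    # Made-up example data, loosely shaped like the situation described.
    now = datetime(2011, 5, 17, 12, 0)
    jobs = [
        JobStatus("tlog_backup", timedelta(minutes=15), now - timedelta(hours=3)),
        JobStatus("nightly_full", timedelta(days=1), now - timedelta(hours=10)),
        JobStatus("monitor_5min", timedelta(minutes=5), now - timedelta(minutes=4)),
    ]
    print(overdue_jobs(jobs, now))
```

The point of the design is that the check depends only on the wall clock and the job history, so a job whose schedule has silently stalled should still show up even while the Agent service reports no errors. If the server clock itself is suspect, the comparison is only as good as the value passed in as `now`.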

I'm at a bit of a loss to fully understand why it happened this way. As I say, it seemed that the jobs scheduled at intervals of minutes all stopped.
Most of the nightly jobs ran, but not all of them, and with no particular pattern that I can see.
It is possible that the time server put out some suspect times; I believe a few other servers showed some strange times for a while.

The only observation I have is that it is wise to check everything when there is a problem and not just assume because the Agent Service is running without errors that all is well!

Published 17 May 2011 16:59 by GrumpyOldDBA

Comments

# re: Scheduled Jobs on strike ??

25 May 2011 20:21 by morticia

Hi Mr Grumpy,

Is this occurring on VMware or similar virtual servers by any chance?

I have experienced exactly those symptoms, and the root cause was a VMware 'feature' which caused the system time to drift in and out during scheduled VMware snapshots.

This caused the SQL Agent schedule to 'never arrive'.

From the Grumpy Old Problem Manager

# re: Scheduled Jobs on strike ??

27 May 2011 14:18 by GrumpyOldDBA

Sadly not all, otherwise it would have been an easy strike! It's quite possible that the time server, or times in general, were something akin to Back to the Future, so I'd figure that's a pretty good call, thanks.