Better Notification from SQL Agent Jobs

I manage about 10 SQL servers and I rely heavily on SQL Agent jobs to do this. My aim is to get everything going smoothly with minimal intervention from me, but I want to know straight away when something goes wrong. SQL Agent has an excellent scheduler and what you can do in a job step is flexible and powerful but it's not great at coping if the job goes wrong. Here's my simple solution.

Let's look at an example: a job to backup a number of databases, one in each step. I want it to backup as many as it can, so if there's an error I want it to keep going and then tell me at the end that the job wasn't successful.

You can't pass data directly between job steps. MSDN says you can do it via a permanent table or global temporary table - the permamnent table worked for me but not the global temporary table - but this doesn't feel right either. Depending on how you configure what a step does if it fails, you can control things to a degree:

  • If each step quits the job I don't get all my backups done.
  • If each step goes to the next step the job won't notify me there was a problem because it got successfully to the last step and that succeeded too. (I raised a request on Connect about this. If the job history viewer can show an asterisk when one of the steps failed, the notification system should be able to make this distinction too.)
  • I suppose I could use the 'Go to step X' to create a structure with an optional 'Step X-1 failed' step after each step which is skipped when there's no error. But that's just ugly and fragile.

None of these work for me. So what I do is have each step just go to the next step and the last step is a stored proc that checks the job history for this run and raises an error if anything failed. That last step quits the job reporting failure or success. Then I set the job to notify me if it fails. If any backup fails the job just moves on to the next one and, at the end, the stored proc sees the error and raises a new one, which triggers the notification email to me.

Here's the stored proc:

if exists (        select         1 
                from         dbo.sysobjects 
                where         id = object_id(N'dbo.dba_CheckJobSuccess') 
                        and objectproperty(id, N'IsProcedure') = 1    )
        drop proc dbo.dba_CheckJobSuccess
go
 
set ansi_nulls on
go
set quoted_identifier on
go
 
 
create proc [dbo].[dba_CheckJobSuccess]
(
        @JobID                uniqueidentifier 
)
as
 
/*
------------------------------------------------------------------
Version : 1.0
Date    : 07/05/2009
------------------------------------------------------------------
*/
 
set nocount on
 
 
--Check for job success
 
declare @PrevInstance        int
        , @ErrCount        int
 
--Find the end of the history for the previous run
select        top 1 @PrevInstance = sjh.instance_id
from        msdb.dbo.sysjobhistory sjh with (nolock)
where        sjh.job_id = @JobID
        and sjh.step_id = 0 --0=(Job outcome)
order by sjh.instance_id desc
 
select        @PrevInstance = isnull(@PrevInstance,0)
 
--Look for failed steps since then
select        @ErrCount = count(*)
from        msdb.dbo.sysjobhistory sjh with (nolock)
where        sjh.job_id = @JobID
        and sjh.instance_id > @PrevInstance
        and sjh.step_id <> 0 --0=(Job outcome)
        and sjh.run_status = 0
 
select @ErrCount = isnull(@ErrCount,0)
 
--Raise an error if any step(s) failed
if @ErrCount > 0
begin
        raiserror('One or more job steps failed.', 16, 1)
end
 
go

And here's how the last step calls it:

--Check for job success
exec master.dbo.dba_CheckJobSuccess @JobID = $(ESCAPE_SQUOTE(JOBID))

The value for @JobID uses token replacement to get the ID of the job so I can paste this into any job and it will just work.

Comments

No Comments