May 2009 - Posts

Monitoring SQL Agent Jobs

In my last post I showed how to reliably get your SQL Agent jobs to only bother you if they fail. But how can you be confident that they are in fact running? Well, what I did is I set up a report in Reporting Services that reads the job history tables on a list of servers and shows me what jobs have run lately and whether they succeeded. I have the report sent to me every morning. Successful jobs are green and failed jobs are red. I get a clear consolidated picture of what's been going on.

Here's the query the report uses:

if exists (    select     1

        from     dbo.sysobjects

        where     id = object_id(N'dbo.rpt_ITSQLAgentJobs')

            and objectproperty(id, N'IsProcedure') = 1    )

    drop procedure dbo.rpt_ITSQLAgentJobs

go

 

set quoted_identifier on

go

set ansi_nulls on

go

 

 

create proc dbo.rpt_ITSQLAgentJobs

(

    @From    datetime

    , @To    datetime

)

as

 

/*

----------------------------------------------------------------------

Version:    1.0

Date:        13/05/2009

----------------------------------------------------------------------

*/

 

set nocount on

 

declare @FromInt int

    , @ToInt int

 

-- In sysjobhistory Run Date is an integer in the form YYYYMMDD.

set @FromInt = cast((convert(varchar(8),@From,112)) as int)

set @ToInt = cast((convert(varchar(8),@To,112)) as int)

 

-- I tried a union but the servers aren't all on the same collation so that doesn't work.

-- The temp table approach gets round that.

 

 

----------------------------------------------------------------------

-- server1

----------------------------------------------------------------------

 

-- Create the temp table from the first server's results.

select    sjh.run_date

    , sjh.run_time

    , sjh.server

    , sj.name as job_name

    , sjh.step_name

    , sjh.message

    , sjh.run_status

into    #t

from    server1.msdb.dbo.sysjobhistory sjh with (nolock)

    inner join server1.msdb.dbo.sysjobs_view sj with (nolock) on sj.job_id = sjh.job_id

where    sjh.step_id = 0 --(Job outcome)

    and sjh.run_date >= @FromInt

    and sjh.run_date <= @ToInt

 

 

----------------------------------------------------------------------

-- server2

----------------------------------------------------------------------

 

begin try

    insert    #t

    select    sjh.run_date

        , sjh.run_time

        , sjh.server collate Latin1_General_CI_AS

        , sj.name as job_name

        , sjh.step_name

        , sjh.message

        , sjh.run_status

    from    server2.dbo.sysjobhistory sjh with (nolock)

        inner join server2.msdb.dbo.sysjobs_view sj with (nolock) on sj.job_id = sjh.job_id

    where    sjh.step_id = 0 --(Job outcome)

        and sjh.run_date >= @FromInt

        and sjh.run_date <= @ToInt

end try

begin catch

    insert    #t

        (run_date

        , run_time

        , [server]

        , job_name

        , step_name

        , [message]

        , run_status)

    select    99999999 --sjh.run_date

        , 0 --sjh.run_time

        , 'server2' --sjh.server collate Latin1_General_CI_AS

        , '' --sj.name as job_name

        , '' --sjh.step_name

        , error_message() --sjh.message

        , 0 -- sjh.run_status

end catch

 

-- Repeat the server2 bit for every other server you want to monitor

 

 

--Return results

select    *

    , case run_status

        when 0 then 'Failed'

        when 1 then 'Succeeded'

        when 2 then 'Retry'

        when 3 then 'Canceled'

        when 4 then 'In progress'

        else 'Unknown (' + cast(run_status as varchar(3)) + ')'

      end as status_desc

from    #t

order by run_date desc

    , run_time desc

    , [server]

 

go

Posted by DavidWimbush | with no comments

Better Notification from SQL Agent Jobs

I manage about 10 SQL servers and I rely heavily on SQL Agent jobs to do this. My aim is to get everything going smoothly with minimal intervention from me, but I want to know straight away when something goes wrong. SQL Agent has an excellent scheduler and what you can do in a job step is flexible and powerful but it's not great at coping if the job goes wrong. Here's my simple solution.

Let's look at an example: a job to backup a number of databases, one in each step. I want it to backup as many as it can, so if there's an error I want it to keep going and then tell me at the end that the job wasn't successful.

You can't pass data directly between job steps. MSDN says you can do it via a permanent table or global temporary table - the permamnent table worked for me but not the global temporary table - but this doesn't feel right either. Depending on how you configure what a step does if it fails, you can control things to a degree:

  • If each step quits the job I don't get all my backups done.
  • If each step goes to the next step the job won't notify me there was a problem because it got successfully to the last step and that succeeded too. (I raised a request on Connect about this. If the job history viewer can show an asterisk when one of the steps failed, the notification system should be able to make this distinction too.)
  • I suppose I could use the 'Go to step X' to create a structure with an optional 'Step X-1 failed' step after each step which is skipped when there's no error. But that's just ugly and fragile.

None of these work for me. So what I do is have each step just go to the next step and the last step is a stored proc that checks the job history for this run and raises an error if anything failed. That last step quits the job reporting failure or success. Then I set the job to notify me if it fails. If any backup fails the job just moves on to the next one and, at the end, the stored proc sees the error and raises a new one, which triggers the notification email to me.

Here's the stored proc:

if exists (        select         1 
                from         dbo.sysobjects 
                where         id = object_id(N'dbo.dba_CheckJobSuccess') 
                        and objectproperty(id, N'IsProcedure') = 1    )
        drop proc dbo.dba_CheckJobSuccess
go
 
set ansi_nulls on
go
set quoted_identifier on
go
 
 
create proc [dbo].[dba_CheckJobSuccess]
(
        @JobID                uniqueidentifier 
)
as
 
/*
------------------------------------------------------------------
Version : 1.0
Date    : 07/05/2009
------------------------------------------------------------------
*/
 
set nocount on
 
 
--Check for job success
 
declare @PrevInstance        int
        , @ErrCount        int
 
--Find the end of the history for the previous run
select        top 1 @PrevInstance = sjh.instance_id
from        msdb.dbo.sysjobhistory sjh with (nolock)
where        sjh.job_id = @JobID
        and sjh.step_id = 0 --0=(Job outcome)
order by sjh.instance_id desc
 
select        @PrevInstance = isnull(@PrevInstance,0)
 
--Look for failed steps since then
select        @ErrCount = count(*)
from        msdb.dbo.sysjobhistory sjh with (nolock)
where        sjh.job_id = @JobID
        and sjh.instance_id > @PrevInstance
        and sjh.step_id <> 0 --0=(Job outcome)
        and sjh.run_status = 0
 
select @ErrCount = isnull(@ErrCount,0)
 
--Raise an error if any step(s) failed
if @ErrCount > 0
begin
        raiserror('One or more job steps failed.', 16, 1)
end
 
go

And here's how the last step calls it:

--Check for job success
exec master.dbo.dba_CheckJobSuccess @JobID = $(ESCAPE_SQUOTE(JOBID))

The value for @JobID uses token replacement to get the ID of the job so I can paste this into any job and it will just work.

Posted by DavidWimbush | with no comments