06 April 2007 12:09
tonyrogerson
EVENT NOTIFICATIONS: Monitoring blocked processes (and other events) - end to end how to set it up and make it work
Prior to SQL Server 2005 we had to do a lot of monitoring to see what processes are blocking and what they are executing, this was always a pain. In SQL Server 2005 they introduced a technology called ‘Event Notifications’, numerous events fire within the SQL Server engine – these are all traceable or can be captured and plonked on a Service Broker queue. Don’t be put off by this new technology, once you work out what to do and get the plumbing sorted its amazingly easy.
The BLOCKED_PROCESS_REPORT event can be enabled (sp_configure), this event fires when a process is blocked for more than a configurable amount of time. Once fired the event data contains the input buffers of the waiter and the blocker – note, the chain can contain lots of spid’s, the event only ever contains just two connections, for example if the head of the blocker chain is spid 90 and you have 3 blocked connections, spid 95 is waiting on spid 93 and spid 93 is waiting on spid 90 then you will get 2 reports – one containing spids 95 and 93 and one containing spids 93 and 90, like all everything else if we do the work we can get the complete chain.
You should also note, that the <inputbuf></> contains the current input buffer and not the original statement that took the locks and is creating the block.
Let’s get it going.
Step 1: Configure the server to get this feature working.
The sp_configure option ‘Show Advanced Options’ needs to be enabled, if it is not already then run this.
sp_configure 'show advanced options', 1
go
reconfigure
go
sp_configure
go
Now we need to set our blocked process report capture threshold, this is the number of seconds the block is in effect after which you get an event firing; for example, setting this to 10 tells SQL Server to fire an event whenever a block is 10 seconds old or 20 seconds or 30 seconds etc... if the block lasted 60 seconds then you’d get 6 events.
-- Now, set the blocked process threshold to 10 seconds
sp_configure 'blocked process threshold', 10
go
reconfigure
go
That’s the server ready, now we can create a database to hold our service broker queue, this can be any user database, I prefer to create my own management database so that it is separate from the application databases.
CREATE DATABASE DBAEventManagement
go
USE DBAEventManagement
go
We now need to enable Service Broker in this database
ALTER DATABASE DBAEventManagement SET ENABLE_BROKER
go
Ok, what is Service Broker? In a nutshell it’s a messaging technology, you have a Service Broker Queue which holds your messages and you have a Service Broker Service which manages the transmission to the queue. The queue is simply rows in a database table with a XML column message_body holding the data you’ve sent. The technology offers significantly more than that but that is way outside this blog entry.
We start by creating our Service Broker queue.
-- Create a service broker queue to hold the events
CREATE QUEUE syseventqueue
go
We now need a service that can accept incoming messages and store them on our queue. Service Broker has the concept of contracts; a contract is an agreement between the two services communicating with each other it states that only certain message types will be used to communicate, a message type is the format your data is in, for example your data is XML and a specific schema.
-- Create a service broker service receive the events
CREATE SERVICE syseventservice
ON QUEUE syseventqueue ( [http://schemas.microsoft.com/SQL/Notifications/PostEventNotification] )
go
As you can see we have used a specific contract that states that you should use PostEventNotification message types against this service.
We now have our working Service Broker queue, we can add our notification.
CREATE EVENT NOTIFICATION notify_locks
ON SERVER
WITH fan_in
FOR blocked_process_report
TO SERVICE 'syseventservice', 'current database';
go
Note the space in ‘current database’, I just spent 3 hours wondering why nothing was happening, no errors – nothing and it’s because I’d wrote current_database – a very easy typo to make when you are so used to object names!
The statement creates a notification called notify_locks (should be blocks but heyho), it is ‘server’ scoped that means it captures any BLOCKED_PROCESS_REPORT event firing in this instance of SQL Server, we can specify a specific DATABASE instead if you want to narrow it down and have different processing per database, perhaps where you have a production and qa/dev environment on the same SQL instance.
We can now do a test, first let’s set up a test table...
CREATE TABLE test ( mycol int )
INSERT test ( mycol ) VALUES( 1 )
go
Now, in connection A run this...
-- Connection A
BEGIN TRAN
UPDATE test SET mycol = 5
Now, in connection B run this...
-- Connection B
SELECT * FROM test
The block needs to be in place for at least 10 seconds or whatever you’ve set your threshold at.
We can check the broker queue by ‘browsing’ it thus:
SELECT cast( message_body as xml ), *
FROM syseventqueue
Click on one of the entries and you’ll see XML like this...
<EVENT_INSTANCE>
<EventType>BLOCKED_PROCESS_REPORT</EventType>
<PostTime>2007-04-06T10:47:19.417</PostTime>
<SPID>4</SPID>
<TextData>
<blocked-process-report monitorLoop="12813">
<blocked-process>
<process id="processf88988" taskpriority="0" logused="0" waitresource="RID: 28:1:73:0" waittime="13734" ownerId="1740293" transactionname="SELECT" lasttranstarted="2007-04-06T10:47:05.680" XDES="0xffffffff802ac9a8" lockMode="S" schedulerid="2" kpid="3508" status="suspended" spid="60" sbid="0" ecid="0" priority="0" transcount="0" lastbatchstarted="2007-04-06T10:47:05.680" lastbatchcompleted="2007-04-06T10:47:05.603" clientapp="Microsoft SQL Server Management Studio - Query" hostname="TONYWS" hostpid="2668" loginname="TORVER\tonyrogerson" isolationlevel="read committed (2)" xactid="1740293" currentdb="28" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
<executionStack>
<frame line="2" stmtstart="36" sqlhandle="0x020000001ed3611e4589a687c58341acf7c4be9daeda7044" />
</executionStack>
<inputbuf>
-- Connection b
select * from DBAEventManagement..test
</inputbuf>
</process>
</blocked-process>
<blocking-process>
<process status="sleeping" spid="59" sbid="0" ecid="0" priority="0" transcount="1" lastbatchstarted="2007-04-06T10:47:16.837" lastbatchcompleted="2007-04-06T10:47:16.837" lastattention="2007-04-06T10:42:35.973" clientapp="Microsoft SQL Server Management Studio - Query" hostname="TONYWS" hostpid="2668" loginname="TORVER\tonyrogerson" isolationlevel="read committed (2)" xactid="1739624" currentdb="28" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
<executionStack />
<inputbuf>
SELECT *
FROM syseventqueue
</inputbuf>
</process>
</blocking-process>
</blocked-process-report>
</TextData>
<DatabaseID>28</DatabaseID>
<TransactionID>1740293</TransactionID>
<Duration>13734000</Duration>
<EndTime>2007-04-06T10:47:19.413</EndTime>
<ObjectID>0</ObjectID>
<IndexID>0</IndexID>
<ServerName>TORVERM1</ServerName>
<Mode>3</Mode>
<LoginSid>AQ==</LoginSid>
<EventSequence>145198</EventSequence>
<IsSystem>1</IsSystem>
<SessionLoginName />
</EVENT_INSTANCE>
Look at the second <inputbuf> it’s not our UPDATE! That is because it takes last input buffer for the connection that is blocking and because I am doing the monitoring and UPDATE in the same connection I get the SELECT * FROM syseventqueue statement.
The <process> node is probably the most important here, it shows you what program and host is causing the problem. Because you have the xactid you can track that down through SQL Profiler if you’d have been tracing SQLTransaction as well (another blog post on that I guess – the list get’s longer...).
Anyway, back on track – we have just browsed the queue, we need to actually get our messages off the queue otherwise the queue is going to grow and grow, enter the statement RECEIVE.
DECLARE @msgs TABLE ( message_body xml not null,
message_sequence_number int not null );
RECEIVE message_body, message_sequence_number
FROM syseventqueue
INTO @msgs;
SELECT message_body,
DatabaseId = cast( message_body as xml ).value( '(/EVENT_INSTANCE/DatabaseID)[1]', 'int' ),
Process = cast( message_body as xml ).query( '/EVENT_INSTANCE/TextData/blocked-process-report/blocked-process/process' )
FROM @msgs
ORDER BY message_sequence_number
The above example creates a table variable to hold the data pulled from the queue, the RECEIVE literally receives all the messages off the queue – you don’t have to do it that way, there are lots of options (see BOL). The SELECT is simply querying the XML, in the first instance we are getting the integer value of the DatabaseId element, to do that we use the XML .Value method; to get the Process column we simply use the XML .Query method and get the XML for the process information.
From this you now have access to the data from the event and you can do anything, I tend to use Database Mail (sp_send_dbmail) to email me when there is a block, but I do put logic in there to buffer the emails sent – I don’t want an email every 10 seconds for the same block!
One last thing to note and it’s really important; when the event notification fires, if there is an error placing your event on the service for instance if you’d set a remote service up in say a centrally managed environment then the notification literally gets dropped (like DROP EVENT NOTIFICATION being issued), you do get an information message in the SQL Server errorlog though). So, make sure you build something in that checks that the notifications are still there, for server scoped events...
SELECT * FROM sys.server_event_notifications
Well, that’s it folks – enjoy; I hope you find this useful – it’s in the product and greatly under used; there is not just the BLOCKED_PROCESS_REPORT event to capture there are a ton of others.
Filed under: SQL Server