Creating a baseline for P & O part 3

The most important part of performance tuning is to be able to establish a baseline. Without a baseline and on-going statistics and trending information you just cannot make informed judgements concerning performance degredation or improvement or on future growth.

The three parts presented here allow for a pretty good collection of data for trending etc. I know the subject isn't particularly exciting and probably doesn't appeal to many!! I have no reason to believe that I won't carry this on to SQL Server 2005, true there's lots of management views, but you need hard hsitorical data for analysis if your server falls over.

I hope I've explained it in such a manner it can be understood, catch me at a user group meeting if you need any further clarification.

I have three standard sets of data I collect as a minimum, here is the third.

This is by far the largest post and contains the most amount of code. Again I’m not claiming the concepts behind this post are unique, far from it, but I’ve never seen anyone put all of it together as a complete solution.

So what am I talking about now? I collect perfmon stats and other counters in a sql server table which I also rollup. The very wide report shown below is an indication of what I collect.

For those not in the know master..sysperfinfo stores all the sql server perfmon counter data which is constantly updated, it’s not all in the easiest format to understand but nevertheless it’s all there. All I do is extract the information every hour, calculate average values for the hour and store them in a table. Every day I rollup the hourly data into daily averages which I store in separate tables.

Keeping a history of all this data is important for trending and prediction.

 

Disclaimer :-  The set of perfmon counters I chose were a personal choice, they are not “the best” , “the optimal” or anything like that. As I mentioned in part 2 of this series I have usually had the benefit of SQL Diagnostic Manager in the background.

 

In addition to the perfmon stats I also collect data from sp_monitor system procedure. It’s worth noting there is a bug in this procedure that means a value goes out of range .. I forget exactly which counter it is but if you get such an error all that is required is to recreate the procedure changing all integer variable to bigints.  Remember sp_monitor is a system procedure so you must use sp_ms_marksystemobject to return it to a system procedure.

 

I've had to split the report to allow me to snapshot it:-

 

 

 

 

 

 

 

 

The code presented isn’t  “generic” , for performance reasons, so you do need to make some modifications to the code for it to work, please note all the comments regarding these changes. Once again I use a DBA database, this time ServerAdmin, and I use scheduled jobs to store the data.

The raw data is stored every hour, I have two different summary tables, one which holds the average values for the entire day, the other which holds the average values for the key hours , say 08:00 to 17:00.

I have a job which creates the report shown above every night, I’m not publishing how I produce the reports, although I’ll try to remember to include the query,  as that would require yet another long blog. I hope to show my “reporting solution” one evening at a user group meeting.

( As a footnote I’ve been producing reports from SQL Server in this manner since 6.5 and this particular report for the past 3 or 4 years )

 

This is my documentation for this process, I’m hoping the formatting will convert correctly from word, I apologise for not sitting down and writing this in proper html but it’s a large document!

 

1.  Overview

  • The SQL Server Performance Monitor counters are stored internally within SQL Server in the table Sysperfinfo which is located within the master database. The purpose of this table is briefly documented within Books On Line.
    • The SQL Server Performance Counters are variously documented by Microsoft, a good source of reference is   “SQL Server 2000 Performance Tuning Technical  Reference”  – Microsoft Press -  ISBN 0-7356-1270-6
  • The available sql counters are shown below, a number of counters are generated for each database

 

 

Auto-Param Attempts/sec                      AWE lookup maps/sec                           AWE stolen maps/sec

AWE unmap calls/sec                             AWE unmap pages/sec                           AWE write maps/sec

Backup/Restore Throughput/sec            Batch Requests/sec                                 Bulk Copy Rows/sec

Bulk Copy Throughput/sec                    Cache Use Counts/sec                            Checkpoint pages/sec

DBCC Logical Scan Bytes/sec               Extent Deallocations/sec                        Extents Allocated/sec

Failed Auto-Params/sec                          Forwarded Records/sec                          Free list empty/sec

Free list requests/sec                               Free list stalls/sec                                    FreeSpace Page Fetches/sec

FreeSpace Scans/sec                               Full Scans/sec                                         Index Searches/sec

Latch Waits/sec                                       Lazy writes/sec                                       Lock Requests/sec

Lock Timeouts/sec                                  Lock Waits/sec                                       Log Bytes Flushed/sec

Log Cache Reads/sec                              Log Flush Waits/sec                               Log Flushes/sec

Logins/sec                                               Logouts/sec                                             Mixed page allocations/sec

Number of Deadlocks/sec                       Page Deallocations/sec                           Page lookups/sec

Page reads/sec                                         Page Splits/sec                                        Page writes/sec

Pages Allocated/sec                                 Probe Scans/sec                                       Range Scans/sec

Readahead pages/sec                               Safe Auto-Params/sec                            Scan Point Revalidations/sec

Shrink Data Movement Bytes/sec          Skipped Ghosted Records/sec                 SQL Compilations/sec

SQL Re-Compilations/sec                      Table Lock Escalations/sec                     Transactions/sec

Unsafe Auto-Params/sec                         Workfiles Created/sec                            Worktables Created/sec

 

  • A set of System Statistical Functions are also available which record data.

 

Function

Description ( from Books On Line )

@@CONNECTIONS

Returns the number of connections, or attempted connections, since SQL Server was last started. 

@@CPU_BUSY

Returns the time in milliseconds (based on the resolution of the system timer) that the CPU has spent working since SQL Server was last started.

@@IO_BUSY

Returns the time in milliseconds (based on the resolution of the system timer) that  SQL Server has spent performing input and output operations since it was last started.

@@PACK_RECEIVED

 

@@PACK_SENT

Returns the number of input packets read from the network by  SQL Server since last started.

Returns the number of output packets written to the network by  SQL Server since last started.

@@PACKET_ERRORS

Returns the number of network packet errors that have occurred on SQL Server connections since SQL Server was last started.

@@TOTAL_READ

 

@@TOTAL_WRITE

Returns the number of disk reads (not cache reads) by SQL Server since last started.

Returns the number of disk writes by SQL Server since last started.

@@TOTAL_ERRORS

Returns the number of disk read/write errors encountered by SQL Server since last started.

 

 

  • Technically the descriptions given in BOL are correct although in practice the descriptions are incomplete.
  • There is a system stored procedure  sp_monitor which works in conjunction with a system table spt_monitor , both can be found within the master database, the values returned by the statistical functions correctly return the values since the last execution of the stored procedure OR the start of the SQL Server service which ever was the most recent.
  • Knowing this fact we can extract the statistics for our own use by running sp_monitor at regular intervals.
  • Sp_monitor is a documented system procedure, examination of the procedure shows that it uses the table spt_monitor  ( which is not documented )
  • A typical result set from this table is shown below

lastrun

cpu_busy

io_busy

idle

pack_received

pack_sent

connections

pack_errors

total_read

total_write

total_errors

2006-01-25 09:01:03.703

22852929

735107

137159340

113903076

121546762

931224

532

9254818

64512368

0

 

 

2.     Overview of collection Process

 

·         To facilitate the collection process the functionality is handled by SQL Server in a user database.

·         Performance data is extracted with stored procedures

·         Performance data is stored in tables

·         Collection is achieved using the Agent Job Scheduler

 

A snapshot of the statistical data is taken at regular intervals ( every hour ), the average value(s) for that period is then calculated and written to a table.

The disadvantage of this approach is that it does not provide any granularity in as much as any extremes will be masked.

The advantage is that the number of data sets is manageable and the collection has a minimal impact upon the server, if at all.

Once a day the daily data is rolled up into an average for the day, this allows for simple day to day comparison.

Thus the data collected is adequate for viewing patterns and trends but less adequate for detailed analysis.

 

 

 

Overview of hourly statistic gathering job.

( The database name ServerDBA is used as an indicator to show object placement  )

 

 

3. Counters collected

 

Counter / Column

Value

Description

TheDate

Datetime – timestamp of table row entry

Date time. The values for the counters represent the average for the hour ending shown in this column

CacheHitRatio

%age value (average) for the time period

This is the total ratio for what is normally called the procedure cache.

Actually the cache holds plans for more than just procedures. ( see table 6 ) ( vii )

cpu_busy

Milliseconds total for the time period

( i )

This is a measure of cpu use and indicates the ms of use in the hour period

io_busy

Milliseconds total for the time period

( i )

This is a measure of i/o use and indicates the ms of use in the hour period

Logins

Number of connected users (average) for the time period

The number of persisted connection to the server. Each connection typically takes resource of around 48k memory. Connection memory cannot come from extended memory and thus unnecessary persisted connections can decrease the available memory for other processes.

Connections

Login attempts per second (average) for the time period

The rate of connections and connection attempts. Where connections to sql server are using pooling if this figure is high then pooling may not be configured correctly

PacketsReceived

Packets received per second (average) for the time period

Measure of network activity

PacketsSent

Packets sent per second (average) for the time period

Measure of network activity

PacketErrors

Total of packet errors for the time period

Should be nil

Reads

Reads per second (average) for the time period

Count of page reads executed by SQL Server. Indication of server activity.

Writes

Writes per second (average) for the time period

Count of page writes executed by SQL Server. Indication of server activity.

RWErrors

Total Number of read or write errors for the time period

Should be nil

TotalTransactions

Transactions per second (average) for the entire server for the time period

Measure of server activity

UserDB1_Transactions

Transactions per second (average) for the database for the time period

Measure of database activity

UserDB2_Transactions

Transactions per second (average) for the database for the time period

Measure of database activity

UserDB2_Transactions

Transactions per second (average) for the database for the time period

Measure of database activity

UserDB2_Transactions

Transactions per second (average) for the database for the time period

Measure of database activity

TempdbTransactions

Transactions per second (average) for the database for the time period

Measure of tempdb activity. Tempdb is the working area for SQL Server. High levels of transactions in tempdb relative to the overall server figure may be of interest ( ii )

FullScans

Number of unrestricted full scans per second  (average) for the time period

Number of unrestricted full scans per second. These can be either base-table or full-index scans. ( iii )

This figure should be low

IndexSearches

Number of index lookups per second (average) for the time period

This counter shows the number of index searches SQL Server is performing. SQL Server uses index searches to start range scans, to retrieve a specific record by using one index, and to reposition within an index.

PageSplits

Number of page splits per second (average)  when an index page overflows for the time period

This figure should be low.

( iv )

LockRequests

The total number of lock requests per second (average) for the time period

Locks have to exist within lower memory, excessive lock requests may result in memory issues, likewise other factors may reduce the available memory for locks

If locks become an area of concern greater granularity can be obtained by measuring lock types ( v )

LockTimeouts

The number of times per second (average)  that a lock could not be obtained when requested for the time period

Excessive lock timeouts may indicate application problems, memory pressures and/or excessive number of requests ( v )

Deadlocks

The total number of deadlocks that occurred during the monitored time period

This counter should be zero

LatchWaits

The number of waits per second (average) on latch requests.

Latches are lightweight “locks” used to aid performance. This figure should be low. A high figure would indicate resource issues, probably memory.

LockWaits

The number of times per second (average) that a thread had to wait for a lock during the monitored time period

This value should be low ( v ) if this counter is high it may indicate performance issues.

Compilations

The number of times per second (average) that sql compilations occur for the monitored  time period

Used as a measure of work

WorkFiles

The number of workfiles, such as created internally during query execution, per second (average) for the monitored  time period.

Used as a measure of work

( vi )

WorkTables

The number of worktables, such as created internally during sorts, unions, groupings, per second (average) for the monitored  time period.

Used as a measure of work

( vi )

HourCounter

Numeric sequence to provide a unique row identifier.

 

  

Notes:

 

(i)  Despite the documentation within BOL, KB813199 tells us that the actual number returned is measured in ticks not milliseconds ( or seconds ) to get the actual figure in milliseconds multiply the result by 32

( ii )      Tempdb is often overlooked when considering performance issues.

·          Analysis of the work within tempdb may indicate excessive creation of temporary tables, large numbers of sort and group by operations.

·          Activity within tempdb should be viewed along with workfiles and worktables.

·          A heavily used tempdb may slow the entire server.

 

( iii )     Full scans also include activity from the system databases and those that SQL Server initiates its self.

·          Whilst in some situations table scans may be more efficient this figure should be viewed for fluctuations.

·          There should be a ratio between the figures of  fullscans and index searches, say, 15% – 85%

 

( iv )     As with any server counter this includes page splits within system and tempdb databases.

·          Certain tools including profiler can distort this figure.

·          Page splits may be reduced by introducing a fill factor within indexes.

·          Too many page splits are bad, ideally this value should be very low but typically often isn’t.

·          A page split occurs when a new record is inserted within an existing page and there is no room. Note that updates can cause page splits.

·          Page splits are expensive in performance terms and are also likely to result in data fragmentation in the database.

 

( v )      There are 49 perfmon counters which provide information about locks.

·          These counters measure the SQL Server locks they do not directly measure locks in the sense of locking and blocking activity within user databases.

·          Use the counters together as a measure of server activity.

·          Lock Requests / second  =  locks requested by server processes

·