November 2006 - Posts

How much memory?

This is a question I would ask at interviews .. You have 12Gb of ram on your new SQL Server ..how do you get SQL Server to use it and how do you know if SQL Server is actually using the memory?

I've had some discussion on this matter with two reports raised claiming SQL Server was only using 200Mb of ram whereas it should be using much more.

http://msdn2.microsoft.com/en-us/library/aa196705(SQL.80).aspx   is where to look.

For my interview the answers were , enable awe in sql server, set maximum memory, check target memory in sysperfinfo ( there's also a perfmon counter but DBAs prefer SQL solutions! ) The wrong answer was do nothing and use Task Manager. ( with w2k you also need /PAE .. w2k3 should have this switch by default ).

select * from master.dbo.sysperfinfo where counter_name like '%server memory%'

should show something like this:-

object_name                                      counter_name                                    instance_name           cntr_value     cntr_type  

---------------------------------------------------------------------------------------------------------------------------------------------------

SQLServer:Memory Manager          Target Server Memory(KB)                                                  11444552      65536

SQLServer:Memory Manager          Total Server Memory (KB)                                                   11444552      65536

 

(2 row(s) affected)

It's also worth noting that awe can be pretty nasty if you don't limit max memory, I won't bother to repeat the kb article.  

 

 

 

Posted by GrumpyOldDBA with 1 comment(s)
Filed under:

Not thinking it through

I’m looking into performance problems on a box which is in a managed data centre, so there is the question of ownership of problems etc. One current issue has been an index rebuild job which has been regularly failing. The job is a simple cursor loop that applies a dbcc dbreindex to each table in turn, the database is in full recovery mode and tran logs get backed up hourly, it’s effectively a 7 x 24 system so options of changing database mode are not applicable. The server is also, in my view, short of disk space, we can only store one backup of the main database which is about 20gb in size, and the space available for transaction logs is also limited.

There were a number of server configuration changes to the out of the box settings, one which still exists is a setting to the Query Governor. Upon investigation it was discovered that the query governor setting was responsible for the failure of the dbcc dbreindex job.

Once this was resolved the reindex of the tables ran the transaction log drive out of space! I’m not going to relate how that was resolved, let’s just say the production database now has a different dbid.

The solution to this issue was to add a second loop inside the dbreindex loop to perform a transaction log backup every 10 or 20 tables, sounds fair, however, the log backup command is with an init and the time stamp for the file name is of the usual format  dbname_log_yyymmddhhmm.bak, problem is if the dbreindex  hits a batch of small tables the loop executes in less than a minute, this means the next log file overwrites the previous file and we have lost the continuity of the transaction logs and we can’t recover the database.

This was “tested” a number of times on the production server, including generating the logs without actually applying the reindex, 20 transaction log backups all with the same name in under a minute. As there were no before and after backups of the database, not enough disk space, the routine destroyed the recovery point until the next full backup – which didn’t occur until 36 hours later, putting us at risk of loss of a day’s business.

The Query Governor has been increased enough to presumably allow the job to run, although as we haven't had a complete run of the job data changes since may of course cause the job to fail next time it runs.

So the moral of this story is to beware changing server settings, you might not realise what is affected and think through clearly the logic of coding changes.

Posted by GrumpyOldDBA with 3 comment(s)
Filed under:

Transactional Replication by T-SQL

Haven't forgotten about this but we're still considering which way we want to implement replication. It's proving very interesting and I will publish my findings.
Posted by GrumpyOldDBA with no comments
Filed under:

P & O tools - missing script

Seems I lost the tail end of my posting - the blog timed out in the middle of writing the post, sorry.

This is the script to use against the procedure cache, it returns information for one database only and should be run in the master database

select bucketid,cacheobjtype,objtype,objid,name,refcounts,usecounts,syscacheobjects.uid,setopts,sqlbytes,sql
from master.dbo.syscacheobjects with (nolock) left outer join
mydatabase.dbo.sysobjects with (nolock)
on objid=id
where dbid=mydatabase_id 
order by objid

substitute your database name and id in the appropriate place in the code. There's a function on another web site which will turn the numeric setopts values back into the SET options, I was unable to fully validate this function.

Posted by GrumpyOldDBA with no comments
Filed under:

Creating a baseline for P & O part 3

The most important part of performance tuning is to be able to establish a baseline. Without a baseline and on-going statistics and trending information you just cannot make informed judgements concerning performance degredation or improvement or on future growth.

The three parts presented here allow for a pretty good collection of data for trending etc. I know the subject isn't particularly exciting and probably doesn't appeal to many!! I have no reason to believe that I won't carry this on to SQL Server 2005, true there's lots of management views, but you need hard hsitorical data for analysis if your server falls over.

I hope I've explained it in such a manner it can be understood, catch me at a user group meeting if you need any further clarification.

I have three standard sets of data I collect as a minimum, here is the third.

This is by far the largest post and contains the most amount of code. Again I’m not claiming the concepts behind this post are unique, far from it, but I’ve never seen anyone put all of it together as a complete solution.

So what am I talking about now? I collect perfmon stats and other counters in a sql server table which I also rollup. The very wide report shown below is an indication of what I collect.

For those not in the know master..sysperfinfo stores all the sql server perfmon counter data which is constantly updated, it’s not all in the easiest format to understand but nevertheless it’s all there. All I do is extract the information every hour, calculate average values for the hour and store them in a table. Every day I rollup the hourly data into daily averages which I store in separate tables.

Keeping a history of all this data is important for trending and prediction.

 

Disclaimer :-  The set of perfmon counters I chose were a personal choice, they are not “the best” , “the optimal” or anything like that. As I mentioned in part 2 of this series I have usually had the benefit of SQL Diagnostic Manager in the background.

 

In addition to the perfmon stats I also collect data from sp_monitor system procedure. It’s worth noting there is a bug in this procedure that means a value goes out of range .. I forget exactly which counter it is but if you get such an error all that is required is to recreate the procedure changing all integer variable to bigints.  Remember sp_monitor is a system procedure so you must use sp_ms_marksystemobject to return it to a system procedure.

 

I've had to split the report to allow me to snapshot it:-

 

 

 

 

 

 

 

 

The code presented isn’t  “generic” , for performance reasons, so you do need to make some modifications to the code for it to work, please note all the comments regarding these changes. Once again I use a DBA database, this time ServerAdmin, and I use scheduled jobs to store the data.

The raw data is stored every hour, I have two different summary tables, one which holds the average values for the entire day, the other which holds the average values for the key hours , say 08:00 to 17:00.

I have a job which creates the report shown above every night, I’m not publishing how I produce the reports, although I’ll try to remember to include the query,  as that would require yet another long blog. I hope to show my “reporting solution” one evening at a user group meeting.

( As a footnote I’ve been producing reports from SQL Server in this manner since 6.5 and this particular report for the past 3 or 4 years )

 

This is my documentation for this process, I’m hoping the formatting will convert correctly from word, I apologise for not sitting down and writing this in proper html but it’s a large document!

 

1.  Overview

  • The SQL Server Performance Monitor counters are stored internally within SQL Server in the table Sysperfinfo which is located within the master database. The purpose of this table is briefly documented within Books On Line.
    • The SQL Server Performance Counters are variously documented by Microsoft, a good source of reference is   “SQL Server 2000 Performance Tuning Technical  Reference”  – Microsoft Press -  ISBN 0-7356-1270-6
  • The available sql counters are shown below, a number of counters are generated for each database

 

 

Auto-Param Attempts/sec                      AWE lookup maps/sec                           AWE stolen maps/sec

AWE unmap calls/sec                             AWE unmap pages/sec                           AWE write maps/sec

Backup/Restore Throughput/sec            Batch Requests/sec                                 Bulk Copy Rows/sec

Bulk Copy Throughput/sec                    Cache Use Counts/sec                            Checkpoint pages/sec

DBCC Logical Scan Bytes/sec               Extent Deallocations/sec                        Extents Allocated/sec

Failed Auto-Params/sec                          Forwarded Records/sec                          Free list empty/sec

Free list requests/sec                               Free list stalls/sec                                    FreeSpace Page Fetches/sec

FreeSpace Scans/sec                               Full Scans/sec                                         Index Searches/sec

Latch Waits/sec                                       Lazy writes/sec                                       Lock Requests/sec

Lock Timeouts/sec                                  Lock Waits/sec                                       Log Bytes Flushed/sec

Log Cache Reads/sec                              Log Flush Waits/sec                               Log Flushes/sec

Logins/sec                                               Logouts/sec                                             Mixed page allocations/sec

Number of Deadlocks/sec                       Page Deallocations/sec                           Page lookups/sec

Page reads/sec                                         Page Splits/sec                                        Page writes/sec

Pages Allocated/sec                                 Probe Scans/sec                                       Range Scans/sec

Readahead pages/sec                               Safe Auto-Params/sec                            Scan Point Revalidations/sec

Shrink Data Movement Bytes/sec          Skipped Ghosted Records/sec                 SQL Compilations/sec

SQL Re-Compilations/sec                      Table Lock Escalations/sec                     Transactions/sec

Unsafe Auto-Params/sec                         Workfiles Created/sec                            Worktables Created/sec

 

  • A set of System Statistical Functions are also available which record data.

 

Function

Description ( from Books On Line )

@@CONNECTIONS

Returns the number of connections, or attempted connections, since SQL Server was last started. 

@@CPU_BUSY

Returns the time in milliseconds (based on the resolution of the system timer) that the CPU has spent working since SQL Server was last started.

@@IO_BUSY

Returns the time in milliseconds (based on the resolution of the system timer) that  SQL Server has spent performing input and output operations since it was last started.

@@PACK_RECEIVED

 

@@PACK_SENT

Returns the number of input packets read from the network by  SQL Server since last started.

Returns the number of output packets written to the network by  SQL Server since last started.

@@PACKET_ERRORS

Returns the number of network packet errors that have occurred on SQL Server connections since SQL Server was last started.

@@TOTAL_READ

 

@@TOTAL_WRITE

Returns the number of disk reads (not cache reads) by SQL Server since last started.

Returns the number of disk writes by SQL Server since last started.

@@TOTAL_ERRORS

Returns the number of disk read/write errors encountered by SQL Server since last started.

 

 

  • Technically the descriptions given in BOL are correct although in practice the descriptions are incomplete.
  • There is a system stored procedure  sp_monitor which works in conjunction with a system table spt_monitor , both can be found within the master database, the values returned by the statistical functions correctly return the values since the last execution of the stored procedure OR the start of the SQL Server service which ever was the most recent.
  • Knowing this fact we can extract the statistics for our own use by running sp_monitor at regular intervals.
  • Sp_monitor is a documented system procedure, examination of the procedure shows that it uses the table spt_monitor  ( which is not documented )
  • A typical result set from this table is shown below

lastrun

cpu_busy

io_busy

idle

pack_received

pack_sent

connections

pack_errors

total_read

total_write

total_errors

2006-01-25 09:01:03.703

22852929

735107

137159340

113903076

121546762

931224

532

9254818

64512368

0

 

 

2.     Overview of collection Process

 

·         To facilitate the collection process the functionality is handled by SQL Server in a user database.

·         Performance data is extracted with stored procedures

·         Performance data is stored in tables

·         Collection is achieved using the Agent Job Scheduler

 

A snapshot of the statistical data is taken at regular intervals ( every hour ), the average value(s) for that period is then calculated and written to a table.

The disadvantage of this approach is that it does not provide any granularity in as much as any extremes will be masked.

The advantage is that the number of data sets is manageable and the collection has a minimal impact upon the server, if at all.

Once a day the daily data is rolled up into an average for the day, this allows for simple day to day comparison.

Thus the data collected is adequate for viewing patterns and trends but less adequate for detailed analysis.

 

 

 

Overview of hourly statistic gathering job.

( The database name ServerDBA is used as an indicator to show object placement  )

 

 

3. Counters collected

 

Counter / Column

Value

Description

TheDate

Datetime – timestamp of table row entry

Date time. The values for the counters represent the average for the hour ending shown in this column

CacheHitRatio

%age value (average) for the time period

This is the total ratio for what is normally called the procedure cache.

Actually the cache holds plans for more than just procedures. ( see table 6 ) ( vii )

cpu_busy

Milliseconds total for the time period

( i )

This is a measure of cpu use and indicates the ms of use in the hour period

io_busy

Milliseconds total for the time period

( i )

This is a measure of i/o use and indicates the ms of use in the hour period

Logins

Number of connected users (average) for the time period

The number of persisted connection to the server. Each connection typically takes resource of around 48k memory. Connection memory cannot come from extended memory and thus unnecessary persisted connections can decrease the available memory for other processes.

Connections

Login attempts per second (average) for the time period

The rate of connections and connection attempts. Where connections to sql server are using pooling if this figure is high then pooling may not be configured correctly

PacketsReceived

Packets received per second (average) for the time period

Measure of network activity

PacketsSent

Packets sent per second (average) for the time period

Measure of network activity

PacketErrors

Total of packet errors for the time period

Should be nil

Reads

Reads per second (average) for the time period

Count of page reads executed by SQL Server. Indication of server activity.

Writes

Writes per second (average) for the time period

Count of page writes executed by SQL Server. Indication of server activity.

RWErrors

Total Number of read or write errors for the time period

Should be nil

TotalTransactions

Transactions per second (average) for the entire server for the time period

Measure of server activity

UserDB1_Transactions

Transactions per second (average) for the database for the time period

Measure of database activity

UserDB2_Transactions

Transactions per second (average) for the database for the time period

Measure of database activity

UserDB2_Transactions

Transactions per second (average) for the database for the time period

Measure of database activity

UserDB2_Transactions

Transactions per second (average) for the database for the time period

Measure of database activity

TempdbTransactions

Transactions per second (average) for the database for the time period

Measure of tempdb activity. Tempdb is the working area for SQL Server. High levels of transactions in tempdb relative to the overall server figure may be of interest ( ii )

FullScans

Number of unrestricted full scans per second  (average) for the time period

Number of unrestricted full scans per second. These can be either base-table or full-index scans. ( iii )

This figure should be low

IndexSearches

Number of index lookups per second (average) for the time period

This counter shows the number of index searches SQL Server is performing. SQL Server uses index searches to start range scans, to retrieve a specific record by using one index, and to reposition within an index.

PageSplits

Number of page splits per second (average)  when an index page overflows for the time period

This figure should be low.

( iv )

LockRequests

The total number of lock requests per second (average) for the time period

Locks have to exist within lower memory, excessive lock requests may result in memory issues, likewise other factors may reduce the available memory for locks

If locks become an area of concern greater granularity can be obtained by measuring lock types ( v )

LockTimeouts

The number of times per second (average)  that a lock could not be obtained when requested for the time period

Excessive lock timeouts may indicate application problems, memory pressures and/or excessive number of requests ( v )

Deadlocks

The total number of deadlocks that occurred during the monitored time period

This counter should be zero

LatchWaits

The number of waits per second (average) on latch requests.

Latches are lightweight “locks” used to aid performance. This figure should be low. A high figure would indicate resource issues, probably memory.

LockWaits

The number of times per second (average) that a thread had to wait for a lock during the monitored time period

This value should be low ( v ) if this counter is high it may indicate performance issues.

Compilations

The number of times per second (average) that sql compilations occur for the monitored  time period

Used as a measure of work

WorkFiles

The number of workfiles, such as created internally during query execution, per second (average) for the monitored  time period.

Used as a measure of work

( vi )

WorkTables

The number of worktables, such as created internally during sorts, unions, groupings, per second (average) for the monitored  time period.