Identifying the hardware or virtualised resources to satisfy your workload isn’t straightforward; different data access patterns require different resourcing. For example, 1,000 users on an Online Transaction Processing (OLTP) workload may require fewer resources than an Online Analytical Processing (OLAP) workload with 5 users. This blog post gives some guidance on how to approach specifying resource requirements through server workload analysis.

Writing a Service Level Agreement (SLA) is a good method of defining what is acceptable for a user experience. In a database environment that SLA is based not only on system availability but also on the amount of time a query should take to return its results. In an OLAP environment it is difficult to predict the type of queries that will be executed, so a set of typical queries should be identified and baselined. The metrics that form the baseline can be taken from Perfmon; the specific counters to record and review are as follows:

IO (PhysicalDisk or LogicalDisk object)
Avg. Disk Bytes/Read
Disk Reads/sec
Avg. Disk Bytes/Write
Disk Writes/sec

CPU (Processor object)
% Processor Time

The read and write disk queue counters are not important here because they indicate the performance of the existing IO subsystem rather than describing the workload resource requirements of the typical queries being baselined.
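As a complement to Perfmon, a minimal way to baseline an individual query on an isolated test system is with SET STATISTICS IO and SET STATISTICS TIME; the reads, CPU and elapsed times appear in the Messages output. This is only a sketch and uses the illustrative Transactions table that appears later in this post.

-- Baseline a representative query: logical/physical reads plus CPU and elapsed time
-- On an isolated test box the data cache can be cleared first so reads come from disk:
-- DBCC DROPCLEANBUFFERS;
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT SUM( transaction_amount )
FROM Transactions;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;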

How SQL Server Interacts with the Hardware

To estimate the resource requirements for your SQL Server workload, the basics of SQL Server’s architecture need to be understood. The sections below give a high level overview of the components that interact with the hardware.

Permanent Storage Access

Wesley Brown gives a good introduction to SQL Server data access patterns in his article “Fundamentals of Storage Systems – Capturing IO Patterns” on SQL Server Central. The SQL Server Best Practices whitepaper by Mike Ruthruff, Michael Thomassy and Prem Mehra is also a good read; although written for 2005 it is still relevant for 2008 R2.

Write transactions use Write-Ahead Logging, which means the rows from an INSERT statement are written to the transaction log first. That IO request is written synchronously to permanent storage, so the statement cannot complete (commit the individual statement’s unit of work) until the log record is physically hardened off to the write media; the write to the data file occurs later and is performed asynchronously by either the checkpoint or lazywriter process.
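As a rough illustration of write-ahead logging, on a test database only, the undocumented fn_dblog function can be used to see the log records an INSERT generates while the transaction is still open and before the data page has been flushed. This is a sketch: the table and column names are the illustrative ones used later in this post, and the INSERT assumes transaction_amount is the only required column.

BEGIN TRANSACTION;

INSERT INTO Transactions ( transaction_amount ) VALUES ( 100.00 );

-- Undocumented and for test systems only: show the log records produced by the INSERT
SELECT [Current LSN], Operation, [Log Record Length]
FROM fn_dblog( NULL, NULL )
WHERE Operation = 'LOP_INSERT_ROWS';

COMMIT TRANSACTION;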

The placement of the transaction log is critical to the success of a scalable SQL Server installation. The transaction log must be sited on storage that has a disk write latency of less than 7 milliseconds per IO; that is, for every 512 byte to 62Kbyte write it should take no longer than 7 milliseconds for the data to reach permanent storage, otherwise performance will suffer. As an illustration, consider an INSERT statement that writes 500Mbytes of data to the transaction log. The maximum IO size that SQL Server can write to the log is 62Kbytes, so the 500Mbytes of data will require 512,000 / 62 = 8,258 IOs. If each IO takes 15 milliseconds the total time of the INSERT will be in the region of 123 seconds, compared to 57 seconds if the latency were 7 milliseconds.
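A quick way to check whether an existing transaction log is achieving that sort of latency is sys.dm_io_virtual_file_stats; a minimal sketch is below, bearing in mind the figures are cumulative since the instance last started so only give an average.

-- Average write latency in milliseconds per transaction log file
SELECT DB_NAME( vfs.database_id ) AS database_name,
       mf.name AS file_name,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF( vfs.num_of_writes, 0 ) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats( NULL, NULL ) AS vfs
     JOIN sys.master_files AS mf ON mf.database_id = vfs.database_id
                                AND mf.file_id = vfs.file_id
WHERE mf.type_desc = 'LOG'
ORDER BY avg_write_latency_ms DESC;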

CPU

Gabriel Torres gives a detailed explanation in his tutorial on “How the Memory Cache works” on Hardware Secrets.

The path a query takes can be seen using these basic steps:

1. Compile

2. Bring data from permanent storage into memory: data is transferred from the storage device using SATA or SCSI protocols, or in the case of Fusion-io and OCZ IBIS drives directly across the PCI bus into memory.

3. A CPU core, or cores in the case of data stream parallelism, requests data from memory; the data travels into the Ln caches and the CPU processes it.

4. Query output is produced, for instance results sent over the network, a new temporary table created, etc.
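The CPU and read profile of queries that have already taken this path on an instance can be seen from the plan cache; a rough sketch using sys.dm_exec_query_stats is below (figures are cumulative per cached plan).

-- Top cached queries by CPU; logical reads are pages read from the data cache,
-- physical reads are the subset that had to be fetched from permanent storage first
SELECT TOP ( 10 )
       qs.execution_count,
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.total_logical_reads,
       qs.total_physical_reads,
       SUBSTRING( st.text, 1, 200 ) AS query_text
FROM sys.dm_exec_query_stats AS qs
     CROSS APPLY sys.dm_exec_sql_text( qs.sql_handle ) AS st
ORDER BY qs.total_worker_time DESC;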

Concurrently executing connections, even if running the same query, will each need to process the data; there is no sharing of results. SQL Server does have optimisations to share the data held in memory: for instance, if several connections are scanning the same table they will use read-ahead more optimally by using data already in cache rather than each going to permanent storage first.

Example:

SELECT SUM( transaction_amount )
FROM Transactions

The query above reads 10,000,000 transactions from the Transactions table and returns the sum of transaction_amount. The table is 20GBytes, and because the entire table is being aggregated and there is no suitable index on transaction_amount, the whole 20GBytes needs to be read from permanent storage into memory and through the CPU to render a result.

1 connection concurrently running the query will require 20GBytes to be processed by the CPU
2 connections concurrently running the query will require 40GBytes to be processed by the CPU
10 connections concurrently running the query will require 200GBytes to be processed by the CPU

CPU becomes a scaling issue when large amounts of data are being processed; the more concurrent users doing that, the more CPU capacity is needed.

The query above would likely be streamed onto separate cores depending on resource availability in order to optimise the load.
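If needed, the degree of parallelism for a query like this can be capped with the MAXDOP query hint; a minimal sketch:

-- Limit the aggregation to 4 cores; MAXDOP 0 lets SQL Server decide
SELECT SUM( transaction_amount )
FROM Transactions
OPTION ( MAXDOP 4 );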

As an aside, vendors are looking toward utilising Graphical Processing Units (GPUs) to provide significant speed improvements; Peter Bakkum and Kevin Skadron discuss this in their paper “Accelerating SQL Database Operations on a GPU with CUDA”.

Memory

In simple terms SQL Server’s memory is split into the Data cache and the Procedure cache; the former holds data pages, the latter the execution plans of queries. The Procedure cache may grow to several GBytes in particular scenarios, but later editions of SQL Server have this behaviour better tuned so it no longer has such an adverse effect on the Data cache.

The Data cache is specific to the SQL Server instance and shared by all databases on that instance. Consider the aggregation query against the Transactions table in the CPU section above: SQL Server needs to read 20GBytes of data in order to render a result, so that 20GBytes must be loaded into the data cache, and as a result other data pages will likely be evicted from cache based on a Least Recently Used algorithm. That eviction can incur a penalty, because the pages being removed may have been updated (termed dirty pages) and so must first be written to the permanent storage device.
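The make-up of the data cache, including how much of it is dirty pages waiting to be written by the checkpoint or lazywriter, can be inspected with sys.dm_os_buffer_descriptors; a minimal sketch (this can be slow on instances with a large amount of memory):

-- Data cache usage per database; each page is 8Kbytes, is_modified = 1 marks dirty pages
SELECT DB_NAME( database_id ) AS database_name,
       COUNT(*) * 8 / 1024 AS cached_mb,
       SUM( CASE WHEN is_modified = 1 THEN 1 ELSE 0 END ) * 8 / 1024 AS dirty_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY DB_NAME( database_id )
ORDER BY cached_mb DESC;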

Determining the Workload

Online Transaction Processing (OLTP) transactions are predominantly seek operations, where each seek reads or writes a small amount of data, usually less than 1Mbyte. That data is likely to be dispersed across the data file, so SQL Server will perform a number of random 8Kbyte reads; an example would be retrieving the details of a specific order or inserting a new order. Online Analytical Processing (OLAP) transactions are predominantly a mixture of seek and scan operations. The scans are typically used to access the large “fact” tables, for example data at the order level of granularity, while the seeks are used to access the smaller dimension or metadata tables, for example customer detail. The access pattern is a mix of 8Kbyte random reads as well as larger reads, in the range of 64Kbytes to 1Mbyte, issued by the read-ahead manager.

Ordinarily OLTP systems require very low latency (sub 3 milliseconds) on the transaction log, because INSERT and UPDATE queries need to complete extremely quickly; otherwise blocking can occur, or the additional load placed on the IO subsystem will cause the SLA response time metric to be exceeded. Because the data files (MDF and NDF files) are written asynchronously by the checkpoint and lazywriter processes, low latency on small IOs is not as important for them as it is for the transaction log; the IO subsystem will usually be better at larger IO reads and writes, typically in the range 8Kbytes to 1Mbyte.
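The read pattern of an existing system can be gauged from the same sys.dm_io_virtual_file_stats view used earlier, this time looking at the average bytes per read against the data files; a small average (around 8Kbytes) points to random OLTP-style seeks, while a larger average (64Kbytes to 1Mbyte) points to read-ahead scans. A minimal sketch:

-- Average read size per data file since the instance last started
SELECT DB_NAME( vfs.database_id ) AS database_name,
       mf.name AS file_name,
       vfs.num_of_reads,
       vfs.num_of_bytes_read / NULLIF( vfs.num_of_reads, 0 ) AS avg_bytes_per_read
FROM sys.dm_io_virtual_file_stats( NULL, NULL ) AS vfs
     JOIN sys.master_files AS mf ON mf.database_id = vfs.database_id
                                AND mf.file_id = vfs.file_id
WHERE mf.type_desc = 'ROWS'
ORDER BY avg_bytes_per_read DESC;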

Method of Estimation

1. Make a list of typical operations across the timeline of a given day. Break the operations down into either ad-hoc or set-time operations, then further into the following classifications:

a. Intensive write: the operation writes large volumes of data, thus requiring a lot of write activity to the transaction log.

b. Intensive read: the operation reads large volumes of data, thus requiring a lot of read activity, probably a mixture of sequential scans and random reads.

c. Intensive read/write: a mixture of (a) and (b).

2. For each operation determine the IO and CPU characteristics; this can be done on an isolated system, using SQL Profiler to capture CPU, Reads and Writes and Perfmon to capture the counters mentioned earlier in this post (a sketch of an alternative capture method follows this list).

3. Across the timeline for the day determine when each operation is likely to run. Ad-hoc queries are difficult to pin down, but perhaps users only run them between 9am and 9pm, for example.

4. Expand (3) and estimate the worst case number of concurrent operations from your typical workload.

5. Aggregate the figures to work out the required number of IOs per second (IOPS) and the required read/write permanent storage latency.
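As an alternative to a Profiler trace for step 2, an Extended Events session can capture per-statement CPU, reads and writes; a minimal sketch is below, where the session name is illustrative and the ring buffer target is used to keep it version-neutral.

-- Capture per-statement CPU, reads and writes for the workload being baselined
CREATE EVENT SESSION [workload_baseline] ON SERVER
ADD EVENT sqlserver.sql_statement_completed
(
    ACTION ( sqlserver.sql_text, sqlserver.database_id )
)
ADD TARGET package0.ring_buffer;

ALTER EVENT SESSION [workload_baseline] ON SERVER STATE = START;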

You now have a rough resource requirement for your workload that can be used to understand the type of hardware needed and how it should be configured.

If you would like to comment on the above, or would like me to perform this analysis for your own system, then drop me a line at tonyrogerson@torver.net.