Creating Baselines for P & O
The most important part of performance tuning is to be able to establish a baseline. Without a baseline and on-going statistics and trending information you just cannot make informed judgements concerning performance degredation or improvement or on future growth.
I have three standard sets of data I collect as a minimum, here is the first.
There's plenty of articles around concerning the use of fn_virtualfilestats so I'm going to concentrate on how I use this function and the data it generates.
Here's a typical call and result set:-
select * from ::fn_virtualfilestats (11,-1)
DbId FileId TimeStamp NumberReads NumberWrites BytesRead BytesWritten IoStallMS
------ ------ ----------- -------------------- -------------------- -------------------- -------------------- --------------------
11 1 179531484 58 798 2375680 7946240 389
11 2 179531484 8 10160 337408 11038720 0
I collect data every hour using a scheduled job and store it in a table
Here's a typical subset of data
time dbid fileid reads writes iostall
----------------- ---- ------ -------------------- -------------------- --------------------
Nov 2 2006 3:00 11 1 407245 71259 1166859
Nov 2 2006 3:00 11 2 10874 327285 34415
Nov 2 2006 4:00 11 1 407253 71259 1166984
Nov 2 2006 4:00 11 2 10874 327285 34415
Nov 2 2006 5:00 11 1 407263 71259 1167046
Nov 2 2006 5:00 11 2 10874 327285 34415
Nov 2 2006 6:00 11 1 407263 71259 1167046
Nov 2 2006 6:00 11 2 10874 327285 34415
Nov 2 2006 7:00 11 1 407357 71259 1167358
Nov 2 2006 7:00 11 2 10874 327285 34415
Nov 2 2006 8:00 11 1 407477 71259 1167700
Nov 2 2006 8:00 11 2 10874 327285 34415
Nov 2 2006 9:00 11 1 407591 71259 1167965
Nov 2 2006 9:00 11 2 10874 327285 34415
Nov 2 2006 10:00 11 1 407614 71259 1168090
Trouble is that the numbers are cumlative and thus I have to do subtraction to obtain the figures for the hour.
Well in principle this sounds easy but in reality if you're collecting data from say, 14 databases every hour, the query to extract that difference and the hour in which it occurred is "complex"
So I take the easy way out and use calculated columns with functions doing the maths, this gives me reuslts similar to this.
time dbid fileid reads writes iostall hourreads hourwrites
----------------- ---- ------ -------------------- -------------------- -------------------- -------------------- --------------------
Oct 13 2006 2:28 11 1 101620751 27122598 846102670 NULL NULL
Oct 13 2006 2:28 11 2 3042305 70473357 432497 NULL NULL
Oct 13 2006 4:00 11 1 101699390 27143997 846966601 43345 11543
Oct 13 2006 4:00 11 2 3042559 70551758 432513 183 52881
Oct 13 2006 5:00 11 1 101727442 27155013 847248234 28052 11016
Oct 13 2006 5:00 11 2 3042618 70588701 432513 59 36943
Oct 13 2006 6:00 11 1 101760009 27164425 847612578 32567 9412
Oct 13 2006 6:00 11 2 3042668 70618819 432529 50 30118
Oct 13 2006 7:00 11 1 101779677 27172467 847779295 19668 8042
Oct 13 2006 7:00 11 2 3042717 70660817 432529 49 41998
Oct 13 2006 9:00 11 1 101824133 27187730 848222714 12645 7634
Oct 13 2006 9:00 11 2 3042801 70725796 432529 41 31570
I have a job which publishes this data out to a web page every day and rolls up the previous weeks data into averages to give an overall view of i/o.
Points:-
- I sample every hour, generally this is fine, for heavy diagnostic work I might actually sample more often, say 10 mins.
- This type of data collection can help to pinpoint databases and times under load, if you have multiple files then you'll be able to see if the load on the files is balanced.
- Conversely it can help finding the quiet times to assist in locating maintenence windows in 24 x 7 systems.
- Once you have a few months of data the analysis can become more interesting.
- Functions in tables generally don't give good performance, however it does make things more simple.
- An index on column TheDate will aid query performance
- Ideally the data would be better extracted to a cube.
- Notwithstanding the performance hit you can do some really cool things with functions as calculated columns, including joins and/or lookups to other tables.
- I wouldn't recommend this for high performance oltp databases but for reporting and such it is really very handy. I've been using this technique since sql 2000 was released.
To populate the table I have a job set to run hourly which would have a step such as this:-
--
insert into ServerAdmin.dbo.Filestats(DbID,FileID,StampTime,Reads,Writes,BytesRead,BytesWrite,ioStall)
select * from ::fn_virtualfilestats (7,-1)
--
insert into ServerAdmin.dbo.Filestats(DbID,FileID,StampTime,Reads,Writes,BytesRead,BytesWrite,ioStall)
select * from ::fn_virtualfilestats (10,-1)
--
insert into ServerAdmin.dbo.Filestats(DbID,FileID,StampTime,Reads,Writes,BytesRead,BytesWrite,ioStall)
select * from ::fn_virtualfilestats (11,-1)
--
insert into ServerAdmin.dbo.Filestats(DbID,FileID,StampTime,Reads,Writes,BytesRead,BytesWrite,ioStall)
select * from ::fn_virtualfilestats (2,-1)
--
I keep this simple, no generic or dynamic coding, note that I always collect stats for Tempdb.
Here's part of a typical report and the code for the tables and functions, the functions must be created prior to the table creation.

Table:-
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[FileStats]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[FileStats]
GO
CREATE TABLE [dbo].[FileStats] (
[NumKey] [int] IDENTITY (1, 1) NOT NULL ,
[TheDate] [datetime] NULL ,
[DbID] [tinyint] NULL ,
[FileID] [tinyint] NULL ,
[StampTime] [bigint] NULL ,
[Reads] [bigint] NULL ,
[Writes] [bigint] NULL ,
[BytesRead] [bigint] NULL ,
[BytesWrite] [bigint] NULL ,
[ioStall] [bigint] NULL ,
[DbName] AS (db_name([DbID])) ,
[HourWrites] AS ([dbo].[fn_FileStats_Write]([TheDate], [dbid], [fileid], [Writes])) ,
[HourReads] AS ([dbo].[fn_FileStats_Read]([TheDate], [dbid], [fileid], [Reads])) ,
[HourIo] AS ([dbo].[fn_FileStats_IOStall]([TheDate], [dbid], [fileid], [iostall])) ,
[AvIoStall] AS ([dbo].[fn_FileStats_AvIoStall2]([TheDate], [dbid], [fileid], [Reads], [Writes], [ioStall]))
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[FileStats] ADD
CONSTRAINT [DF__FileStats__TheDate] DEFAULT (getdate()) FOR [TheDate],
CONSTRAINT [uk_FileStats_Numkey] UNIQUE NONCLUSTERED
(
[NumKey]
) ON [PRIMARY]
GO
--
--
--
Functions:-
--
--
CREATE function [dbo].[fn_FileStats_Write] (@p1 smalldatetime,@p2 tinyint,@p3 tinyint,@p4 bigint)
-- =============================================================
-- Function: fn_FileStats_Write
-- Written by: Colin Leversuch-Roberts
-- www.kelemconsulting.co.uk
-- (c) 6th Jan 2004
--
-- Purpose: Populates a computed column in table FileStats
--
-- System: DBA maintenance
--
-- Input Paramters: @p1 smalldatetime, @p2 tinyint, @p3 tinyint, @p4 bigint
-- row timestamp, database id, file id, number of writes
--
-- Returns : bigint
--
-- Usage: Computed column in table { select dbo.fn_FileStats_Write(@p1,@p2,@p3,@p4) }
--
-- Notes: This function calculates the number of writes recorded against the file id
-- for the last hour. The writes, reads and i/o stall figures are cumlative hence
-- the calulations required to work out the figures
--
-- VERSION HISTORY
-- Version No Date Description
-- 1 6 jan - 2004 Initial Release
-- =============================================================
returns bigint
as
begin
declare @Return bigint,@Writes bigint
select @Writes=Writes
from dbo.filestats with (nolock)
where convert(char(15),theDate,113)+'00'=convert(char(15),dateadd(hh,-1,@p1),113)+'00'
and dbid=@p2 and fileid=@p3
IF @p4=@Writes
set @return=0
ELSE
set @Return=abs(@p4-@Writes)
--endif
return(@Return)
end
--
--
--
CREATE function [dbo].[fn_FileStats_Read] (@p1 smalldatetime,@p2 tinyint,@p3 tinyint,@p4 bigint)
-- =============================================================
-- Function: fn_FileStats_Read
-- Written by: Colin Leversuch-Roberts
-- www.kelemconsulting.co.uk
-- (c) 6th Jan 2004
--
-- Purpose: Populates a computed column in table FileStats
--
-- System: DBA maintenance
--
-- Input Paramters: @p1 smalldatetime, @p2 tinyint, @p3 tinyint, @p4 bigint
-- row timestamp, database id, file id, number of writes
--
-- Returns : bigint
--
-- Usage: Computed column in table { select dbo.fn_FileStats_Read(@p1,@p2,@p3,@p4) }
--
-- Notes: This function calculates the number of reads recorded against the file id
-- for the last hour. The writes, reads and i/o stall figures are cumlative hence
-- the calulations required to work out the figures
--
-- VERSION HISTORY
-- Version No Date Description
-- 1 6 jan - 2004 Initial Release
-- =============================================================
returns bigint
as
begin
declare @Return bigint,@Reads bigint
select @Reads=Reads
from dbo.filestats with (nolock)
where convert(char(15),theDate,113)+'00'=convert(char(15),dateadd(hh,-1,@p1),113)+'00'
and dbid=@p2 and fileid=@p3
-- have to cope with divide by zero problem -- rats! makes code less neat!!
IF @p4=@Reads
set @return=0
ELSE
set @Return=abs(@p4-@Reads)
--endif
return(@Return)
end
--
--
--
CREATE function [dbo].[fn_FileStats_IOStall] (@p1 smalldatetime,@p2 tinyint,@p3 tinyint,@p4 bigint)
-- =============================================================
-- Function: fn_FileStats_IOStall
-- Written by: Colin Leversuch-Roberts
-- www.kelemconsulting.co.uk
-- (c) 6th Jan 2004
--
-- Purpose: Populates a computed column in table FileStats
--
-- System: DBA maintenance
--
-- Input Paramters: @p1 smalldatetime, @p2 tinyint, @p3 tinyint, @p4 bigint
-- row timestamp, database id, file id, number of writes
--
-- Returns : bigint
--
-- Usage: Computed column in table { select dbo.fn_FileStats_IOStall(@p1,@p2,@p3,@p4) }
--
-- Notes: This function calculates the iostall in ms recorded against the file id
-- for the last hour. The writes, reads and i/o stall figures are cumlative hence
-- the calulations required to work out the figures
--
-- VERSION HISTORY
-- Version No Date Description
-- 1 6 jan - 2004 Initial Release
-- =============================================================
returns bigint
as
begin
declare @Return bigint,@IOStall bigint
select @IOStall=IOStall
from dbo.filestats with (nolock)
where convert(char(15),theDate,113)+'00'=convert(char(15),dateadd(hh,-1,@p1),113)+'00'
and dbid=@p2 and fileid=@p3
-- have to cope with divide by zero problem -- rats! makes code less neat!!
IF @p4=@IOStall
set @return=0
ELSE
set @Return=abs(@p4-@IOStall)
--endif
return(@Return)
end
--
--
--
CREATE function dbo.fn_FileStats_AvIoStall2 (@p1 datetime,@p2 tinyint, @p3 tinyint,@p4 bigint,@p5 bigint,@p6 bigint)
-- =============================================================
-- Function: fn_FileStats_AvIoStall2
-- Written by: (c) Colin Leversuch-Roberts
-- www.kelemconsulting.co.uk
--
-- Purpose: to produce an average time in ms for the file io
--
-- System: DBA maintenance
--
-- Input Paramters: @p1 datetime, @p2 tinyint, @p3 tinyint,@p4 bigint, @p5 bigint, @p6 bigint
-- datetime , database id, file id, reads, writes, io stall
--
-- Returns : decimal with 4 decimal points
--
-- Usage: select dbo.fn_FileStats_AvIoStall2(@p1,@p2,@p3,@p4,@p5,@p6)
--
-- Notes: function to calculate the average i/o stall per i/o for the last hour
-- Can't do a direct function on the calculated columns as you
-- can't use computed columns in a computed column, so do this way.
--
-- VERSION HISTORY
-- Version No Date Description
-- 1 25-aug-2005 Initial Release
--
-- =============================================================
returns decimal(9,4)
as
begin
declare @reads bigint,@writes bigint,@iostall bigint,@return decimal(9,4)
--
select @reads = dbo.fn_FileStats_Read(@p1,@p2,@p3,@p4)
select @writes = dbo.fn_FileStats_Write(@p1,@p2,@p3,@p5)
select @iostall = dbo.fn_FileStats_IOStall(@p1,@p2,@p3,@p6)
--
IF @reads=0 and @writes=0
set @return = 0
ELSE
set @return = abs(((@iostall*1.0)/(@reads+@writes)))
--endif
--
return(@return)
end
--
--
PS .. still struggling with creating blog entries , please accept my apologies for poor formatting!!