Identifying the hardware or virtualised resources needed to satisfy your workload isn’t straightforward: different data access patterns require different resourcing. For example, 1,000 users on an Online Transaction Processing (OLTP) workload may require fewer resources than an Online Analytical Processing (OLAP) workload with 5 users. This blog post gives some guidance on how to approach specifying resource requirements through server workload analysis.

Writing a Service Level Agreement (SLA) is a good way of defining what constitutes an acceptable user experience. In a database environment that SLA is based not only on system availability but also on how long a query should take to return its results. In an OLAP environment it is difficult to predict the type of queries that will be executed, so a set of typical queries should be identified and baselined. The metrics that form the baseline can be taken from Perfmon; the specific counters to record and review are as follows:

IO
Average Disk Bytes per Read
Average Disk Reads per Second
Average Disk Bytes per Write
Average Disk Writes per Second

CPU
% Processor Time

The Read and Write disk queue counters are less important here because they indicate the performance of the existing IO subsystem rather than describing the workload resource requirements of the typical queries being baselined.
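As a complement to Perfmon, the same read/write volumes and latencies can be sampled per database file from inside SQL Server. The sketch below uses the sys.dm_io_virtual_file_stats DMV and assumes the baseline queries are run during the one-minute wait; because the DMV figures are cumulative since instance start-up, two snapshots are taken and the difference reported:

-- Snapshot cumulative file-level IO stats, run the baseline queries, then diff the snapshots.
SELECT database_id, file_id, num_of_reads, num_of_bytes_read, io_stall_read_ms,
       num_of_writes, num_of_bytes_written, io_stall_write_ms
INTO   #io_before
FROM   sys.dm_io_virtual_file_stats( NULL, NULL );

WAITFOR DELAY '00:01:00';   -- run the typical queries being baselined during this window

SELECT a.database_id, a.file_id,
       reads                = a.num_of_reads - b.num_of_reads,
       avg_bytes_per_read   = ( a.num_of_bytes_read - b.num_of_bytes_read )
                              / NULLIF( a.num_of_reads - b.num_of_reads, 0 ),
       avg_read_latency_ms  = 1.0 * ( a.io_stall_read_ms - b.io_stall_read_ms )
                              / NULLIF( a.num_of_reads - b.num_of_reads, 0 ),
       writes               = a.num_of_writes - b.num_of_writes,
       avg_bytes_per_write  = ( a.num_of_bytes_written - b.num_of_bytes_written )
                              / NULLIF( a.num_of_writes - b.num_of_writes, 0 ),
       avg_write_latency_ms = 1.0 * ( a.io_stall_write_ms - b.io_stall_write_ms )
                              / NULLIF( a.num_of_writes - b.num_of_writes, 0 )
FROM   sys.dm_io_virtual_file_stats( NULL, NULL ) AS a
       INNER JOIN #io_before AS b ON b.database_id = a.database_id
                                 AND b.file_id     = a.file_id;

DROP TABLE #io_before;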

How SQL Server Interacts with the Hardware

To estimate the resource requirements for your SQL Server workload you first need to understand the basics of SQL Server’s architecture. The sections below give a high-level overview of the components that interact with the hardware.

Permanent Storage Access

Wesley Brown gives a good introduction to SQL Server data access patterns in his article “Fundamentals of Storage Systems – Capturing IO Patterns” on SQL Server Central. The SQL Server Best Practices whitepaper by Mike Ruthruff, Michael Thomassy and Prem Mehra is also a good read; although written for 2005 it is still relevant for 2008 R2.

Write transactions use write-ahead logging, which means the rows from an INSERT statement are written to the transaction log first. That IO request is written synchronously to permanent storage, so the statement cannot complete (commit its unit of work) until the log record is physically hardened to the write media; the write to the data file happens later and asynchronously, performed by either the checkpoint or lazywriter processes.

The placement of the transaction log is critical to the success of a scalable SQL Server installation. The transaction log must be sited on storage with a disk write latency of less than 7 milliseconds per IO; that is, for every 512 byte to 62 Kbyte write it should take no longer than 7 milliseconds for the data to reach permanent storage, otherwise performance will suffer. As an illustration, consider an INSERT statement that writes 500 Mbytes of data to the transaction log. The maximum IO size SQL Server can write to the log is 62 Kbytes, so the 500 Mbytes of data will require roughly 512,000 / 62 = 8,258 IOs. If each IO takes 15 milliseconds the INSERT will take in the region of 124 seconds, compared with around 58 seconds if the latency were 7 milliseconds.
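The arithmetic above can be sketched directly in T-SQL if you want to plug in your own figures; the 62 Kbyte maximum log IO size and the two latencies are simply the assumptions used in the example:

-- Rough log-write time estimate: 500 Mbytes written in 62 Kbyte IOs at two latencies.
DECLARE @data_kb   decimal(18,2) = 500 * 1024,   -- 500 Mbytes expressed in Kbytes
        @max_io_kb decimal(18,2) = 62;           -- assumed maximum log IO size

DECLARE @io_count  decimal(18,2) = @data_kb / @max_io_kb;

SELECT  io_count        = @io_count,             -- ~8,258 IOs
        seconds_at_15ms = @io_count * 0.015,     -- ~124 seconds
        seconds_at_7ms  = @io_count * 0.007;     -- ~58 seconds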

CPU

Gabriel Torres gives a detailed explanation in his tutorial on “How the Memory Cache works” on Hardware Secrets.

The path a query takes can be seen using these basic steps:

1.       Compile

2.       Bring data from permanent storage into memory: data is transferred from the storage device using the SATA or SCSI protocols, or in the case of Fusion-io and OCZ IBIS drives directly across the PCI bus, into memory.

3.       A CPU core, or cores in the case of data stream parallelism, requests data from memory; the data travels into the Ln caches and the CPU processes it.

4.       Query output is produced: for instance, results are sent via the network, a new temporary table is produced, etc.

Concurrently executing connections, even if they are running the same query, will each need to process the data; there is no sharing of results. SQL Server does have optimisations to share the data in memory: for instance, if several connections are scanning the same table they will use read-ahead more efficiently by working from data already in cache rather than first going to permanent storage.

Example:

SELECT SUM( transaction_amount )
FROM Transactions

The query above reads 10,000,000 transactions from the Transactions table and returns the sum of transaction_amount. The table is 20 Gbytes, and because the entire table is being aggregated and there is no special index on transaction_amount, the entire 20 Gbytes needs to be read from permanent storage into memory and through the CPU to produce a result.

1 connection concurrently running the query will require 20GBytes to be processed by the CPU
2 connections concurrently running the query will require 40GBytes to be processed by the CPU
10 connections concurrently running the query will require 200GBytes to be processed by the CPU

CPU becomes a scaling issue when large amounts of data are being processed: the more concurrent users doing that, the more CPU capacity is needed.

The query above would likely be parallelised across separate cores, depending on resource availability, in order to spread the load.
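To see the IO and CPU cost of a query like this for a single connection, the statistics output can be switched on before running it. A minimal sketch is below; the Transactions table is the illustrative table from the example above:

SET STATISTICS IO   ON;   -- reports logical, physical and read-ahead reads per table
SET STATISTICS TIME ON;   -- reports CPU time and elapsed time

SELECT SUM( transaction_amount )
FROM Transactions;

SET STATISTICS IO   OFF;
SET STATISTICS TIME OFF;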

As an aside, vendors are looking toward utilising Graphics Processing Units (GPUs) to provide significant speed improvements; Peter Bakkum and Kevin Skadron discuss this in their paper “Accelerating SQL Database Operations on a GPU with CUDA”.

Memory

In simple terms SQL Server’s memory is split into the data cache and the procedure cache: the former holds data pages, the latter the execution plans of queries. The procedure cache may grow to several Gbytes in particular scenarios, but later versions of SQL Server have this behaviour better tuned so it no longer has such an adverse effect on the data cache.

The data cache is specific to the SQL Server instance and shared by all databases on that instance. Consider the aggregation query against the Transactions table in the CPU section above: SQL Server needs to read 20 Gbytes of data in order to produce a result, so the 20 Gbytes must be loaded into the SQL Server data cache, and as a consequence other data pages will likely be evicted from cache based on a least-recently-used algorithm. That eviction can incur a penalty, because the pages being removed may have been updated (termed dirty pages) and so need to be written to the permanent storage device first.
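To get a feel for how the data cache is currently shared out, the buffer descriptors DMV can be aggregated per database; a minimal sketch is below (8 Kbytes is the standard SQL Server page size):

-- Approximate data cache usage per database, including how much of it is dirty.
SELECT  database_name = DB_NAME( database_id ),
        cached_mb     = COUNT(*) * 8 / 1024.0,                                  -- 8 Kbyte pages
        dirty_mb      = SUM( CASE WHEN is_modified = 1 THEN 1 ELSE 0 END ) * 8 / 1024.0
FROM    sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;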

Determining the Workload

Online Transaction Processing (OLTP) transactions are predominantly seek operations, where each seek reads or writes a small amount of data, usually in the order of less than 1 Mbyte, and that data is likely to be dispersed across the data file, so SQL Server will perform a number of random 8 Kbyte reads; an example would be retrieving the details of a specific order or inserting a new order. Online Analytical Processing (OLAP) transactions are predominantly a mixture of seek and scan operations: the scans are typically used to access the large “fact” tables (for example, granularity at the orders level), the seeks are used to access the smaller dimension or metadata tables (for example, customer detail), and the access pattern is a mix of 8 Kbyte random reads as well as larger reads in the range of 64 Kbytes to 1 Mbyte issued by the read-ahead manager.

Ordinarily OLTP systems require very low latency (sub 3 milliseconds) on the transaction log, because the INSERT and UPDATE queries need to complete extremely quickly; otherwise blocking can occur, or the additional load placed on the IO subsystem will cause the SLA response-time metric to be exceeded. Because the data files (MDF and NDF files) are written asynchronously by the checkpoint and lazywriter processes, low latency on small IOs is not as important for them as it is for the transaction log; the IO subsystem will usually be better at larger reads and writes, typically in the range of 8 Kbytes to 1 Mbyte.
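One way to get a feel for how hard the transaction log is being driven is to look at the log flush counters that SQL Server exposes through sys.dm_os_performance_counters; a minimal sketch is below. Note these counters are cumulative or per-second raw values, so they need sampling twice and diffing over an interval to be meaningful.

-- Raw log flush counters per database (sample twice and diff to get rates).
SELECT  database_name = RTRIM( instance_name ),
        counter_name,
        cntr_value
FROM    sys.dm_os_performance_counters
WHERE   object_name LIKE '%:Databases%'
  AND   counter_name IN ( 'Log Flushes/sec',
                          'Log Bytes Flushed/sec',
                          'Log Flush Waits/sec',
                          'Log Flush Wait Time' )
ORDER BY database_name, counter_name;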

Method of Estimation

1.       Make a list of typical operations across the timeline of a given day; break the operations down into either ad-hoc or set-time operations, further breaking those down into the following classifications:

a.       Intensive write: the operation writes large volumes of data thus requiring a lot of write activity to the transaction log.

b.      Intensive read: the operation reads large volumes of data thus requiring a lot of read activity, that activity probably being a mixture of sequential scan and random reads.

c.       Intensive read/write: a mixture of (a) and (b)

2.       For each operation determine the IO and CPU characteristics; this can be done on an isolated system, using SQL Profiler to capture CPU, Reads and Writes and Perfmon to capture the counters mentioned earlier in this post (an alternative based on the plan cache is sketched after this list).

3.       Across the timeline for the day determine when each operation is likely to run; ad-hoc queries are difficult to predict, but perhaps users only run them between 9am and 9pm, for example.

4.       Expand (3) and estimate the worst-case number of concurrent operations from your typical workload.

5.       Aggregate the figures to work out the required number of IOs per second (IOPS) and the required read/write latency of the permanent storage.
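For step 2, an alternative to tracing with SQL Profiler is to aggregate the CPU, read and write figures SQL Server already keeps for each cached plan; a minimal sketch using sys.dm_exec_query_stats is below (worker time is in microseconds, reads and writes are counts of 8 Kbyte pages, and the numbers only cover statements whose plans are still in cache):

-- Top cached statements by logical reads, with their CPU and write figures.
SELECT TOP ( 20 )
        statement_text = SUBSTRING( st.text, ( qs.statement_start_offset / 2 ) + 1,
                             ( ( CASE qs.statement_end_offset
                                     WHEN -1 THEN DATALENGTH( st.text )
                                     ELSE qs.statement_end_offset
                                 END - qs.statement_start_offset ) / 2 ) + 1 ),
        executions     = qs.execution_count,
        total_cpu_us   = qs.total_worker_time,
        total_reads    = qs.total_logical_reads,
        total_writes   = qs.total_logical_writes
FROM    sys.dm_exec_query_stats AS qs
        CROSS APPLY sys.dm_exec_sql_text( qs.sql_handle ) AS st
ORDER BY qs.total_logical_reads DESC;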

A rough resource requirement for your workload can now be used to determine the type of hardware needed and how it should be configured.

If you would like to comment on the above or would like me to perform the above analysis for your own system then drop me a line to tonyrogerson@torver.net.

Neil Hambly and I are presenting next week in Cambridge. Neil will be showing us how to use the tools at hand to determine the current activity on your database servers, and I'll be doing a talk around Disaster Recovery and High Availability and the options we have available.

The User Group is growing in size and spread; there is a Southampton event planned for the 9th Dec - make sure you keep your eyes peeled for more details, the best place being the UK SQL Server User Group LinkedIn area.


Cambridge SQL UG - 25th Nov, Evening
Evening Meeting, More info and register

Neil Hambly on Determining the current activity of your Database Servers, Product demo from Red-Gate, Tony Rogerson on HA/DR/Scalability(Backup/Recovery options - clustering, mirroring, log shipping; scaling considerations etc.)

Leeds SQL UG - 8th Dec, Evening
Evening Meeting, More info and register

Neil Hambly will be talking about Indexed Views and Computed Columns for Performance, and Tony Rogerson will be showing some advanced T-SQL techniques.

Manchester SQL UG - 9th Dec, Evening
Evening Meeting, More info and register

End of year wrap up, networking, drinks, some discussions - more info to follow soon.

Edinburgh SQL UG - 9th Dec, Evening
Evening Meeting, More info and register

Satya Jayanty will give an X factor for a DBA's life and Tony Rogerson will talk about SQL Server internals.

Many thanks,
Tony Rogerson, SQL Server MVP

UK SQL Server User Group
http://sqlserverfaq.com

What, a database professional of some 24 years organising a NoSQL event? Yes, well, I think it's an important emerging technical area we need to know about. Databases can't do everything (at the moment). People who know me will affirm my religion regarding using commodity kit.

 So.... 

The first NoSQL (Not only SQL) developer day will take place at the University of Dundee on Saturday 20th Nov (http://developerdeveloperdeveloper.com/nosql1/). Come and join other developers and database professionals and learn more about this emerging technology area. Seven sessions across the day cover VoltDB, Hadoop and HBase, Windows Azure, Cassandra and Hector, CouchDB, RavenDB, plus a session on whether NoSQL is the future of data storage.

Sponsored by the University of Dundee and NCR.

The code for the demos I talked through at the Edinburgh SQL Server User Group can be found here: http://sqlblogcasts.com/blogs/tonyrogerson/UKSSUG/UG20101007%20-%20Hierarchies.zip

I explain recursive CTEs.

I show how to do an adjacency list (parent/child) in SQL using recursive CTEs.

Materialised Path in SQL.

HierarchyID

Point in time hierarchy.

 

Abstract:

The technique of Recency Frequency Intensity/Monetary (RFI/M) is a powerful analytical technique for identifying data patterns and business performance. An introduction to the technique will be given; however, the main focus of the session will be on demonstrating how RFI/M can be performed using a number of SQL features such as data windowing, the OVER clause and PARTITION BY, CROSS APPLY and Common Table Expressions, and how you can nest table expressions. The session should be of benefit to both inexperienced and experienced SQL coders and analysts; each construct will be explained, as will the query plans produced. Demos will be done on AdventureWorks, which we actually discover is going out of business!

Feel free to email me with any questions, comments and suggestions - tonyrogerson@torver.net

http://sqlblogcasts.com/blogs/tonyrogerson/UKSSUG/RF_SQLBits2010%20-%20ACC%20and%20TR3.zip

 

Now autumn is upon us (well, so it feels now that I leave the house in the dark), we are back on track and pushing forward with the user group activities.

Check out the Linked-In group we've set up; the idea is to foster discussions amongst UK SQL people, not just DBAs but anybody in the UK who comes into contact with SQL Server. There will be a prize of a free place at the SQLBits training day (Thurs 30th Sept) and the Friday conference day (1st Oct); the winner will be chosen at random from the LinkedIn group members at the Manchester user group on the 16th of this month.

Don't forget you can keep in touch with everybody using the twitter tag #sqlfaq, for professional networking and a forum through the linkedin group, through facebook for the social angle, and of course through the ug website for events and content.

Starting on the 23rd September we are running weekly Cuppa Corner Twitter sessions. The idea is this: between 11am and 2pm UK time we talk around a specific subject using twitter and the tag #sqlfaqcuppa; the person judged by us to have the best contribution will win an MSDN Ultimate subscription - more details on the site tomorrow.

The Leeds and Manchester September meeting agendas are now in place; there is also an opportunity to win an MSDN Ultimate subscription, which is worth £K's.

SQLBits (York, 30th Sept) is very close now and registration is open (see http://sqlbits.com). There are still places on the training days; check out the really excellent content and speaker line-up - we have a discount code for UG members, which is SQLFAQ20. Follow the progress on twitter with the tag #sqlbits.

Leeds SQL UG - 15th Sep, Evening
Evening Meeting,
More info and register

XML Data in SQL Server - Anthony Brown
Managing SQL Server in a global economic meltdown - Iain Kick (Quest Software)


Manchester SQL UG - 16th Sep, Evening
Evening Meeting,
More info and register

Complex Event Processing (StreamInsight) - Allan Mitchell
Managing SQL Server in a global economic meltdown - Iain Kick (Quest Software)


Telford BCS (Shropshire) - 20th Sep, Evening
Evening Meeting,
More info and register

Introduction to Business Intelligence event: Data Warehousing, Cubes, Reporting Systems - Alex Whittles


Maidstone SQL UG - 22nd Sep, Evening
Evening Meeting,
More info and register

Analysis Services (Configuration, Best Practices) - Andrew Calvett
T-SQL Techniques - Dave Ballantyne


SQLBits Training Day - 30th Sept, York
Evening Meeting,
More info and register

Various full-day seminars by leading figures in the SQL Server space
See site for more info - for 20% discount use SQLFAQ20


SQLBits Conference Day - 1st Oct, York
Evening Meeting,
More info and register

Multiple tracks by leading figures in the SQL Server space
See site for more info - for 20% discount use SQLFAQ20


SQLBits Community Conference Day - 2nd Oct, York
Evening Meeting,
More info and register

Multiple tracks by leading figures in the SQL Server space


Many thanks,
Tony Rogerson, SQL Server MVP

UK SQL Server User Group
http://sqlserverfaq.com

The code below is from the Recency Frequency (RFM and RFI) talk I did at the user group last night; as I said, it's the first part - it doesn't do the category flow rates between temporal windows.

I'm writing a full-day talk on how to do this using the Inmon (normalised) and Kimball (denormalised star/snowflake and dimensional database) routes - if you have any questions, ideas or suggestions then please do email me at tonyrogerson@torver.net - I'm very keen on feedback.

USE AdventureWorks2008R2

go


 

--

--  Self Join and the Window of rows

--


 

select winstart = case when soh2.SalesOrderID = soh4.SalesOrderID then '***' else '' end,

       control_input_SalesOrderID = soh2.SalesOrderID,

       control_input_OrderDate    = soh2.OrderDate,

       lookup_input_SalesOrderID  = soh4.SalesOrderID,

       lookup_input_OrderDate     = soh4.OrderDate,

       recency_days = DATEDIFF( day, soh4.OrderDate, soh2.OrderDate )

from Sales.SalesOrderHeader soh2

    left outer join Sales.SalesOrderHeader soh4

            on soh4.CustomerID = soh2.CustomerID

           AND soh4.OrderDate <= soh2.OrderDate

--           AND soh4.SalesOrderID <> soh2.SalesOrderID

                                               

where soh2.CustomerID = 11091

order by control_input_SalesOrderID,

         lookup_input_SalesOrderID desc

--

--  What about efficiency? Is the above a good way to do this?

--


 

select soh2.CustomerID,

       curr_order   = soh2.OrderDate,

       prev_order   = soh3.OrderDate,

       recency_days = DATEDIFF( day, soh3.OrderDate, soh2.OrderDate )


 

from Sales.SalesOrderHeader soh2

    left outer join Sales.SalesOrderHeader soh3

            on soh3.SalesOrderID = (    SELECT MAX( soh4.SalesOrderID )

                                        FROM Sales.SalesOrderHeader soh4

                                        WHERE soh4.CustomerID = soh2.CustomerID

                                          AND soh4.OrderDate <= soh2.OrderDate

                                          AND soh4.SalesOrderID <> soh2.SalesOrderID )

order by soh2.CustomerID, recency_days

go                                              


 

--  Rob Farley....
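--  For each order, the CROSS APPLY looks up the most recent earlier OrderDate for the same customer (excluding the order itself).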

select soh.CustomerID,

       curr_order   = soh.OrderDate,

       prev_order   = prev.PrevOrderDate,

       recency_days = DATEDIFF(day, prev.PrevOrderDate, soh.OrderDate )

from Sales.SalesOrderHeader as soh

    cross apply (   select MAX(prev.OrderDate) as PrevOrderDate

                    from Sales.SalesOrderHeader as prev

                    where prev.CustomerID = soh.CustomerID

                      and prev.SalesOrderID <> soh.SalesOrderID

                      and prev.OrderDate <= soh.OrderDate) prev

order by soh.CustomerID, recency_days

go


 

--  Itzik Ben Gan....
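--  Number every order (ordered by CustomerID, OrderDate, SalesOrderID), then join row n to row n + 1 for the same customer so each order is paired with the previous one.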

with soh_rn as

(

  select CustomerID,

         OrderDate,

         n = row_number() over(order by CustomerID, OrderDate, SalesOrderID)

  from Sales.SalesOrderHeader


 

)

select s1.CustomerID,

       curr_order   = s1.OrderDate,

       prev_order   = s2.OrderDate,

       recency_days = DATEDIFF( day, s2.OrderDate, s1.OrderDate )

  from soh_rn as s1

    left outer join soh_rn as s2

      on s1.CustomerID = s2.CustomerID

      and s1.n = s2.n + 1

order by s1.CustomerID, recency_days

--

--  OVER clause

--


 

select CustomerID,

       SalesOrderID,

       SubTotal,

       SUM( SubTotal ) OVER( PARTITION BY CustomerID ) AS GrandTotal,

       ( SubTotal /

            SUM( SubTotal ) OVER( PARTITION BY CustomerID )) * 100 AS pct_GrandTotal

      

from Sales.SalesOrderHeader

where CustomerID in ( 11091

                    , 11176

                    , 11185

                    , 11200

                    , 11223 )

select CustomerID,

       SalesOrderID,

       SubTotal,

       LEFT( CONVERT( Char(8), OrderDate, 112 ), 6 ) AS YYYYMM,

       SUM( SubTotal ) OVER( PARTITION BY CustomerID, LEFT( CONVERT( Char(8), OrderDate, 112 ), 6 ) ) AS YYYYMMGrandTotal,

       SUM( SubTotal ) OVER( PARTITION BY CustomerID ) AS GrandTotal,

       ( SubTotal / SUM( SubTotal ) OVER( PARTITION BY CustomerID )) * 100

            AS pct_GrandTotal,

       ( SubTotal / SUM( SubTotal ) OVER( PARTITION BY CustomerID, LEFT( CONVERT( Char(8), OrderDate, 112 ), 6 ) )) * 100

            AS pct_YYYYMMGrandTotal

      

from Sales.SalesOrderHeader

where CustomerID in (    11091

                       , 11176

                       , 11185

                       , 11200

                       , 11223 )

--

--  CROSS APPLY

--


 

select soh.CustomerID,

       curr_order   = soh.OrderDate,

       prev_order   = prev.PrevOrderDate,

       recency_days = DATEDIFF(day, prev.PrevOrderDate, soh.OrderDate )

from Sales.SalesOrderHeader as soh

    cross apply (   select MAX(prev.OrderDate) as PrevOrderDate

                    from Sales.SalesOrderHeader as prev

                    where prev.CustomerID = soh.CustomerID

                      and prev.SalesOrderID <> soh.SalesOrderID

                      and prev.OrderDate <= soh.OrderDate) prev

order by soh.CustomerID, recency_days

go

--

--  Common Table Expression

--


 

; with recency ( curr_order, prev_order, recency_days )

as (


 

    select curr_order   = soh2.OrderDate,

           prev_order   = soh3.OrderDate,

           recency_days = DATEDIFF( day, soh3.OrderDate, soh2.OrderDate )

    from Sales.SalesOrderHeader soh2

        left outer join Sales.SalesOrderHeader soh3

                on soh3.SalesOrderID = (    SELECT MAX( soh4.SalesOrderID )

                                            FROM Sales.SalesOrderHeader soh4

                                            WHERE soh4.CustomerID = soh2.CustomerID

                                              AND soh4.OrderDate <= soh2.OrderDate

                                              AND soh4.SalesOrderID <> soh2.SalesOrderID )

                                                  

    where soh2.CustomerID = 11091

    )

select  curr_order,

        prev_order,

        recency_days,

        average_recency = ( select AVG( recency_days )

                            from recency ),

        stddev_recency  = ( select STDEVP( recency_days )

                            from recency ),

        dist_within_1std= ( select COUNT(*)

                            from recency r

                                cross join ( select stddev_recency = STDEVP( recency_days )

                                             from recency ) as st

                            where r.recency_days between -stddev_recency and stddev_recency ),

        dist_within_2std= ( select COUNT(*)

                            from recency r

                                cross join ( select stddev_recency = STDEVP( recency_days )

                                             from recency ) as st

                            where r.recency_days between -stddev_recency * 2 and stddev_recency * 2 ),

        dist_over_2std=   ( select COUNT(*)

                            from recency r

                                cross join ( select stddev_recency = STDEVP( recency_days )

                                             from recency ) as st

                            where r.recency_days < -stddev_recency * 2

                               or r.recency_days > stddev_recency * 2 ),

        population_Size=  ( select COUNT(*)

                            from recency )

from recency

order by curr_order

--

--  Pulling the data for the recency frequency analysis

--


 

; with soh_rn

          as   (

       select CustomerID,

               OrderDate,

               SubTotal,

               rn = row_number() over(order by CustomerID, OrderDate, SalesOrderID)

       from Sales.SalesOrderHeader


 

     )

 , recency ( CustomerID, curr_order, prev_order, recency_days, SubTotal, rn )

          as (


 

     select s1.CustomerID,

             curr_order   = s1.OrderDate,

             prev_order   = s2.OrderDate,

             recency_days = DATEDIFF( day, s2.OrderDate, s1.OrderDate ),

             s1.subTotal,

             s1.rn

       from soh_rn as s1

          left outer join soh_rn as s2

            on s1.CustomerID = s2.CustomerID

            and s1.rn = s2.rn + 1 )


 

select  c1.*,
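          --  Median recency per customer: when the population size is even, average the two middle recency values;
          --  when odd, take the single middle value (positions come from position_in_dataset in the derived table below).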

          median = AVG( cast( case when population_size % 2 = 0 then

                        case when position_in_dataset between population_size / 2     --   even number - average the middle two

                                                      and ( population_size / 2 ) + 1 then c1.recency_days else null end

                                 else case when position_in_dataset = ( population_size / 2 ) + 1 then c1.recency_days else null end       --     odd number - take the middle

                            end as decimal( 28, 5 ) ) )

                           

                                      OVER( PARTITION BY c1.CustomerID )

into #t

from (   

    select  r.CustomerID,

            curr_order,

            prev_order,

            recency_days,

            population_size = COUNT(recency_days) OVER( PARTITION BY r.CustomerID ),

            position_in_dataset = ROW_NUMBER() OVER( PARTITION BY r.CustomerID ORDER BY recency_days),

            average_recency = AVG( cast( recency_days as decimal( 28, 5 ) ) ) OVER( PARTITION BY r.CustomerID ),

            stddev_recency  = STDEV( recency_days ) OVER( PARTITION BY r.CustomerID ),

            SubTotal,

            rn

    from recency r

    ) as c1

--

--   Frequency worked out...

--


 

select *

from #t

where CustomerID = 11176

order by position_in_dataset

--  alternative ordering:

--  order by CustomerID
--         , curr_order desc

      

select customerid from #t group by CustomerID order by COUNT(*) desc      


 

--

--   Work out recency by frequency using the most recent order...

--   to current date, we are using the current date of 20080803 because

--   the data is static

--


 

select recency_days = DATEDIFF( day, curr_order, '20080803' ),

        frequency    = COALESCE( median, 1 ),

        intensity    = COUNT( * ),

        monetary     = SUM( SubTotal )

into #t2

from #t t

where rn = (  SELECT MAX( t2.rn )

                  FROM #t t2

                  WHERE t2.CustomerID = t.CustomerID

                  )

group by DATEDIFF( day, curr_order, '20080803' ),

           COALESCE( median, 1 )

order by recency_days,

           frequency


 

--

--   Group by category

--


 

declare @recency_categories table (

     cat_name varchar(50) not null primary key clustered,

     recency_from  smallint not null,

     recency_to         smallint not null,

          check( recency_from <= recency_to )

     )

declare @frequency_categories table (

     cat_name varchar(50) not null primary key clustered,

     frequency_from smallint not null,

     frequency_to  smallint not null,

          check( frequency_from <= frequency_to )

     )

insert @recency_categories ( cat_name, recency_from, recency_to )

     values ( 'past 15 days', 0, 15 ),

             ( '16 - 30 days', 16, 30 ),

            ( '31 - 45 days', 31, 45 ),

            ( '46 - 60 days', 46, 60 ),

            ( '61 - 75 days', 61, 75 ),

            ( '76 - 90 days', 76, 90 ),

            ( 'over 90 days', 91, 32767 )

insert @frequency_categories ( cat_name, frequency_from, frequency_to )

     values ( 'single', 1, 1 ),

             ( '2 - 5', 2, 5 ),

            ( '6 - 10', 6, 10 ),

            ( 'over 10', 11, 32767 )


 

select recency_cat   = rc.cat_name,

        recency_ord   = rc.recency_from,

        frequency_cat = fq.cat_name,

        frequency_ord = fq.frequency_from,

        intensity = SUM( t.intensity ),

        monetary  = SUM( t.monetary )

from #t2 t

     inner join @recency_categories rc on t.recency_days between rc.recency_from

                                                                        and rc.recency_to

     inner join @frequency_categories fq on t.frequency between fq.frequency_from

                                                                      and fq.frequency_to

group by rc.cat_name,

           rc.recency_from,

           fq.cat_name,

           fq.frequency_from

order by recency_ord,

          frequency_ord


 

--

-- Review recency

--


 

select recency_cat   = rc.cat_name,

        recency_ord   = rc.recency_from,

        intensity = SUM( t.intensity ),

        monetary  = SUM( t.monetary )

from #t2 t

     inner join @recency_categories rc on t.recency_days between rc.recency_from

                                                                        and rc.recency_to

     inner join @frequency_categories fq on t.frequency between fq.frequency_from

                                                                      and fq.frequency_to

group by rc.cat_name,

           rc.recency_from

order by recency_ord

--

--   Check

--

select COUNT( DISTINCT CustomerID )

from #t

where curr_order >= dateadd( day, -15, '20080803' )


 

select SUM( intensity )

from #t2

where recency_days <= 15

         

select COUNT( distinct customerid )

from Sales.SalesOrderHeader

where OrderDate >= dateadd( day, -15, '20080803' )


 

select *

from #t2

where recency_days <= 15

We have acquired two additional tickets to attend the SQL Server Master Class with Paul Randal and Kimberly Tripp next Thurs (17th June). For a chance to win these coveted tickets email us (sqlfaq@torver.net) before 9pm this Sunday with the subject "MasterClass" - people who entered previously need not worry, you're still in with a chance. The winners will be announced Monday morning.

As ever there is plenty going on physically: we've got dates for a stack of events in Manchester and Leeds, and I'm looking at Birmingham if anybody has ideas. We are growing our online community with the Cuppa Corner section; to participate online remember to use the #sqlfaq twitter tag. For those wanting to get more involved in presenting and fancying trying it out, we are always after people to do 1 - 5 minute SQL nuggets or Cuppa Corners (short presentations) at any of these User Group events - just email us
sqlfaq@sqlserverfaq.com.


Kimberly Tripp and Paul Randal Master Class - Thurs, 17th June - London

REGISTER NOW AND GET A SECOND REGISTRATION FREE*

The top things YOU need to know about managing SQL Server - in one place, on one day - presented by two of the best SQL Server industry trainers!

This one-day MasterClass will focus on many of the top issues companies face when implementing and maintaining a SQL Server-based solution. In the case where a company has no dedicated DBA, IT managers sometimes struggle to keep the data tier performing well and the data available. This can be especially troublesome when the development team is unfamiliar with the effect application design choices have on database performance.

The Microsoft SQL Server MasterClass 2010 is presented by Paul S. Randal and Kimberly L. Tripp, two of the most experienced and respected people in the SQL Server world. Together they have over 30 years combined experience working with SQL Server in the field, and on the SQL Server product team itself. This is a unique opportunity to hear them present at a UK event which will:

>> Debunk many of the ingrained misconceptions around SQL Server's behaviour
>> Show you disaster recovery techniques critical to preserving your company's life-blood - the data
>> Explain how a common application design pattern can wreak havoc in the database
>> Walk through the top-10 points to follow around operations and maintenance for a well-performing and available data tier!

Where: Radisson Edwardian Heathrow Hotel, London
When: Thursday 17th June 2010
*REGISTER TODAY AT www.regonline.co.uk/kimtrippsql - on the registration form simply quote discount code BOGOF for both yourself and your colleague and you will save 50% off each registration - that's a 249 GBP saving! This offer is limited, so book early to avoid disappointment.


Wed, 23 Jun
READING
Evening Meeting,
More info and register

Introduction to NoSQL (Not Only SQL) - Gavin Payne;
T-SQL Gotcha's and how to avoid them - Ashwani Roy;
Introduction to Recency Frequency - Tony Rogerson;
Reporting Services - Tim Leung


Thu, 24 Jun
CARDIFF
Evening Meeting,
More info and register

Alex Whittles of Purple Frog Systems talks about Data warehouse design case studies, Other BI related session TBC


Mon, 28 Jun
EDINBURGH
Evening Meeting,
More info and register

Replication (Components, Administration, Performance and Troubleshooting) - Neil Hambly
Server Upgrades (Notes and Best practice from the field) - Satya Jayanty


Wed, 14 Jul
LONDON
Evening Meeting,
More info and register

Meeting is being sponsored by DBSophic (http://www.dbsophic.com/download), database optimisation software.
Physical Join Operators in SQL Server - Ami Levin
Workload Tuning - Ami Levin
SQL Server and Disk IO (File Groups/Files, SSD's, Fusion-IO, In-RAM DB's, Fragmentation) - Tony Rogerson
Complex Event Processing - Allan Mitchell


Many thanks,
Tony Rogerson, SQL Server MVP

UK SQL Server User Group
http://sqlserverfaq.com

Allan Mitchell and I are doing a double act. Allan is becoming one of the leading guys in the UK on StreamInsight and will give an introduction to this new and exciting technology; on top of that I'll be talking about SQL Server disk IO - well, "disk" might not be relevant any more because I'll be talking about SSDs and Fusion-io - basically I'll be covering the underpinnings, making sure you understand them and get them right, how to monitor, etc. If you've any specific problems or questions just ping me an email at tonyrogerson@sqlserverfaq.com.

To register for the event see: http://sqlserverfaq.com/events/217/SQL-Server-and-Disk-IO-File-GroupsFiles-SSDs-FusionIO-InRAM-DBs-Fragmentation-Tony-Rogerson-Complex-Event-Processing-Allan-Mitchell.aspx

18:15 SQL Server and Disk IO
Tony Rogerson, SQL Server MVP
Tony's Blog; Tony on Twitter


In this session Tony will talk about RAID levels, how SQL server writes to and reads from disk, the effect SSD has and will talk about other options for throughput enhancement like Fusion IO. He will look at the effect fragmentation has and how to minimise the impact, he will look at the File structure of a database and talk about what benefits multiple files and file groups bring. We will also touch on Database Mirroring and the effect that has on throughput, how to get a feeling for the throughput you should expect.

19:15 Break

19:45 Complex Event Processing (CEP)
Allan Mitchell, SQL Server MVP
http://sqlis.com/sqlis

StreamInsight is Microsoft’s first foray into the world of Complex Event Processing (CEP) and Event Stream Processing (ESP).  In this session I want to show an introduction to this technology.  I will show how and why it is useful.  I will get us used to some new terminology but best of all I will show just how easy it is to start building your first CEP/ESP application.

In this 10 minute screencast I go through the basics of what I term windowing, which is basically the technique of filtering to a set of rows given a specific value, for instance a Sub-Query that aggregates or a join that returns more than just one row (for instance on a one to one relationship).

http://sqlserverfaq.com/content/SQL-Basic-Windowing-using-Joins.aspx

SQL below...

USE tempdb

go

   

CREATE TABLE RollingTotals_Nesting (

 

    client_id          int not null,

    transaction_date   date not null,

    transaction_amount decimal( 28, 2 ) not null,

   

        constraint pk_RollingTotal_Nesting

            primary key clustered( client_id, transaction_date )

);

 

INSERT INTO RollingTotals_Nesting ( client_id, transaction_date, transaction_amount )

    VALUES ( 1, '20100410', 10 ),

           ( 1, '20100411', 10 ),

           ( 1, '20100412', 10 ),

           ( 1, '20100413', 10 ),

           ( 1, '20100414', 10 );

          

          

 

--

--  Single level of nesting (the sub-query)

--

BEGIN

 

    SELECT client_id, transaction_date, transaction_amount,

           (    SELECT SUM( rt2.transaction_amount )

                FROM RollingTotals_Nesting AS rt2

                WHERE rt2.client_id = rt1.client_id

                  AND rt2.transaction_date <= rt1.transaction_date ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt1;

 

END

   

--

--  What are we really doing?

--

BEGIN

 

    --  For each row in RollingTotals_Nesting do this...

 

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100410';

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100411';

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100412';

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100413';

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100414';

     

END

GO

   

--

--  Order of processing the rows in RollingTotals_Nesting is immaterial.

--  Our restriction is on the columns {client_id, transaction_date}

--

 

BEGIN

 

    DROP TABLE RollingTotals_Nesting

 

    CREATE TABLE RollingTotals_Nesting (

 

        client_id   int not null,

        transaction_date date not null,

        transaction_amount decimal( 28, 2 ) not null,

       

            constraint pk_RollingTotal_Nesting

                primary key NONclustered( client_id, transaction_date )

    );

 

    INSERT INTO RollingTotals_Nesting ( client_id, transaction_date, transaction_amount )

        VALUES ( 1, '20100412', 10 ),

               ( 1, '20100410', 10 ),

               ( 1, '20100413', 10 ),

               ( 1, '20100411', 10 ),

               ( 1, '20100414', 10 );

              

    SELECT client_id, transaction_date, transaction_amount

    FROM RollingTotals_Nesting;

 

    SELECT client_id, transaction_date, transaction_amount,

           (    SELECT SUM( rt2.transaction_amount )

                FROM RollingTotals_Nesting AS rt2

                WHERE rt2.client_id = rt1.client_id

                  AND rt2.transaction_date <= rt1.transaction_date ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt1;

 

END

  

---

--- Using a "Window" over your data

---

 

BEGIN

 

    --  Source data

 

    select top 100 rownumber, F1, F2, F3, F4, F5

    from imp_MOTXLS

    order by rownumber

  

    --  Refine #1 - get closer to what we want

 

    select top 10000 rownumber, F1, F2, F3, F4, F5

    from imp_MOTXLS

    where F3 is not null

    order by rownumber

   

    --

    --      Find windows for maker/model so we can fill in the NULL's

    --

 

    select i1.F1 as maker,

           i1.F2 as model,         --  Model names sometimes wrap onto the next "line"

           MIN( i1.rownumber ) as range_start,

           MIN( i2.rownumber ) as range_end

    into #MakerModelWindowRanges

    from dbo.imp_MOTXLS as i1

          inner join dbo.imp_MOTXLS as i2 on i2.rownumber > i1.rownumber

    where i1.F3 is not null     -- data quality restriction

      and i2.F3 is not null     -- data quality restriction

      and i2.F1 is not null     -- data quality restriction

      and i2.F2 is not null     -- data quality restriction

      and i2.F1 <> i1.F1    --  Make change

      and i2.F2 <> i2.F1    --  Model change

    group by i1.F1,

             i1.F2

    order by maker, model

  

    --  Add the key...

 

    alter table #MakerModelWindowRanges alter column maker varchar(100) not null

    alter table #MakerModelWindowRanges alter column model varchar(100) not null

 

    alter table #MakerModelWindowRanges add primary key clustered( maker, model )

  

    select maker, model, range_start, range_end

    from #MakerModelWindowRanges

    order by maker, model

   

    --  Check a window

 

    select rownumber, F1, F2, F3, F4, F5

    from imp_MOTXLS

    where rownumber between 96 and 120

      and F3 is not null

   

     

    --  Remember we are after when F1 changes OR F2 changes

    --  F2 changes on rownumber 121

    --  hence our right most range in the window is 120

    --      and i2.F1 <> i1.F1

    --      and i2.F2 <> i2.F1

 

    select rownumber, F1, F2, F3, F4, F5

    from imp_MOTXLS

    where rownumber between 96 - 2 and 120 + 2

      and F3 is not null

   

    --  Now we have our Lookup Windowing table we can work the actual gubbings out

 

    select wrng.maker,

           wrng.model,

           m.F3 as YearOfFirstUse,

           m.F4 as TestPasses,

           m.F5 as TestFails

    from #MakerModelWindowRanges AS wrng

        inner join imp_MOTXLS AS m ON m.rownumber BETWEEN wrng.range_start AND wrng.range_end

    where m.F3 not in ( 'First Use', 'Total', 'Year of' )

    order by wrng.maker, wrng.model, YearOfFirstUse

 

       

       

    DROP TABLE #MakerModelWindowRanges

    DROP TABLE RollingTotals_Nesting

  

END
