May 2010 - Posts

In this 10 minute screencast I go through the basics of what I term windowing, which is basically the technique of filtering to a set of rows given a specific value, for instance a Sub-Query that aggregates or a join that returns more than just one row (for instance on a one to one relationship).

http://sqlserverfaq.com/content/SQL-Basic-Windowing-using-Joins.aspx

SQL below...

USE tempdb

go

   

CREATE TABLE RollingTotals_Nesting (

 

    client_id          int not null,

    transaction_date   date not null,

    transaction_amount decimal( 28, 2 ) not null,

   

        constraint pk_RollingTotal_Nesting

            primary key clustered( client_id, transaction_date )

);

 

INSERT INTO RollingTotals_Nesting ( client_id, transaction_date, transaction_amount )

    VALUES ( 1, '20100410', 10 ),

           ( 1, '20100411', 10 ),

           ( 1, '20100412', 10 ),

           ( 1, '20100413', 10 ),

           ( 1, '20100414', 10 );

          

          

 

--

--  Single level of nesting (the sub-query)

--

BEGIN

 

    SELECT client_id, transaction_date, transaction_amount,

           (    SELECT SUM( rt2.transaction_amount )

                FROM RollingTotals_Nesting AS rt2

                WHERE rt2.client_id = rt1.client_id

                  AND rt2.transaction_date <= rt1.transaction_date ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt1;

 

END

   

--

--  What we really doing?

--

BEGIN

 

    --  For each row in RollingTotals_Nesting do this...

 

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100410';

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100411';

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100412';

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100413';

    SELECT SUM( rt2.transaction_amount ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt2

    WHERE rt2.client_id = 1

      AND rt2.transaction_date <= '20100414';

     

END

GO

   

--

--  Order of processing the rows in RollingTotals_Nesting is imaterial.

--  Our restriction is on the columns {client_id, transaction_date}

--

 

BEGIN

 

    DROP TABLE RollingTotals_Nesting

 

    CREATE TABLE RollingTotals_Nesting (

 

        client_id   int not null,

        transaction_date date not null,

        transaction_amount decimal( 28, 2 ) not null,

       

            constraint pk_RollingTotal_Nesting

                primary key NONclustered( client_id, transaction_date )

    );

 

    INSERT INTO RollingTotals_Nesting ( client_id, transaction_date, transaction_amount )

        VALUES ( 1, '20100412', 10 ),

               ( 1, '20100410', 10 ),

               ( 1, '20100413', 10 ),

               ( 1, '20100411', 10 ),

               ( 1, '20100414', 10 );

              

    SELECT client_id, transaction_date, transaction_amount

    FROM RollingTotals_Nesting;

 

    SELECT client_id, transaction_date, transaction_amount,

           (    SELECT SUM( rt2.transaction_amount )

                FROM RollingTotals_Nesting AS rt2

                WHERE rt2.client_id = rt1.client_id

                  AND rt2.transaction_date <= rt1.transaction_date ) AS transaction_amount_rolling_total

    FROM RollingTotals_Nesting AS rt1;

 

END

  

---

--- Using a "Window" over your data

---

 

BEGIN

 

    --  Source data

 

    select top 100 rownumber, F1, F2, F3, F4, F5

    from imp_MOTXLS

    order by rownumber

  

    --  Refine #1 - get closer to what we want

 

    select top 10000 rownumber, F1, F2, F3, F4, F5

    from imp_MOTXLS

    where F3 is not null

    order by rownumber

   

    --

    --      Find windows for maker/model so we can fill in the NULL's

    --

 

    select i1.F1 as maker,

           i1.F2 as model,         --  Model names sometimes wrap onto the next "line"

           MIN( i1.rownumber ) as range_start,

           MIN( i2.rownumber ) as range_end

    into #MakerModelWindowRanges

    from dbo.imp_MOTXLS as i1

          inner join dbo.imp_MOTXLS as i2 on i2.rownumber > i1.rownumber

    where i1.F3 is not null     -- data quality restriction

      and i2.F3 is not null     -- data quality restriction

      and i2.F1 is not null     -- data quality restriction

      and i2.F2 is not null     -- data quality restriction

      and i2.F1 <> i1.F1    --  Make change

      and i2.F2 <> i2.F1    --  Model change

    group by i1.F1,

             i1.F2

    order by maker, model

  

    --  Add the key...

 

    alter table #MakerModelWindowRanges alter column maker varchar(100) not null

    alter table #MakerModelWindowRanges alter column model varchar(100) not null

 

    alter table #MakerModelWindowRanges add primary key clustered( maker, model )

  

    select maker, model, range_start, range_end

    from #MakerModelWindowRanges

    order by maker, model

   

    --  Check a window

 

    select rownumber, F1, F2, F3, F4, F5

    from imp_MOTXLS

    where rownumber between 96 and 120

      and F3 is not null

   

     

    --  Remember we are after when F1 changes OR F2 changes

    --  F2 changes on rownumber 121

    --  hence our right most range in the window is 120

    --      and i2.F1 <> i1.F1

    --      and i2.F2 <> i2.F1

 

    select rownumber, F1, F2, F3, F4, F5

    from imp_MOTXLS

    where rownumber between 96 - 2 and 120 + 2

      and F3 is not null

   

    --  Now we have our Lookup Windowing table we can work the actual gubbings out

 

    select wrng.maker,

           wrng.model,

           m.F3 as YearOfFirstUse,

           m.F4 as TestPasses,

           m.F5 as TestFails

    from #MakerModelWindowRanges AS wrng

        inner join imp_MOTXLS AS m ON m.rownumber BETWEEN wrng.range_start AND wrng.range_end

    where m.F3 not in ( 'First Use', 'Total', 'Year of' )

    order by wrng.maker, wrng.model, YearOfFirstUse

 

       

       

    DROP TABLE #MakerModelWindowRanges

    DROP TABLE RollingTotals_Nesting

  

END

 

Got two user group meetings in London for you, we've also started the Cuppa Corner sessions - the first 3 are up on the site - A trip to First Normal Form, Lookup and Cache Transform in SSIS and Pipeline Limiter in SSIS - we are aiming for at least one per week. WhereScape are doing a breakfast meeting on Agile techniques to Data Warehousing and Kimberly Tripp and Paul Randal are over in June for a 1 day master class. Finally a 3 day performance and monitoring workshop on 22- 24th June in London by Ramesh Meyyappan

As ever, if you don't want these emails just reply to me with remove in the subject.

This Wed evening, 19th May in London

Service Broker - Neil Hambly; Neil will give and introduction, he'll talk about terminology and design considerations.

BakBone NetVault: FASTRecover provides byte-level continuous data protection for Exchange, SQL and Windows File Server. That’s smart, but the recovery mechanism is even smarter. In around 30 seconds after starting recovery you can begin accessing your data again.

Monitoring and Controlling Resources in SQL Server - Chris Testa-O'Neil; Chris will talk about Resource Governor as well as the Management Data Warehouse (Data Collector)

More info and registration

This Thurs evening, 20th May in London

Business Intelligence: Implementing common business calcs using DAX in PowerPivot - Chris Webb; Using PowerPivot you'll need to write calculations in DAX - this session covers basic DAX concepts.

Case Study on how to deal with Metrics on high data volumes - Andrew Sadler.

More info and registration

Wed morning, 26th May in London

Agile Data Warehousing interactive breakfast workshop:

-> Apply Agile techniques to the Data Warehouse development process
-> Get business involved in the new approach
-> Solve the business requirements gathering conundrum
-> Break the ETL fixation
More info and registration

Thur full day - 17th June in London

SQL Server Master Class with Kimberly Tripp and Paul Randal

This one-day MasterClass will focus on many of the top issues companies face when implementing and maintaining a SQL Server-based solution. In the case where a company has no dedicated DBA, IT managers sometimes struggle to keep the data tier performing well and the data available. This can be especially troublesome when the development team is unfamiliar with the affect application design choices have on database performance.

The Microsoft SQL Server MasterClass 2010 is presented by Paul S. Randal and Kimberly L. Tripp, two of the most experienced and respected people in the SQL Server world. Together they have over 30 years combined experience working with SQL Server in the field, and on the SQL Server product team itself. This is a unique opportunity to hear them present at a UK event which will:

Note: there is a free place give away see http://sqlserverfaq.com for more information

-> Debunk many of the ingrained misconceptions around SQL Server's behaviour

-> Show you disaster recovery techniques critical to preserving your company's life-blood - the data

-> Explain how a common application design pattern can wreak havoc in the database

-> Walk through the top-10 points to follow around operations and maintenance for a well-performing and available data tier!

More info and registration

Jun 22 - 24th, 3 Day Level 400 Microsoft SQL Server 2008 and SQL Server 2005 Performance Monitoring & Tuning Workshop London

-> Analyzing current performance bottlenecks of a SQL Server in a Production environment

-> Monitoring and Tuning CPU Utilization

-> Monitoring and Tuning Disk I/O and Database Files

-> Monitoring and Tuning Memory Utilization

-> Monitoring and Tuning Query Optimization and Query Execution

-> Monitoring and Tuning Locks, Deadlocks and other Concurrency issues

-> Monitoring and Tuning Database Mirroring, Backup Compression and Database Encryption

More info and registration

It's 15 minutes, I talk about Domains, Functional Dependencies, Repeating Groups, Relational Valued Attributes and of course First Normal Form.

http://www.screencast.com/t/ZjA3OTg4

For questions just ask on the http://sqlserverfaq.com chat control or Twitter using #sqlfaq tag.

Slides are also availble here: http://sqlcontent.sqlblogcasts.com/video/cctr20100507dbdesign1nf/cc_tr20100507_dbdesign1nf.pptx