
SQL Server Blogs

Voices from the UK SQL Server Community

Atul Thakor

  • Finding the maximum value/date across columns

While working on some code recently I discovered a neat little trick to find the maximum value across several columns…

    So the starting point was finding the maximum date across several related tables and storing the maximum value against an aggregated record.

    Here's the sample setup code:

    USE TEMPDB
    IF OBJECT_ID('CUSTOMER') IS NOT NULL
    BEGIN
        DROP TABLE CUSTOMER
    END
    
    IF OBJECT_ID('ADDRESS') IS NOT NULL
    BEGIN
        DROP TABLE ADDRESS
    END
    
    
    IF OBJECT_ID('ORDERS') IS NOT NULL
    BEGIN
        DROP TABLE ORDERS
    END
    
    SELECT
        1 AS CUSTOMERID,
        'FREDDY KRUEGER' AS NAME,
        GETDATE() - 10 AS DATEUPDATED
    INTO
        CUSTOMER
    
    SELECT 
        100000 AS ADDRESSID,
        1 AS CUSTOMERID,
        '1428 ELM STREET' AS ADDRESS,
        GETDATE() - 5 AS DATEUPDATED
    INTO
        ADDRESS
        
    SELECT
        123456 AS ORDERID,
        1 AS CUSTOMERID,
        GETDATE() + 1 AS DATEUPDATED
    INTO ORDERS

     

    Now, the original code used a function to determine the maximum date, and this performed poorly. After considering pivoting the data I opted for a CASE statement; this seemed reasonable until I discovered other areas which needed to determine the maximum date across 5 or more tables, where it didn't scale well.
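
    For context, here's a rough sketch of what the CASE-based approach looks like (my reconstruction against the sample tables above, not the original production code):

    SELECT
        C.CUSTOMERID,
        A.ADDRESSID,
        CASE
            WHEN C.DATEUPDATED >= A.DATEUPDATED AND C.DATEUPDATED >= O.DATEUPDATED THEN C.DATEUPDATED
            WHEN A.DATEUPDATED >= O.DATEUPDATED THEN A.DATEUPDATED
            ELSE O.DATEUPDATED
        END AS MAXDATEUPDATED
    FROM
        CUSTOMER C
    INNER JOIN
        ADDRESS A ON C.CUSTOMERID = A.CUSTOMERID
    INNER JOIN
        ORDERS O ON O.CUSTOMERID = C.CUSTOMERID

    The final solution instead used the VALUES clause within a subquery: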

    SELECT
        C.CUSTOMERID,
        A.ADDRESSID,
        (SELECT
            MAX(DT)
        FROM
            (VALUES (C.DATEUPDATED), (A.DATEUPDATED), (O.DATEUPDATED)) AS V(DT)) AS MAXDATEUPDATED
    FROM
        CUSTOMER C
    INNER JOIN
        ADDRESS A ON C.CUSTOMERID = A.CUSTOMERID
    INNER JOIN
        ORDERS O ON O.CUSTOMERID = C.CUSTOMERID

    As you can see, the solution scales well and can take advantage of many of the aggregate functions!
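
    The same pattern works with the other aggregate functions too; for instance, a quick sketch swapping MAX for MIN to return the earliest update instead (same sample tables as above):

    SELECT
        C.CUSTOMERID,
        A.ADDRESSID,
        (SELECT
            MIN(DT)
        FROM
            (VALUES (C.DATEUPDATED), (A.DATEUPDATED), (O.DATEUPDATED)) AS V(DT)) AS MINDATEUPDATED
    FROM
        CUSTOMER C
    INNER JOIN
        ADDRESS A ON C.CUSTOMERID = A.CUSTOMERID
    INNER JOIN
        ORDERS O ON O.CUSTOMERID = C.CUSTOMERID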

  • Introducing the Excel Fuzzy Matching Lookup Add-In

    My job involves a fair amount of fuzzy matching. When working on the analysis for a project I’ll often use Excel, as it does everything I need; it’s also great because you can send spreadsheets to the business knowing they’ll be comfortable using them and fairly self-sufficient.

    This fuzzy matching add-in from Microsoft Research really caught my eye, as it meant I could take sample records (e.g. Name1 and Name2) and get a rough estimate of how accurate the matching should be.

    The setup

    Firstly you need to have Excel 2010 installed. The add-in can be downloaded from here; unzip it and click through the usual Next buttons.

    Here is a simple example to get you started:

    1. Create a new spreadsheet and paste the following in A1
      New Customer      Existing Customer
      Jimmy Paige       Jimmy Page
      Ladee Gaga        Lady Gaga
      Jimmi Hendriks    Jimmy Hendricks
    2. Fuzzy Lookup needs tables to compare values, so…
    3. Highlight all the rows (not the entire column) in column A and hit Ctrl + L. This creates a table within Excel for that column; ensure you tick “My table has headers”.
    4. Highlight the rows (not the entire column) in column B, hit Ctrl + L and tick the headers box. You now have the 2 tables you are ready to compare.
    5. Your screen should look like this
      image
    6. Now click the fuzzy lookup link in the toolbar and then click the fuzzy lookup button; a menu should appear on the right
      image
    7. At the top, the 2 tables to be compared are chosen; the left and right columns are pre-selected, and the columns to be compared have already been added to the match columns list. If not, click the compare button, which will add them.
    8. You can then select the type of comparison. The default is text; other options are Zip Code, Phone Number and Social Security, and you can define your own by clicking Configure (I’ve yet to do this).
    9. Hit ‘Go’ and you should see the following:
      image
    10. As you can see, we have 2 matches with similarity ratings.

    Issues

    I’ve found the add-in to be pretty unstable, ranging from running forever to not running at all; to run a new comparison I usually restart Excel!

    Don't let this put you off!

  • Reporting Services 2008 R2 Gauges: setting scale and colour based on value

    So… I started to play with Reporting Services this week and it’s proving to be pretty good fun so far. Here is a very quick post covering the setup of a gauge, altering its scale and changing its colour based on input values.

    1. Firstly create a blank reporting services project
    2. Add a new Shared Data Source (for my example I’m using AdventureWorksLT2008) and call it GaugeSource
    3. Add a new Shared Dataset using GaugeSource as your data source and name it GaugeData. You don’t need to worry about populating the @ProductID variable yet; when you run the report you’ll be prompted for a ProductID.
    4. Use the following query, which returns sales for the supplied ProductID:

       select
           ProductID,
           LineTotal
       from
           SalesLT.SalesOrderDetail
       where
           ProductID = @ProductID

    5. Add a new report (without the wizard)
    6. Using the toolbox add a Gauge to the report and choose the default dial
    7. When asked which dataset choose GaugeData
    8. To change the scale right click on the Gauge->Gauge Panel->Scale Properties
      image
    9. I set the maximum to 40,000 because I know that the maximum value is around 37,000; change this to suit your data (see the query after these steps for one way to find a sensible maximum)
    10. Now you need to specify the value you wish to indicate on the dial; in my query I care about the LineTotal. To do this, click on the dial (you may need to click twice: once to focus on the report and once again to select the dial). You’ll notice that the Sum function is automatically applied to LineTotal; this is the normal behaviour when using numerical values in SSRS
      image
    11. If you click Preview you should see the following; you need to enter a value before the report can run. In my example I enter 969 and hit View Report
      image
    12. Depending on your scale/data you should see something similar to this:
      image
    13. Now let’s set the back colour based on the value shown
    14. Right click on the dial and click “Gauge Properties”
      image
    15. On the following screen click Back Fill and then click the Fx button:
      image
    16. Paste the following into the expression window:
      =IIF(sum(Fields!LineTotal.Value) >= 30000, "Green", 
      IIF(sum(Fields!LineTotal.Value) <= 500, "Red", "WhiteSmoke"))

      The expression states that if the sum of LineTotal is >= 30000 the colour is set to Green; if the sum is <= 500 it is set to Red; otherwise WhiteSmoke.
    17. You can test various values and colours
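
    As mentioned in step 9, here's one way to find a sensible scale maximum; a quick sketch against AdventureWorksLT2008 (my own helper query, not part of the report):

    -- find the largest summed LineTotal across products
    select top 1
        ProductID,
        SUM(LineTotal) as TotalSales
    from
        SalesLT.SalesOrderDetail
    group by
        ProductID
    order by
        TotalSales desc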

    Hope this helps!

    1. Running Massive Updates, Deletes, Inserts

      Firstly, with large transactions come large transaction logs; ensure you grow your log accordingly before running a huge transaction, because autogrows can cause overhead during execution. For example: why did the query run so quickly on test rather than live? The likelihood is you ran it repeatedly on your test box and the log had already grown.

      If you plan on running a large insert you should grow the data file accordingly; not only will this improve speed, it will also ensure you have enough space for the query to run!
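
      As a minimal sketch (hypothetical database and file names; the sizes are examples and must be larger than the current file sizes):

      -- pre-grow the log and data files before the big batch job
      ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_log, SIZE = 20GB);
      ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_data, SIZE = 100GB);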

      Run in loops/batches

      Old way:

      Delete From
          MainTable
      where
          ID in (select ID from DeleteList)

      Loopy way:

      declare
          @min int,
          @max int,
          @increments int,
          @Counter int

      SET @min = (select MIN(ID) from DeleteList)
      SET @max = (select MAX(ID) from DeleteList)
      SET @increments = 1000
      SET @Counter = @min

      While (@Counter < @max)
      BEGIN
          declare @progress varchar(10)

          -- delete one small batch of rows at a time
          Delete From
              MainTable
          where
              ID between @Counter and @Counter + @increments
          and
              ID in (select ID from DeleteList)

          set @Counter = @Counter + @increments

          -- report progress immediately (see note below)
          set @progress = CAST(@Counter as varchar(10))
          RAISERROR ('PROGRESS %s', 0, 1, @progress) WITH NOWAIT
      END


      So why use the loop?

      Firstly, you are running smaller batches so you’ll only be locking subsets, which is especially important when running in a 24/7 environment. Secondly, you can use the with (rowlock) hint if you wish, and you can play around with the batch size to find the optimal update size.
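
      For example, a minimal sketch of the batched delete from above with the rowlock hint applied:

      Delete From
          MainTable with (rowlock)
      where
          ID between @Counter and @Counter + @increments
      and
          ID in (select ID from DeleteList)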

      You should also note the following statement:

      RAISERROR ('PROGRESS %s', 0, 1, @progress) WITH NOWAIT

      When you want to report progress, the PRINT statement can be a bad idea, as print messages are buffered until the batch completes; RAISERROR with the NOWAIT option sends the message to you immediately, so if you notice a long delay between messages while your query runs you are able to react straight away.
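
      If you want to see the buffering difference for yourself, here's a tiny self-contained sketch (assumes SQL Server 2008+ for the inline DECLARE initialiser):

      declare @i int = 0
      while @i < 5
      begin
          waitfor delay '00:00:01'
          -- this message appears after each second:
          RAISERROR ('PROGRESS %d', 0, 1, @i) WITH NOWAIT
          -- a PRINT here would be buffered and only appear when the batch completes
          set @i = @i + 1
      end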

      I know this is a fairly back-to-basics post, but it’s something I’ve had to come back to many times, so I thought it would be worth sharing!

      I’ve attached the T-SQL.

    2. First time speaking at SQLBits and the laptop debacle

      So this weekend I was lucky enough to get the opportunity to speak at the largest SQL Server conference in the UK. I’d been attending the conference on and off since it began back in 2007 and have seen it go from strength to strength, attracting some of the best in the industry. I had wanted to speak at the conference for a while but had never managed to submit a session; Chris Testa-O'Neill gave me the push I needed, I submitted a session, and I was lucky enough to get a spot!

       

      So on the day of the talk I turned up bright and early, mainly down to not being able to sleep thanks to a dodgy nightmare in which I alt-tabbed during my demo and there was porn on the projector. There was no chance of getting any sleep after that!

       

      Martin Bell had arranged for me to test out connectivity first thing to iron out any issues, as I was using a new laptop which only had an HDMI connection. Guess what? It didn't work! There was a moment of complete and utter dread; Steve Bradbury relentlessly tried to fix the issue with no joy, at which point Tim Kent stepped in and offered his laptop.

       

      Working through it with Tim, and with a little help from Simon Sabin, I was able to get my presentation ready for the big screen. Various members of the SQLBits team were around checking to make sure things were OK, which was reassuring; it was good to know help was there!

       

      I can’t express my gratitude enough to the SQLBits team; with so much going on, these guys were able to quickly get me a replacement machine, help me fix a couple of issues moving between machines, and keep everything nice and calm, all while ensuring the other sessions were running smoothly.

       

      I must admit I was completely thrown after these issues, but I have to say I feel I definitely came out stronger for having been through it; hopefully next time things will go more smoothly.

       

      I also wanted to say a big thank you to the people who came to the presentation; I hope they enjoyed it. The feedback will be much appreciated, so hopefully I can take something away from this and improve for next time!

       

      A big thank you to the SQLBits organisers and fusionio (and anyone I may have missed!) for delivering an awesome conference and event!

      Looking forward to SQLBits 9,

      Atul

      Scary

      Picture taken from the bean bag diving championships by Carmel Gunn (Prodata)

    3. LINQ for SQL Developers and DBAs

      Firstly I’d just like to thank the guys who organise the SQL Server User Group (Martin/Tony/Chris) for giving me the opportunity to speak at the recent event.

      Sorry about the slides taking so long, but here they are, along with some extra information.

      Firstly, the demos were all done using LINQPad 4.0, which can be downloaded here:

      http://www.linqpad.net/

      There are 2 versions, 3.5 and 4.0.

      With 3.5 you should be able to replicate the problem I showed, where a query using a parameter X characters long creates a different execution plan to a query using a parameter Y characters long; otherwise I would just use 4.0.
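
      If you want to see the duplicated plans for yourself, one way is to peek at the plan cache after running the queries; a rough sketch (the LIKE filter is just my assumption for finding the relevant statements, and you'll need VIEW SERVER STATE permission):

      -- look for near-identical statements cached separately, differing only
      -- in the declared varchar length of the parameter
      select st.text, cp.usecounts
      from sys.dm_exec_cached_plans cp
      cross apply sys.dm_exec_sql_text(cp.plan_handle) st
      where st.text like '%SalesLT%'
      order by st.text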

      The sample database used is AdventureWorksLT2008 which can be downloaded from here:

      http://msftdbprodsamples.codeplex.com/releases/view/37109

      The scripts have been named so that you can select the appropriate way to run them, i.e. C# expression / C# statement; each script can be run individually by highlighting the query and clicking the play symbol or hitting F5.

      Scripts and Slides:

      http://sqlblogcasts.com/blogs/atulthakor/An%20Introduction%20to%20LINQ.zip

      Please don't hesitate to send any questions via email/twitter; I’ll try my best to answer them!

      Thanks,

      Atul

    4. Performance problems loading XML with SSIS, an alternative way!

      I recently needed to load several thousand XML files into a SQL database, so I created an SSIS package which worked as follows:

      A Foreach container loops through a directory and loads each file path into a variable; the “Import XML” dataflow then loads each XML file into a SQL table.

       

       

      image

       

      Running this, it took approximately 1 second to load each file, which seemed a massive amount of time just to parse the XML and load the data. Speaking to my colleague Martin Croft, he suggested using T-SQL BULK INSERT with OPENROWSET, so we adjusted the package as follows:

       

      image

       

      The same Foreach container was used, but instead the following SQL command was executed (this is an expression):

       

       

      "INSERT INTO MyTable(FileDate)
      SELECT   CAST(bulkcolumn AS XML)
          FROM OPENROWSET(
              BULK
              '" + @[User::CurrentFile]  + "',
              SINGLE_BLOB ) AS x"

       

       

      Using this method we managed to load approximately 20 records per second; much faster… for data loading, at least!

      For what we wanted to achieve this was perfect, but I’ll leave you with the following points to weigh up when making your own decision on which solution to choose:

       

       

      Openrowset Method

      • Much faster to get the data into SQL
      • You’ll need to parse or create a view over the XML data to make it more usable (a minimal parsing sketch follows these lists; another post on this to come!)
      • Not able to apply validation/transformation against the data when loading it
      • The SQL Server service account will need permission to the file
      • No schema validation when loading files

      SSIS

      • Slower (in our case)
      • Schema validation
      • Allows you to apply transformations/joins to the data
      • Permissions should be less of a problem
      • Data can be loaded into the final form through the package
      • When using a schema, validation errors can fail the package (I’ll do another post on this)
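
      On the parsing point above, here's a minimal sketch of querying the loaded XML column; the element and attribute names are made up, so substitute your own document shape:

      -- assumes MyTable(FileDate xml) as loaded by the OPENROWSET command above,
      -- with a hypothetical shape of <Orders><Order ID="..." Total="..."/></Orders>
      SELECT
          x.o.value('@ID', 'int') AS OrderID,
          x.o.value('@Total', 'money') AS Total
      FROM
          MyTable
      CROSS APPLY
          FileDate.nodes('/Orders/Order') AS x(o)
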
    5. T-SQL (SCD) Slowly Changing Dimension Type 2 using a merge statement

      Working on a stored procedure recently which loads records into a data warehouse, I found that the existing record was being expired with an update statement, followed by an insert to add the new active record.

      Playing around with the merge statement you can actually expire the current record and insert a new record within one clean statement.

      This is how the statement works: we do a normal merge statement which inserts a record when there is no match; when we do match the record, we update the existing record by expiring and deactivating it.

      At the end of the merge statement we use the OUTPUT clause to return the staging values for the update, and we wrap the whole merge statement within an insert statement that adds the new active rows for the records which were expired. I’ve added the full script at the bottom so you can paste it and play around.

       

      INSERT INTO ExampleFactUpdate
          (PolicyID,
          Status)
      SELECT -- these columns are returned from the output clause
          PolicyID,
          Status
      FROM
      (
          -- merge statement on the unique id, in this case PolicyID
          MERGE dbo.ExampleFactUpdate dp
          USING dbo.ExampleStag s
              ON dp.PolicyID = s.PolicyID
          WHEN NOT MATCHED THEN -- when we can't match the record we insert a new record, and that is all that happens
              INSERT (PolicyID, Status)
              VALUES (s.PolicyID, s.Status)
          WHEN MATCHED -- if it already exists
              AND dp.ExpiryDate IS NULL -- and the expiry date is null
              THEN
              UPDATE
                  SET
                      dp.ExpiryDate = getdate(), -- we set the expiry on the existing record
                      dp.Active = 0 -- and deactivate the existing record
          OUTPUT $action MergeAction, s.PolicyID, s.Status -- the output clause returns a merge action which can be Insert/Update/Delete
      ) MergeOutput -- in our example we want the rows where a record has been updated (expired)
      WHERE -- so we filter with a where clause
          MergeAction = 'Update'; -- and only updated records are re-inserted
       
      Complete source for the example:
      if OBJECT_ID('ExampleFactUpdate') IS NOT NULL
          drop table ExampleFactUpdate

      Create Table ExampleFactUpdate(
          ID int identity(1,1),
          PolicyID varchar(100),
          Status varchar(100),
          EffectiveDate datetime default getdate(),
          ExpiryDate datetime,
          Active bit default 1
      )

      insert into ExampleFactUpdate(
          PolicyID,
          Status)
      select
          1,
          'Live'

      /*Create staging table*/
      if OBJECT_ID('ExampleStag') IS NOT NULL
          drop table ExampleStag
      go

      Create Table ExampleStag(
          PolicyID varchar(100),
          Status varchar(100))

      --add some data
      insert into ExampleStag(
          PolicyID,
          Status)
      select
          1,
          'Lapsed'
      union all
      select
          2,
          'Quote'

      select * from ExampleFactUpdate
      select * from ExampleStag

      INSERT INTO ExampleFactUpdate
          (PolicyID,
          Status)
      SELECT -- these columns are returned from the output clause
          PolicyID,
          Status
      FROM
      (
          -- merge statement on the unique id, in this case PolicyID
          MERGE dbo.ExampleFactUpdate dp
          USING dbo.ExampleStag s
              ON dp.PolicyID = s.PolicyID
          WHEN NOT MATCHED THEN -- when we can't match the record we insert a new record, and that is all that happens
              INSERT (PolicyID, Status)
              VALUES (s.PolicyID, s.Status)
          WHEN MATCHED -- if it already exists
              AND dp.ExpiryDate IS NULL -- and the expiry date is null
              THEN
              UPDATE
                  SET
                      dp.ExpiryDate = getdate(), -- we set the expiry on the existing record
                      dp.Active = 0 -- and deactivate the existing record
          OUTPUT $action MergeAction, s.PolicyID, s.Status -- the output clause returns a merge action which can be Insert/Update/Delete
      ) MergeOutput -- in our example we want the rows where a record has been updated (expired)
      WHERE -- so we filter with a where clause
          MergeAction = 'Update'; -- and only updated records are re-inserted

      select * from ExampleFactUpdate
    6. Multiple Phone Number Matching

      Hi All,

      So here goes my first post (Thank you Simon with resolving the initial issues I had!).

      I will be eternally grateful to anyone who knows how to paste T-SQL directly into the blog without needing to manually remove the double line spacing!

      I recently did some work involving matching records based on telephone numbers to find duplicates!

      Here's a brief outline of the problem with my solution, but I would be really curious to know if you have any alternative solutions for this problem.

      You start with a table which contains the following:

      ID    PhoneNo1    PhoneNo2    PhoneNo3
      1     00001       00002       00003
      2     00004       00005       00006
      3     00007       00001       00001

      The aim is to compare all records to find matches on phone numbers, where any phone number for a record could match any of the phone numbers on another record. So, for example, IDs 1 and 3 match because phone number '00001' is PhoneNo1 on ID 1 and PhoneNo2 on ID 3.

      Initially I tried things like cross joins, which didn't work out; the final solution used UNPIVOT to get the data into a single list, which I was able to work with to find duplicates and remove them before identifying matches.

      You should be able to paste the code and run it without any issues (skip the DROPs on the first run).

      Cheers

      drop table Numbers
      drop table Numbers_UnPivot

      Create Table Numbers
      (
          ID int identity(1,1),
          PhoneNo1 varchar(10),
          PhoneNo2 varchar(10),
          PhoneNo3 varchar(10)
      )

      insert into Numbers
      select '00001','00002','00003'
      UNION
      select '00004','00005','00006'
      UNION
      select '00007','00001','00001'

      Create Table Numbers_UnPivot(
          ID int,
          PhoneNo varchar(10),
          PhoneType varchar(10)
      )

      --Unpivot the three phone columns into a single list
      insert into Numbers_UnPivot
      SELECT
          ID,
          Num,
          ColName
      FROM
      (
          SELECT
              ID,
              PhoneNo1,
              PhoneNo2,
              PhoneNo3
          FROM
              Numbers
      ) AS piv
      UNPIVOT
          (Num FOR ColName IN (PhoneNo1, PhoneNo2, PhoneNo3)) AS s
      go

      --Remove duplicates within the same record
      --(RN > 1 also covers a number repeated in all three columns)
      with CTE_Duplicates
      As
      (
          select
              ID,
              Row_Number() OVER(PARTITION BY ID, PhoneNo ORDER BY ID, PhoneNo) as RN
          from
              Numbers_UnPivot
      )
      delete
      from
          CTE_Duplicates
      where
          RN > 1
      go

      --Group up the remainder: any number now appearing more than once spans different records
      with CTE_Multiples
      As
      (
          select
              PhoneNo
          from
              Numbers_UnPivot
          group by
              PhoneNo
          having COUNT(*) > 1
      )
      select *
      from
          Numbers_UnPivot
      where
          PhoneNo in (select PhoneNo from CTE_Multiples)

       
