SQL Blog - Pieter van Maasdam, Macaw

SSIS, SSAS, SSRS & other SQL-things I come across...
SSIS - Range lookups

 

When developing an ETL solution in SSIS we sometimes need to do range lookups. Several solutions for this can be found on the internet, but we have built another one that I would like to share, since it's pretty easy to implement and performs well.

 

You can download the sample package to see how it works. Make sure you have the AdventureWorks2008R2 and AdventureWorksDW2008R2 databases installed. (Apologies for the layout of this blog, I don't do this too often :))

 

To give a little more information about the example, this is basically what it does: we load a fact table and do an SCD type 2 lookup against the Product dimension. This is done with a script component.

 

First we query the Data warehouse to create the lookup dataset. The query that is used for that is:

 

SELECT
    [ProductKey]
    ,[ProductAlternateKey]
    ,[StartDate]
    ,ISNULL([EndDate], '9999-01-01') AS EndDate
FROM [DimProduct]

 

 

The output of this query is stored in a DataTable:

 

 

string lookupQuery = @"
    SELECT
        [ProductKey]
        ,[ProductAlternateKey]
        ,[StartDate]
        ,ISNULL([EndDate], '9999-01-01') AS EndDate
    FROM [DimProduct]";

OleDbCommand oleDbCommand = new OleDbCommand(lookupQuery, _oleDbConnection);
OleDbDataAdapter adapter = new OleDbDataAdapter(oleDbCommand);

_dataTable = new DataTable();
adapter.Fill(_dataTable);

 

 

Now that the dimension data is stored in the DataTable we use the following method to do the actual lookup:

 

public int RangeLookup(string businessKey, DateTime lookupDate)
{
    // Default return value (Unknown member)
    int result = -1;

    // Escape single quotes so a key containing ' does not break the filter expression
    string escapedKey = (businessKey ?? string.Empty).Replace("'", "''");
    DataRow[] filteredRows = _dataTable.Select(string.Format("ProductAlternateKey = '{0}'", escapedKey));

    for (int i = 0; i < filteredRows.Length; i++)
    {
        // Columns: 0 = ProductKey, 2 = StartDate, 3 = EndDate.
        // Check if the lookup date falls in the half-open range StartDate <= lookupDate < EndDate.
        if (lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3])
        {
            // A missing value in a DataRow is DBNull.Value, never null, so test with IsNull
            result = filteredRows[i].IsNull(0) ? -1 : (int)filteredRows[i][0];
            break;
        }
    }

    return result;
}

 

 

 

This method is executed for every row that passes through the script component. This is implemented in the ProcessInputRow method:

 

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Perform the lookup operation on the current row and put the value in the Surrogate Key Attribute
    Row.ProductKey = RangeLookup(Row.ProductNumber, Row.OrderDate);
}

 

Now what actually happens?!

 

  1. Every record passes the business key and the order date to the RangeLookup method.
  2. The DataTable is then filtered on the business key of the current record. The output is stored in a DataRow[] array.
  3. We loop over the DataRow[] array to find the row for which the order date satisfies the following expression:

(lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3])

  4. When the expression returns true (so the order date lies between the StartDate and the EndDate), the surrogate key of the dimension record is returned.

 

We have done some testing with this solution and it works great for us. Hope others can use this example to do their range lookups.
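
A possible variation, not part of the sample package: the method above builds and parses a filter string for every incoming row. If that turns out to be a bottleneck, the dimension rows could be grouped by business key once, so that each lookup becomes a dictionary hit followed by a short scan over that key's versions. The sketch below assumes the column names of the lookup query; the class name RangeLookupIndex and its members are hypothetical, and the case-insensitive key comparison is an assumption meant to resemble the default DataTable behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: indexes the dimension rows by business key.
public sealed class RangeLookupIndex
{
    private const int UnknownKey = -1;

    private struct DimensionVersion
    {
        public int Key;
        public DateTime Start;
        public DateTime End;
    }

    // Assumption: business keys are compared case-insensitively.
    private readonly Dictionary<string, List<DimensionVersion>> _byBusinessKey =
        new Dictionary<string, List<DimensionVersion>>(StringComparer.OrdinalIgnoreCase);

    // Expects the columns ProductKey, ProductAlternateKey, StartDate and EndDate.
    public RangeLookupIndex(DataTable dimension)
    {
        foreach (DataRow row in dimension.Rows)
        {
            // Rows without a key or a start date can never match a lookup, so skip them.
            if (row.IsNull("ProductKey") || row.IsNull("ProductAlternateKey") || row.IsNull("StartDate"))
            {
                continue;
            }

            DimensionVersion version;
            version.Key = Convert.ToInt32(row["ProductKey"]);
            version.Start = Convert.ToDateTime(row["StartDate"]);
            // The lookup query already replaces a NULL EndDate; this is only a safety net.
            version.End = row.IsNull("EndDate") ? DateTime.MaxValue : Convert.ToDateTime(row["EndDate"]);

            string businessKey = Convert.ToString(row["ProductAlternateKey"]);
            List<DimensionVersion> versions;
            if (!_byBusinessKey.TryGetValue(businessKey, out versions))
            {
                versions = new List<DimensionVersion>();
                _byBusinessKey.Add(businessKey, versions);
            }
            versions.Add(version);
        }
    }

    // Returns the surrogate key whose range satisfies Start <= lookupDate < End, or -1 if none does.
    public int Lookup(string businessKey, DateTime lookupDate)
    {
        List<DimensionVersion> versions;
        if (businessKey == null || !_byBusinessKey.TryGetValue(businessKey, out versions))
        {
            return UnknownKey;
        }

        foreach (DimensionVersion version in versions)
        {
            if (lookupDate >= version.Start && lookupDate < version.End)
            {
                return version.Key;
            }
        }
        return UnknownKey;
    }
}
```

In a script component the index would presumably be built once, for example in PreExecute after the DataTable has been filled, and Lookup would then be called from Input0_ProcessInputRow in place of RangeLookup. Whether this is actually faster than DataTable.Select depends on the size of the dimension and should be measured.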

Posted Friday, February 4, 2011 1:41 PM by Repieter | with no comments

SSIS 2008 - Rowcounts using the script component

I have been using the Rowcount Component for some time now. The thing I didn't like about it is that I had to create an SSIS variable for every flow in the Data Flow Task. For example: when extracting data we sometimes have more than 30 tables in a data flow. So, I have been trying to find a more flexible way to add rowcounts to my packages, without having to create a lot of new variables.

Here's an example of what I came up with (It's not ready to be used in production environments by the way):

[Image: Rowcount control flow]

I created an SSIS package variable of datatype Object named RowcountList. In the first Script Task, I initialize it by assigning an ArrayList to it. The next step is in the Data Flow Task:

[Image: Rowcount data flow]

In the Script component I use an integer to count the rows. Then, in the PostExecute method I create an ArrayList based on the RowcountList variable. Then, I add the name of the script component combined with the rowcount to the ArrayList and store that in the SSIS variable RowcountList.

Finally, in the last Script Task I iterate through the ArrayList and store the rowcounts in a custom logging table. Now, it seems to me this is a good way to do the rowcounts, but I'm very curious if other people have tried to do this and maybe have found a better way.
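
The flow described above can be sketched outside SSIS roughly as follows. This is only an illustration under assumptions: in a package the ArrayList would live in the RowcountList variable instead of being passed around directly, the class names RowCounter and RowcountLog are hypothetical, and the "name|count" entry format is an assumed way of combining the component name with the rowcount.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for one counting script component in the data flow.
public sealed class RowCounter
{
    private readonly string _componentName;
    private int _rows;

    public RowCounter(string componentName)
    {
        _componentName = componentName;
    }

    // Corresponds to the per-row work in the script component.
    public void CountRow()
    {
        _rows++;
    }

    // Corresponds to PostExecute: append "name|count" to the shared list.
    public void Publish(ArrayList rowcountList)
    {
        rowcountList.Add(_componentName + "|" + _rows.ToString(CultureInfo.InvariantCulture));
    }
}

// Hypothetical stand-in for the last Script Task, which reads the list back.
public static class RowcountLog
{
    // Splits an entry on its last '|' so component names may contain that character.
    public static KeyValuePair<string, int> Parse(string entry)
    {
        int separator = entry.LastIndexOf('|');
        string name = entry.Substring(0, separator);
        int count = int.Parse(entry.Substring(separator + 1), CultureInfo.InvariantCulture);
        return new KeyValuePair<string, int>(name, count);
    }
}
```

The last step would loop over the list, call Parse for each entry and write the name and count to the logging table.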

Posted Monday, January 25, 2010 11:18 AM by Repieter | with no comments

SSRS - Report execution failed. Solution: SSPI=NTLM

When executing reports (Reporting Services 2008) with Analysis Services 2008 as a source I sometimes get an error saying: "Query execution failed for...". The problem seemed to have something to do with large amounts of data (just guessing this...)

To fix this error we had to add SSPI=NTLM to the connection string:

Data Source=MyServer;Initial Catalog=MySSASDB;SSPI=NTLM;

I'm glad it's working now, although I don't understand what it actually does... 

 

 

 

Posted Tuesday, November 17, 2009 1:55 PM by Repieter | 3 comment(s)

SSAS 2008 - Connection from Excel

I had some trouble connecting to an SSAS 2008 cube with Excel 2007. The problem was that Windows Firewall blocked the connection. A colleague of mine pointed out that you need to open two TCP ports on the server: 2382 and 2383. That did the trick.

Posted Tuesday, March 3, 2009 1:31 PM by Repieter | with no comments


SQL - Wrong results in Isoweek function
 
In our organisation we use the ISOweek function to determine week numbers for a given date. This function appears to depend on the language setting (@@DATEFIRST) of the SQL Server. For example: January 5, 2009 should return week number 2, but since the language is set to us_english it returns week number 1 for that date. Since the week number returned by this function is not correct, we needed another solution.
 
We found another sql-function that checks the DATEFIRST setting before it calculates the weeknumbers: 
 
This gave us the expected result.
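
For reference, the ISO 8601 rule itself does not depend on any server setting: weeks start on Monday, and week 1 is the week that contains the first Thursday of the year. The following is a minimal C# sketch of that rule, not the SQL function mentioned above; the names IsoCalendar and WeekOfYear are made up for this illustration.

```csharp
using System;

public static class IsoCalendar
{
    // Returns the ISO 8601 week number of the given date.
    // No culture, language or DATEFIRST-like setting is consulted.
    public static int WeekOfYear(DateTime date)
    {
        // Map DayOfWeek (Sunday = 0) to the ISO numbering: Monday = 1 ... Sunday = 7.
        int isoDayOfWeek = ((int)date.DayOfWeek + 6) % 7 + 1;

        // The Thursday of the same ISO week always lies in the year the week belongs to.
        DateTime thursday = date.Date.AddDays(4 - isoDayOfWeek);

        // Count how many full weeks precede that Thursday within its year.
        return (thursday.DayOfYear - 1) / 7 + 1;
    }
}
```

With this rule, IsoCalendar.WeekOfYear(new DateTime(2009, 1, 5)) returns 2 and IsoCalendar.WeekOfYear(new DateTime(2009, 1, 4)) returns 1.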
 

Posted Tuesday, February 24, 2009 1:25 PM by Repieter | with no comments


SQL2008 - IsoWeek in Datepart function

To calculate the correct weeknumber I always used the IsoWeek function (found here: http://msdn.microsoft.com/en-us/library/aa258261(SQL.80).aspx).

In SQL 2008 this functionality is available in the DATEPART function through the isowk datepart:

SELECT DATEPART(wk, '4 jan 2009') AS WeekNumber, DATEPART(isowk, '4 jan 2009') AS IsoWeekNumber

Output:

WeekNumber  IsoWeekNumber
2           1

Posted Tuesday, February 24, 2009 8:01 AM by Repieter | 1 comment(s)


SSIS "Failure sending mail" problem

I was trying to send an email from an SSIS package when a package failed. However, the Send Mail Task failed every time. There was no error message except the one in the sysdtslog90 table:

 An error occurred with the following error message: "Failure sending mail.".

I found the problem was McAfee. Here's the solution:

  • Open VirusScan Console
  • Double-click 'Access Protection'
  • Uncheck the option 'Prevent mass mailing worms from sending mail'
  • Click OK

After that, McAfee is no longer blocking the outgoing mails.

Posted Wednesday, October 1, 2008 1:23 PM by Repieter | with no comments

SQL 2005 SSAS deployment error

During a deployment of an SSAS project I got an error saying something about System.Data.Listener. The problem was that the server didn't have Service Pack 1 of the .NET Framework installed. After installing it, the problem was solved.

Posted Thursday, April 3, 2008 11:14 AM by Repieter | with no comments

SSAS2005 - Using logged in user within a role

A customer of ours has a security model stored in a database and they wanted the security in the cube to be the same, so I came up with the following solution:

Example of the database in which the security is stored:

 

The DimUser table contains the users and their AD login account. The FactHours contains the hours that they have booked. The security is stored in the many-to-many table FactHoursDimUser, so the data in this table shows which user can see which fact. Go to the "Dimension Usage" tab to set the right relationships between the tables:

Next step is to add a role to the cube and add user groups on the Membership tab.

After that, you go to the dimension data tab, select the user dimension, the loginname attribute and then enter the following MDX expression to the "Allowed member set" section: {STRTOMEMBER("[User].[Loginname].[" + username+ "]")}. Also, make sure that "Enable visual totals" is enabled, so the calculations for the totals will only show what the logged in user is allowed to see.

 

 After that, process the cube and go to the browser tab. Select "Switch user" to view the cube data with other credentials to see the results.

 

 

Posted Friday, December 7, 2007 12:27 PM by Repieter | 1 comment(s)

SSAS2005 - Processing failed due to collation difference

Today, I was trying to create a simple dimension, but kept getting the error "Attribute key cannot be found". Since I was processing a dimension without any relationship to another table, I was very surprised to see this error. It appeared that there was one record that had a column containing the character 'ë'. When I changed this to the character 'e', everything worked. So, after checking the collation I saw that the relational database was Latin1_General_CP1_CI_AS and the cube was Latin1_General_CP1_CI_AI.

So, in order to fix this:

  • Open the management studio
  • Connect to the Database Engine
  • Right-click the Instance and select properties to see which collation is used
  • Connect to the Analysis Services
  • Right-click the Instance and select properties
  • Click on "Language/Collation"
  • In my case: check the box "Accent-Sensitive"
  • Click OK
  • Restart the SQL Server service
  • Restart the Analysis Services service
  • Re-deploy the Analysis Services database

After that it worked. I guess the person who installed the software didn't select the default settings (?)

Posted Wednesday, November 28, 2007 3:03 PM by Repieter | with no comments
