Sutha's SQL BI Blog

Late Arriving Facts & SSIS Contd

As discussed before if you have a type 2 dimension and late arriving facts you are in a messy situation. Most of us do it using SQL Statements.

Lets see we could do this using SSIS. One of my colleague the following way, which seems to be a very good option but SSIS data flow task might be bit messy.

Let me describe each steps.
1. Read the data from Stage using OLE DB Source.
2. Conditionally split into 2 pipeline. Conditionally split would be done on a key date column. For example if you loading orders on a daily basis and you have a control table which data you have loaded already. You could use that table as a source and map it to a variable. Then anything less than that variable would be sent down the late arriving facts pipeline, where the others will send down the default (New records) pipeline. Some of you would be happy would setting late arring facts as anythign less than 2 days old. That would work too as long as you are happy with that scenario.
3. New records pipeline would use full cache lookups to get their surrogate keys. They will run fast.
4. Late arriving facts need to do the Lookups with the range value. This will work but will  be slow. The reason for that is it needs to go to database for each record. By splitting up this way we have suggested only few records should go to the database for the surrogate keys.
5. Please note if you do this on SQL 2005, it will definitely go to disk for each record. In SQL2008, you can use Partial Cache to build the cache.

Since I wrote this article Matt Masson has put a great blog together about this. You can read it here.


Posted: Nov 22 2008, 05:41 PM by tsutha | with 3 comment(s)
Filed under:


tsutha said:

Matt Masson has given a few alternative. He has written custom script which seems to outperform any other options. You can read all about on Matt's blog.



# November 27, 2008 2:02 PM

Pages tagged "facts" said:

Pingback from  Pages tagged "facts"

# December 6, 2008 12:07 AM

CozyRoc said:

There is solution based on the third-party commercial CozyRoc SSIS+ library. CozyRoc has implemented data flow destination script, which creates memory-efficient range dictionary object. The dictionary object can then be used in CozyRoc Lookup Plus component. For more information and demonstration how to use the script, check here:

# January 27, 2010 8:19 PM