As discussed before if you have a type 2 dimension and late arriving facts you are in a messy situation. Most of us do it using SQL Statements.
Lets see we could do this using SSIS. One of my colleague the following way, which seems to be a very good option but SSIS data flow task might be bit messy.
Let me describe each steps.
1. Read the data from Stage using OLE DB Source.
2. Conditionally split into 2 pipeline. Conditionally split would be done on a key date column. For example if you loading orders on a daily basis and you have a control table which data you have loaded already. You could use that table as a source and map it to a variable. Then anything less than that variable would be sent down the late arriving facts pipeline, where the others will send down the default (New records) pipeline. Some of you would be happy would setting late arring facts as anythign less than 2 days old. That would work too as long as you are happy with that scenario.
3. New records pipeline would use full cache lookups to get their surrogate keys. They will run fast.
4. Late arriving facts need to do the Lookups with the range value. This will work but will be slow. The reason for that is it needs to go to database for each record. By splitting up this way we have suggested only few records should go to the database for the surrogate keys.
5. Please note if you do this on SQL 2005, it will definitely go to disk for each record. In SQL2008, you can use Partial Cache to build the cache.
Since I wrote this article Matt Masson has put a great blog together about this. You can read it here.
Thanks
Sutha
One of my colleague asked us how are we currently handling Late Arriving Facts against Type2 dimension within SSIS.My answer was we don’t. Well I do ask the client do you have late arriving facts. Most of the time they don’t know what is late arriving facts. Once explained they turned around and say this never happens in our organisation. I am sure pretty sure it does happen and most of the time client doesn’t recognise it.
If you do have a warehouse with Type1 dimension late arriving facts is not an issue for you. This is an issue when you have type2 dimension. Kimball call this "messy situation".
I did ask this to 3 SSIS developers at SQLPass this week. The answer is not easy and will get back to you. In the meantime we have discussed about this scenario among ourselves and decided one of two ways.
1. Do it using SQL rather than SSIS
2. Split the records first in the Data Flow Task and handle it using Lookup. I will explain it in detail how to do in a separate posting.
Thanks
Sutha
I have been asking for Dependancy Analysis Viewer within SSIS since TAP for SQL2005. At last Microsoft seems to have listened to most of our needs and it might appear in the next release. Well done, well overdue but well appreciated.
In SQL2008, MS have released Resource Governor, but it does NOT govern Analysis Services. I have been working with AS since SQL7 and we always have long running MDX queries, bu we dont know who is running etc. I understand next release might have Resource Governor which would support AS.
Thanks
Sutha