SSIS Lookup Transformation Dynamic SQL workaround
In SSIS, Lookup Transformation is really an important tool. But the design of it prevents user to set the SQL statement dynamicly.
In my work, I need to prepare a text file for BULK loading into table. The original csv file could be 10 million ~ 1 billion rows. On the way I need to lookup several IDs by Names from different tables. These reference tables all have a Project_ID column, which can sub-grouping data for a specific project. Normally the original csv file only holds data for a single project.
Of cause I can just do:
SELECT xx_ID, xx_Name FROM dbo.TABLE1
to fill the Lookup Tran with full cache. But doing this will: use lots of memory; waste resource; and bad performance.
I would rather want to do:
SELECT xx_ID, xx_Name FROM dbo.TABLE1 WHERE Project_ID = ?
Which the Project_ID value is only known at runtime by user input parameter. Unfortunately Lookup Tran does not have that enabled (yet, maybe will in future version.)
So I came up with this idea: BCP dump + OPENROWSET (BULK).
First we use an Execute Process Task, to call Bcp.exe, specify the Arguments using Expressions, the Project_ID will be a SSIS variable. This is like you run this line in CMD:
c:\Bcp.exe "SELECT xx_ID, xx_Name FROM dbo.TABLE1 WHERE Project_ID = 101" queryout tempref1.txt -N -T
Then you will get a text file with columns you want, for the project that is relevant to this run. The reason to specify -N is to save char like in unicode and everything else in SQL server native format, this can save a lot both on file size and I/O speed for later reading process.
Second step, configure our Lookup Trans by set it to Use results of an SQL query, the query should be like this:
SELECT a.* FROM OPENROWSET( BULK 'c:\tempref1.txt', FORMATFILE = 'c:\tempref1.fmt') AS a
I didn't mention the step to create the format file.
If the SQL statement is correct and both the txt and fmt files are ok, when you click the Preview button, you'll see a data grid. Then just continue your normal configuration of Lookup Trans.
So far this works for my situation. I can do multiple Lookups by Bcp to different file name. Obviously I can't fire two instances of the package at the same time. Also I can't set the path and file name at runtime too.
Any comments are welcomed! (Maybe I can do a sample package if anyone interested, in Northwind or AdventureWorks context?)