Lookup performance and buffer usage
Jorg's just written a great post on lookup performance in SSIS http://sqlblogcasts.com/blogs/jorg/archive/2008/10/22/ssis-decrease-your-fact-table-loading-time-up-to-40.aspx
I followed his link to his post on different component types in the data flow
http://sqlblogcasts.com/blogs/jorg/archive/2008/02/27/SSIS-_1320_-Non_2D00_blocking_2C00_-Semi_2D00_blocking-and-Fully_2D00_blocking-components.aspx and
noticed a comment I had made before. On reading the post again I noticed the
"phrase reuse of input buffers". The word reuse is a little misleading. By reuse
I read that the underlying thing is maintained but the data is repopulated. This
isn't quite true in the way I read it. The output buffer of a synchronous
component is the SAME buffer with no changes (except those made by the
component) as the input buffer, whats more if you have additional columns add by
the component, these will be present on the input buffer as well but not
populated.
One of the important design decisions made with SSIS was the copying memory
was expensive. Thus where possible things aren't copied, a buffer is defined at
the start with enough space to store the data needed for the duration of the
buffer through the data flow (until it reaches a asynchronous component). So if
you have a column defined as holding DT_STR(100) then your buffer will be 100
characters wide, even if you only have 1 character being stored. This is because
it is better to pre allocate the space and deal with known fixed lengths than
having to deal with variable lengths and deal with copying data if the values
get longer (aka page splits in SQL)
This is why its important to get your
data types correct with the correct lengths, and also why the csv importer
doesn't just declare all columns as really wide. They try to define the columns
as small as possible so each row in a buffer is as small as possible.
So how does
this affect lookups. While the error output is synchronous with the input to make any
use of the outputs you need to need the two outputs. This involves using the UNION
transform. and this transform is a blocking asynchronous transform, i.e. it COPIES ALL
the data from ALL inputs to another buffer. That means there is a lot of COPYING
going on, and as we said earlier copying is very expensive.
If you have 10 lookups and 10 UNION transforms you will end up with the data
being copied 10 times thats a lot of extra work.
Using Jorg's preferred solution data is never copied as all the
components are synchronous. Therefore only one set of buffers exist, and so
there is no extra memory required to store all these copies and none of the CPU
required to make the copies.
In SQL 2008 they have enhanced this further by having multiple active buffers
in a data flow to allow each component to be processing a buffer. This provides
for greater use of multi processor machines.
[SS 2008-10-26 Corrected
definition of error output following Michaels comment]
-