SSIS Pipeline - Katmai changes - Reusing data
Katmai increases the concurrency of SSIS packages by a
change to the pipeline.
In SQL 2005 a dataflow is split into sections, This is called the
execution tree. Each section has the same buffer strucuture and the buffers
are passed to each component in the section of the pipeline. A new secton starts
when an asynchronous component is found. i..e the buffer in is not the buffer
out like the aggregation component
The key here is that the data is not copied to a new buffer for each
component. Rather each component accesses the same buffer (one after each
other). This is great because copying data is an expensive operation.
However it does mean that you are limited to one active buffer. So if you
have multiple components only one is accessing the buffer at anyone time.
Well that was the case in SQL 2005. In SQL 2008 the situation changes. Each
component will be able to access a buffer thus greatly increasing the ability
for you package to run quicker. You are still restrcited by the resources on
your machine, CPU and memory being the main ones for SSIS.
I was reading Alberto Ferrari's blog and though of this http://sqlblog.com/blogs/alberto_ferrari/archive/2007/11/15/ssis-multicast-and-trash-or-separate-and-union.aspx.
In Albertos situation he initially uses multicast and conditional split. Whilst
you might think these cause new sections in the execution tree, they don't they
are in fact synchronous components. What this means is that in the first package
he is only maintaining one copy of the data.
However in the second the use of the UNION which is an asynchrone ous
component, a new section is started and thus the data is copied into a new
buffer and so the time difference he experiences is due to the fact that the
data is being copied.
In summary once data is in the SSIS section of a pipeline if you use it or
not is irrelevant. If a piece of data is not used at all, then don't put it in
the pipeline. Thats why you get the warning message saying a column is't being
used. Your package may perform better if it is not used.
So bring on Katmai and we should see Albertos scenario improve even
further.
-