SSIS Pipeline - Katmai changes - Reusing data - SimonS Blog on SQL Server Stuff

SSIS Pipeline - Katmai changes - Reusing data

Katmai increases the concurrency of SSIS packages by a change to the pipeline.

In SQL 2005 a dataflow is split into sections, This is called the execution tree. Each section has the same buffer strucuture and the buffers are passed to each component in the section of the pipeline. A new secton starts when an asynchronous component is found. i..e the buffer in is not the buffer out like the aggregation component

The key here is that the data is not copied to a new buffer for each component. Rather each component accesses the same buffer (one after each other). This is great because copying data is an expensive operation.

However it does mean that you are limited to one active buffer. So if you have multiple components only one is accessing the buffer at anyone time.

Well that was the case in SQL 2005. In SQL 2008 the situation changes. Each component will be able to access a buffer thus greatly increasing the ability for you package to run quicker. You are still restrcited by the resources on your machine, CPU and memory being the main ones for SSIS.

I was reading Alberto Ferrari's blog and though of this http://sqlblog.com/blogs/alberto_ferrari/archive/2007/11/15/ssis-multicast-and-trash-or-separate-and-union.aspx. In Albertos situation he initially uses multicast and conditional split. Whilst you might think these cause new sections in the execution tree, they don't they are in fact synchronous components. What this means is that in the first package he is only maintaining one copy of the data.

However in the second the use of the UNION which is an asynchrone ous component, a new section is started and thus the data is copied into a new buffer and so the time difference he experiences is due to the fact that the data is being copied.

In summary once data is in the SSIS section of a pipeline if you use it or not is irrelevant. If a piece of data is not used at all, then don't put it in the pipeline. Thats why you get the warning message saying a column is't being used. Your package may perform better if it is not used.

So bring on Katmai and we should see Albertos scenario improve even further.



-
Published 16 November 2007 13:35 by simonsabin

Comments

No Comments