Lookup performance and buffer usage

Jorg's just written a great post on lookup performance in SSIS http://sqlblogcasts.com/blogs/jorg/archive/2008/10/22/ssis-decrease-your-fact-table-loading-time-up-to-40.aspx

I followed his link to his post on different component types in the data flow http://sqlblogcasts.com/blogs/jorg/archive/2008/02/27/SSIS-_1320_-Non_2D00_blocking_2C00_-Semi_2D00_blocking-and-Fully_2D00_blocking-components.aspx and noticed a comment I had made before. On reading the post again I noticed the "phrase reuse of input buffers". The word reuse is a little misleading. By reuse I read that the underlying thing is maintained but the data is repopulated. This isn't quite true in the way I read it. The output buffer of a synchronous component is the SAME buffer with no changes (except those made by the component) as the input buffer, whats more if you have additional columns add by the component, these will be present on the input buffer as well but not populated.

One of the important design decisions made with SSIS was the copying memory was expensive. Thus where possible things aren't copied, a buffer is defined at the start with enough space to store the data needed for the duration of the buffer through the data flow (until it reaches a asynchronous component). So if you have a column defined as holding DT_STR(100) then your buffer will be 100 characters wide, even if you only have 1 character being stored. This is because it is better to pre allocate the space and deal with known fixed lengths than having to deal with variable lengths and deal with copying data if the values get longer (aka page splits in SQL)
This is why its important to get your data types correct with the correct lengths, and also why the csv importer doesn't just declare all columns as really wide. They try to define the columns as small as possible so each row in a buffer is as small as possible.

So how does this affect lookups. While the error output is synchronous with the input to make any use of the outputs you need to need the two outputs. This involves using the UNION transform. and this transform is a blocking asynchronous transform, i.e. it COPIES ALL the data from ALL inputs to another buffer. That means there is a lot of COPYING going on, and as we said earlier copying is very expensive.

If you have 10 lookups and 10 UNION transforms you will end up with the data being copied 10 times thats a lot of extra work.

Using Jorg's preferred solution data is never copied as all the components are synchronous. Therefore only one set of buffers exist, and so there is no extra memory required to store all these copies and none of the CPU required to make the copies.

In SQL 2008 they have enhanced this further by having multiple active buffers in a data flow to allow each component to be processing a buffer. This provides for greater use of multi processor machines.


 [SS 2008-10-26 Corrected definition of error output following Michaels comment]



-
Published 26 October 2008 23:52 by simonsabin

Comments

27 October 2008 05:50 by mentin

# re: Lookup performance and buffer usage

Small correction: the error output of Lookup is not a new data flow - it is synchronous. You can check if an output is synchronous by opening advanced editor, and checking SynchronousInputID property of an output.

The Union All is async, so it does copy both of its inputs to a new buffer - which usually does slow down things noticeably.

19 January 2009 22:10 by Douglas Laudenschlager

# Learn more about Lookup and its new features in 2008

The SQL Server Integration Services team added valuable new caching options (and scalability) to the