July 2007 - Posts

The key to performance with partition management is to deal with empty partitions. That way the operation is a metadata-only change and no data has to be copied.

We recently encountered a situation where we had two partitions and wanted to merge them. In a merge you have three boundaries: the left one, the middle one and the right one. The merge removes the middle boundary, leaving one partition bounded by the left and right values. The partition on the left of the middle boundary was empty and the one on the right wasn't. We were using RIGHT partitioning, so the partition value for the middle boundary was included in the right-hand partition. We expected this to be very quick: the engine would just take the right-hand partition and use it as the new partition.

But NO it was very slow.

In this situation the new partition has two boundaries, the left and the right. Because we are using RIGHT based partitioning, the boundary value included in this partition is the left-hand one; this is the closed boundary.

What we found was that when doing a merge the engine takes the partition with the remaining closed boundary as the base partition and merges the other partition into it. With RIGHT based partitioning this means that in a sliding window scenario, when you switch the last partition out and then merge the last two partitions, the data is moved from the right partition into the left partition, because the left partition has the closed boundary. This obviously sucks for performance.

To get round this, you need to do one of two things:

1. Leave two partitions at the end so that both are empty; when these are merged the operation is metadata only.

2. Switch out the partition with the data, perform the merge on the two now-empty partitions, then switch the partition back in. To do this you have to align the check constraints with the partition you are switching into, which is a bit of a pain.
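The sliding-window shape of the first option can be sketched like this (the table, archive table, partition function name and boundary value are all made up for illustration; assume RIGHT range partitioning on an order date):

```sql
-- Assumes: table Orders partitioned by pfOrders (RANGE RIGHT), and a
-- staging table OrdersArchive with identical structure and an aligned
-- check constraint, on the same filegroup as partition 1.

-- 1. Switch the oldest (populated) partition out -- a metadata-only operation
alter table Orders switch partition 1 to OrdersArchive

-- 2. Both partitions either side of the oldest boundary are now empty,
--    so merging that boundary is also metadata only
alter partition function pfOrders() merge range ('20070101')
```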


Personally I think networks are a black art; there are probably fewer network admins than there are DBAs in the world. As for diagnostics and tools for network configuration, as a SQL Server person I feel I now know what it's like configuring and running Oracle.

My latest venture into networks was to answer a question on dynamic ports in SQL Server. I was pointed to an article on SQLJunkies which pointed out that SQL Server will try to use the last configured port, and if that is taken it will use another one.

This is obviously not a good scenario if your clients have stored the port number, and it is only half the story. Let me start from the beginning.

A SQL Server instance supports a number of network protocols (a few got canned for SQL 2005). One of these is TCP/IP. A service uses TCP/IP by listening on a port; a web server, for instance, by default listens on ports 80 and 443 (SSL). Each SQL Server instance needs to listen on a different port so the server knows which instance should process which requests. To enable this, the default (unnamed) instance by default listens on a fixed port, 1433, and named instances are configured for a "dynamic port".

So what does a dynamic port mean? It means that SQL Server will pick a port dynamically when the service starts, usually something like 53394. If that port is in use it will try another one, and will continue until it finds one that is not being used (only one service can listen on a given port). On subsequent restarts of the service it will try to use the same port it used before. This should work, but if another service has been installed, or is running now that wasn't before, the port might be taken, in which case it tries to find an unused port. This is the behaviour the article I was pointed to refers to.

What is not clear is that this only applies to dynamic port allocation. If you are using a fixed port this won't happen: if SQL Server can't use the fixed port it will raise an error and won't start.

So is this important? Well, you might think that an application has no chance of working if the port keeps changing, because the client application won't know what port to send requests to. This is where the Browser Service comes in: it translates an instance name into a port number. It also provides the discovery service so applications can find out what instances exist on the server (this was a big change from SQL 2000, where sqlservr.exe itself provided this information, and is how the Slammer worm spread).

However, whilst the Browser Service can provide the dynamic port, in most network environments firewalls only have a few ports open, and some clients (Java ones for a start) hard code the port number in the connection string.

So in a production environment you should always fix the port number for all instances, not just the default instance. How do you do that? Well, I'll leave that to another post.
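If you want to check which port an instance actually ended up listening on, one way (a sketch; xp_readerrorlog is undocumented, so treat the parameter list as an assumption) is to search the current error log for the TCP startup message:

```sql
-- Search the current error log (0) of the SQL Server log type (1)
-- for the TCP startup message, which looks like:
--   Server is listening on [ 'any' <ipv4> 1433].
exec master..xp_readerrorlog 0, 1, 'Server is listening on'
```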



I recently visited a client that was achieving wonders with DTS; I was there to give them pointers on SSIS. I spent a day going over SSIS, from the control flow to configurations.

A couple of things really got me. SSIS is a new world that you have to learn, and boy is there a lot to learn: SSIS has many ways of achieving the same thing, and many ways of doing it badly.

Whilst SSIS can do a lot, if you're familiar with programming, why venture away from the ActiveX script?

The interface sucks, so why not write it all yourself?

I can't express how I felt after this day; the realisation of how difficult SSIS is was overwhelming. Having used SSIS for 2 years, I know the number of features that require special attention is huge.

SSIS is great for what it is designed for, but you need to know all the tricks to really get the power out of it.

Jamie has responded to similar annoyances on his blog: http://blogs.conchango.com/jamiethomson/archive/2007/07/27/SSIS_3A00_-The-backlash-continues.aspx



For those that downloaded the enhanced error component, I forgot to make it clear that the new version requires you to have Service Pack 1 installed. The reason is that we are using an interface that allows us to look at the whole package in the component UI.

If you were at an SSIS presentation in 2005 you will have seen the statements on extensibility. The statement from MS was that they used exactly the same interfaces we do to create their components. However, on investigation we couldn't figure out how they had written the Slowly Changing Dimension wizard to add components using the public interfaces. What was missing was the pipelineService that was made public in Service Pack 1.

So you can now write components just like Microsoft. If you are interested in the code for the error component let me know.

In the meantime feel free to download the component from here



Should you configure error outputs for your components in a data flow? I'm putting an SSIS crib sheet together and asked Jamie Thomson to have a look over it; one point he questioned was my statement on error outputs.

My view is that you should connect up error outputs to a destination to capture erroneous data. His argument is that he would prefer the package to fail if you get errors (if you handle the errors, the data flow won't fail).

I suppose it's two different mindsets. I come from an operational background, where I know first hand that the cost of something failing is very high, mainly from a disruption factor, but also because until you fix it, it doesn't work, and more than likely the subsequent processes won't work either. If you've got change control processes and tight development resource, it might take some time before a fix is available. I'm therefore of the view that having something is better than nothing.

Furthermore, configuring the error output to redirect rows allows you to easily debug what is causing an error. You could still put a rowcount transform on the error output and only proceed if the rowcount is 0; at least that way you have the erroneous data and some error codes.


I've just been tearing my hair out trying to get variables and locking working. It was only after one of those eureka moments that I realised that the variable names are case sensitive.

So the lesson learnt is to make sure that you have a naming convention for your variables, so you don't have to remember whether your variables are all lower case (unreadable), title case (my preference) or camel case.

I've also decided that I prefer to use LockForWrite over LockOneForWrite, and even over specifying the names as a delimited list, the reason being that you have better control doing it that way.


I am starting to regret installing 64-bit Vista on my laptop; my woes are continuing, as I found out today that I can't debug script tasks.

I do hope this is fixed in Katmai :)

The full limitations, and there are a few, can be found here




It used to be fact that the leaf level of a non-clustered index pointed to the data row itself: it held the file, page and row. Because of that, if the file or page of the row changed, the non-clustered index had to be updated.

Now, however, the non-clustered index holds the unique clustering key value for the main data row. I specifically say unique because you can create a non-unique clustered index; in this case SQL Server generates an additional bit of data (a uniquifier) to make a unique key for each row in the table. (This is one reason why clustering on a non-unique column is bad.)

What this means is that the data row can move about all over the place without affecting the non-clustered index.

So what happens when you create a clustered index, reindex a clustered index, recreate one (using WITH DROP_EXISTING), or drop a clustered index? This question came up in a recent presentation. I was sure about the first and last, but not the middle two, so I did a repro and was pleasantly surprised.

The first and last both result in the non-clustered index being changed; the middle two, however, don't.

The example below puts the index on a different filegroup and looks at the IO for each filegroup after each operation. File 3 relates to the non-clustered index, and you can see that the index creation causes IO, as does the creation of the clustered index (changing the row pointers to key values). But the reindex and the create WITH DROP_EXISTING don't.

So in summary, if you need to rebuild a clustered index, don't drop it and recreate it; instead use ALTER INDEX ... REBUILD, DBCC DBREINDEX, or CREATE INDEX ... WITH DROP_EXISTING.



if not exists (select * from sysfilegroups where groupname = 'indexes')
begin
      alter database blogcode add filegroup indexes
      alter database blogcode add file (name=indexes, filename='c:\indexes.ndf', size=10) to filegroup indexes
end
go

if object_id('simon') is not null
      drop table simon
go

create table simon(
   clusteredCol int identity(1,1)
  ,col2 uniqueidentifier )

insert into simon(col2)
select top 20000 newid()
from syscolumns a
cross join syscolumns b

-- baseline IO stats per file
select identity(int,1,1) id, fileId, cast('start' as varchar(30)) as action, numberreads, numberwrites
into #stats
from ::fn_virtualfilestats(db_id(),null)

create nonclustered index IX_simon_col2 on simon(col2) on indexes

insert into #stats (fileId,action,numberreads,numberwrites)
select fileId,'create nonclustered',numberreads,numberwrites from ::fn_virtualfilestats(db_id(),null)

create clustered index IXC_simon_clusteredCol on simon(clusteredCol)

insert into #stats (fileId,action,numberreads,numberwrites)
select fileId,'create clustered',numberreads,numberwrites from ::fn_virtualfilestats(db_id(),null)

create clustered index IXC_simon_clusteredCol on simon(clusteredCol) with drop_existing

insert into #stats (fileId,action,numberreads,numberwrites)
select fileId,'create clustered with drop',numberreads,numberwrites from ::fn_virtualfilestats(db_id(),null)

dbcc dbreindex(simon, IXC_simon_clusteredCol)

insert into #stats (fileId,action,numberreads,numberwrites)
select fileId,'dbreindex',numberreads,numberwrites from ::fn_virtualfilestats(db_id(),null)

drop index simon.IXC_simon_clusteredCol

insert into #stats (fileId,action,numberreads,numberwrites)
select fileId,'drop clustered',numberreads,numberwrites from ::fn_virtualfilestats(db_id(),null)

select id, action, fileId, numberreads, numberwrites
from #stats
order by fileId, id

drop table #stats



This is the first in a series of posts aimed at dispelling myths and beliefs about how SQL Server operates. I have interviewed too many people and been in too many sessions where statements are made as though they are fact when in reality they are not. Often the reason is a change in behaviour between versions, or a misinterpretation of some information.

The first is a very new one and relates to a feature in SQL Server 2005, Instant File Initialization.

What this feature does is allow SQL Server to create a file at the right size without zeroing it out, i.e. without writing zeros to every byte in the file. The OS just allocates the disk space; the content of the file is whatever was already on the disk, it isn't changed.

Firstly, it is available in ALL editions, including Express. This is easily demonstrated by creating a database with a very small log and a huge data file. If the create completes within seconds, instant file initialization is working and your service account has the correct permissions.
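A minimal repro along those lines (the database name and file paths are made up; adjust the sizes to suit your disk). With instant file initialization working the CREATE returns in a few seconds; without it you wait while 5GB of zeros are written:

```sql
declare @start datetime
set @start = getdate()

-- huge data file, tiny log: only the log has to be zeroed
create database IFITest
on primary (name = IFITest_data, filename = 'c:\IFITest.mdf', size = 5GB)
log on (name = IFITest_log, filename = 'c:\IFITest.ldf', size = 10MB)

select datediff(second, @start, getdate()) as seconds_taken

drop database IFITest
```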

Secondly, it DOES NOT apply to log files; these always have to be zeroed. This is covered by most people, but some forget to explicitly state that log files are excluded from instant file initialization.

This is one of the best discussions on instant file initialisation http://www.microsoft.com/technet/prodtechnol/sql/2005/physdbstor.mspx


If you use full text and want to index PDF documents you might have found you run into problems on a 64-bit machine. This is because there was no 64-bit IFilter for PDFs. The IFilter is the bit of code that interprets the document and spits out the information for the search service (full text) to use to generate its indexes.

Well, now one is available, not from Adobe or Microsoft but from a company called Foxit. The file can be downloaded here: http://www.foxitsoftware.com/pdf/ifilter/

Thanks to the IFilter blog for pointing this out: http://blogs.msdn.com/ifilter/archive/2007/05/10/long-awaited-64-bit-pdf-ifilter-finally-available.aspx
