SQL Server 2008 – iFTS Performance – DocidMap
The next post in the series on iFTS (Integrated Full Text Search) covers
one of the major improvements in SQL Server 2008 for Full Text and is related to
the Integrated part or iFTS.
SQL Server 2005 introduced a huge performance improvement by
moving the location of the docid map.
So what is a docid, you may ask?
Well we all know that to get best performance with indexes
our keys should be as small as possible. When you create a Full Text index you
specify a unique index to be used as the key for the row. However this might be
anything, a uniqueidentifier, a large varchar, anything, so to ensure optimal
performance uses a surrogate key to identify a row in a table. This is the
docid (Document Id).
Note: In full text terms a document is a row in a table.
There is obviously a need to translate between the docid and
the key of the table. Prior to SQL 2005 that was done in the full text engine,
this meant two things. Large keys had to be passed through OLEDB to SQL Server
and also unnecessary mapping may have undertaken in the Full Text service.
In SQL 2005 this mapping was moved into the database which
meant the transfer over OLEDB was quicker and the engine could do more
optimisations in terms of doing the mapping.
So what’s new in SQL 2008?
Firstly the index is stored in the engine and so there is no
OLEDB stuff going on, but more importantly if you have an integer key for your
Full Text index, Full Text will use that key as the docid and so you won’t have
to map between two values.
This has huge benefit when you have full text and normal
relational filters because the optimiser can push the relational filter into
the querying of the full text index and thus read less data and so perform
better.
The following query plans highlight the docid mapping. The first is using a
uniqueidentifier as a key,

And this shows the query plan with an integer key,

The subtree cost of the first is 0.01 (which isn't high but I only have a
174 rows in this table. The subtree cost of the second is 0.000459, thats a 20
fold decrease. Just for using an integer key. All is not lost, if you have a
non-integer key you still benefit from the performance changes and the ability
to use all the relational join operators in the engine, its just you will have a
few extra steps to perform and so performance will not be as good as with an
integer key.
What about upgrade?
If you are upgrading from SQL 2005 you have a number of
options. The first is to maintain the index as is (but held within SQL Server).
This has the benefit of reduced downtime, but you still have a docidmap.
The second option is to rebuild the index. This will rebuild
the index using the new structure AND remove the docidmap. So will give you the
best performance but requires a full rebuild of the index (which should be
quicker in 2008 as well).
The third option takes the index offline and can’t be used.
This allows you to delay the rebuild until a time that is convenient for you
and your server. (If you upgrade 10 databases it is probably not wie to have
them all rebuilding their full text indexes at the same time).
What about new databases ?
One of the other aspects is that because the full text is in
the database you no longer have to enable the database for full text. You
simply need to create your full text catalog and away you go.
Be conscious of the docid issue when creating new tables.
You will get best performance if the unique index you use for the full text
index has an integer key.
The following are the other posts in the series
If you want to try iFTS you can download the SQL Server 2008 from here http://www.microsoft.com/sql/2008/prodinfo/download.mspx
-