SQL Server 2008 – iFTS Manageability – Loading thesaurus files

The next post in the series on iFTS (Integrated Full Text Search) covers thesaurus files, a feature introduced in SQL Server 2005.

The loading of thesaurus files is a bit of an uncontrolled beast in SQL Server 2005 (they weren’t supported or documented prior to SQL 2005). The only way you could force a thesaurus file to be reloaded and used was to restart the Full Text service. (I believe you may have had to restart the SQL Server service as well, but that isn’t documented.) This is obviously a slight pain.

In SQL Server 2008 thesaurus files are still files on the file system residing in the FTDATA folder, but the difference is that you can reload them whenever you want using the sys.sp_fulltext_load_thesaurus_file procedure, specifying which language to load, e.g.

EXEC sys.sp_fulltext_load_thesaurus_file 1033;
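If you would rather not hard-code the LCID, a minimal sketch along these lines should work. It assumes the language appears in sys.fulltext_languages under the name 'British English' on your instance; check the first query's output before relying on that name.

```sql
-- List the languages (and their LCIDs) that full-text search knows about.
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;

-- Reload the thesaurus for a language picked by name.
-- 'British English' is an assumed name; use one returned by the query above.
DECLARE @lcid int;

SELECT @lcid = lcid
FROM sys.fulltext_languages
WHERE name = N'British English';

IF @lcid IS NOT NULL
    EXEC sys.sp_fulltext_load_thesaurus_file @lcid;
```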

Books Online has (or will have by RTM) much better documentation on thesaurus files than it did for SQL Server 2005.

One area to be wary of: a term can only be included in one pat of a replacement or one sub of an expansion. If you have a term repeated, it appears that expansions take precedence over replacements, and after that the order in the file takes precedence. If you find that words are not being expanded or replaced as expected, check whether you have included the term more than once. There are some trace flags that support can use to help you if you run into difficulties.
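One way to see what the thesaurus is actually doing with a term is to ask the parser directly using sys.dm_fts_parser, which is new in SQL Server 2008. This is only a sketch: the term is an example, and as far as I know the function needs sysadmin rights.

```sql
-- Reload the US English (LCID 1033) thesaurus so the latest file is in use.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Ask the full-text parser how it treats a term under that thesaurus.
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT display_term, source_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, "mircosoft")', 1033, 0, 0);

-- Rows with expansion_type = 4 were produced by a thesaurus expansion
-- or replacement; if none come back, the entry is not being applied.
```

If a term you expected to be replaced only comes back as itself, a duplicate entry elsewhere in the file is the first thing to look for.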

If you are running a search function on a website, thesaurus files are a great way of allowing for misspelling of terms, e.g. Mircosoft. Using expansion elements allows for both the user spelling something wrong and the person writing the content having got it wrong, which would be the case if users could post their own content. If you know your content is rock solid you can just use replacement to auto-correct spelling mistakes into the correct spelling, e.g.

        <replacement>
            <pat>mircosoft</pat>
            <pat>microsft</pat>
            <pat>micrsoft</pat>
            <sub>microsoft</sub>
        </replacement>
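Once the file has been reloaded, queries pick the entries up as follows. CONTAINS only consults the thesaurus when you ask for it with FORMSOF(THESAURUS, ...), whereas FREETEXT applies it automatically. The table and column names here (dbo.Articles, ArticleId, Title, Body) are made up for illustration, and Body is assumed to have a full-text index using language 1033.

```sql
-- dbo.Articles and its columns are hypothetical names.
-- CONTAINS: the thesaurus is only used when requested explicitly.
SELECT ArticleId, Title
FROM dbo.Articles
WHERE CONTAINS(Body, N'FORMSOF(THESAURUS, "mircosoft")');

-- FREETEXT: the thesaurus is applied automatically.
SELECT ArticleId, Title
FROM dbo.Articles
WHERE FREETEXT(Body, N'mircosoft');
```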

If you are not capturing what people are searching on and the results they get back, then you should be; the information, whilst often overwhelming, can be like gold dust. If you’re running a music website, how many people know how to spell Anastacia? If you reviewed misspelt searches you could return results based on a corrected spelling or, like Live does, append those results to the results for the badly spelt word, as a misspelt word can often still be a valid word, even in the context of the search. In the case of Anastacia, if I entered Anastasia this is still valid, as there are film scores for this; however, based on popularity you could assume people wanted Anastacia and so include those results by use of a thesaurus.
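As a rough sketch of what capturing searches might look like, something as simple as the following would do to start with. The dbo.SearchLog table and its columns are entirely hypothetical; your application would insert a row for each search it runs.

```sql
-- Hypothetical table for logging each search and how many results it returned.
CREATE TABLE dbo.SearchLog (
    SearchLogId int IDENTITY(1,1) PRIMARY KEY,
    SearchTerm  nvarchar(200) NOT NULL,
    ResultCount int NOT NULL,
    SearchedAt  datetime NOT NULL DEFAULT (GETDATE())
);

-- The most common searches that found nothing: likely candidates
-- for new thesaurus entries.
SELECT TOP (20) SearchTerm, COUNT(*) AS Searches
FROM dbo.SearchLog
WHERE ResultCount = 0
GROUP BY SearchTerm
ORDER BY Searches DESC;
```

Zero-result searches only catch spellings that are not valid words in your content. A case like Anastasia still returns results, so you would need to review popular terms by hand as well.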


If you want to try iFTS you can download SQL Server 2008 from here: http://www.microsoft.com/sql/2008/prodinfo/download.mspx



Published 20 February 2008 12:11 by simonsabin

Comments

# SQL Server 2008 - Integrated Full Text Search (iFTS) ...

Here is a series of 6 very interesting articles on what's new in search in

10 February 2010 08:58 by Simon Sabin UK SQL Consultant's Blog

# Microsoft ditches Unix in the search market

10 February 2010 09:43 by SimonS Blog on SQL Server Stuff

# Microsoft ditches Unix in the search market

Search is one of my interests; wherever you go, whatever business you are in, search is like a holy