SQL Server 2008 – iFTS Manageability – Loading thesaurus files
The next post in the series on iFTS (Integrated Full
Text Search) covers thesaurus files, a feature first supported in SQL Server 2005.
The loading of thesaurus files is a bit of an uncontrolled beast in SQL
Server 2005 (they weren’t supported or documented prior to SQL 2005). The only
way you could force a thesaurus file to be reloaded and used was to restart the
Full Text service. (I believe you may have had to restart the SQL Service as
well, but that isn’t documented.) This is obviously a slight pain.
In SQL Server 2008 thesaurus files are still files on the file system
residing in the FTDATA folder, but the difference is that you can reload them
when you want using the sys.sp_fulltext_load_thesaurus_file procedure and
specifying which language to load by its LCID (1033 is US English), e.g.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;
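If you want to check that a reload has actually been picked up, one option is to run a term through sys.dm_fts_parser straight afterwards. This is only a rough sketch: the search term is a placeholder for whatever you have put in your own thesaurus file, I am assuming the system stoplist (0) and accent insensitivity (0), and you need to be a sysadmin to call the function.

```sql
-- Reload the thesaurus for US English (LCID 1033)
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Ask the parser how it treats a term (placeholder term, swap in your own).
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
-- Rows with expansion_type = 4 were produced by the thesaurus.
SELECT keyword, display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, "mircosoft")', 1033, 0, 0);
```

If the only row that comes back has expansion_type 0, the thesaurus entry for that term is not being applied.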
Books Online has (or will have by RTM) much better FTS documentation around
thesaurus files than in SQL Server 2005.
One area to be wary of: a term can only be included in one pat of a
replacement or one sub of an expansion. If you have a term repeated, it appears
that expansions take precedence over replacements, and the order in the file
then takes precedence. If you find that words are not being expanded or replaced
as expected, check whether you have included the term more than once. There are
some trace flags that support can use to help you if you run into difficulties.
If you are running a search function on a website, thesaurus files are a
great way of allowing for misspelling of terms, e.g. Mircosoft. Using expansion
elements allows for both the user spelling something wrong and the person
writing the content having got it wrong. This would be the case if users could
post their own content. If you know your content is rock solid you can just use
replacement to auto correct spelling mistakes into the correct spelling, e.g.
<replacement>
<pat>mircosoft</pat>
<pat>microsft</pat>
<pat>micrsoft</pat>
<sub>microsoft</sub>
</replacement>
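To make use of a replacement like the one above, the query has to ask for thesaurus matching. A minimal sketch follows; dbo.Articles, ArticleId, Title and Body are made-up names, and it assumes a full-text index already exists on Body.

```sql
-- Hypothetical table and columns, with a full-text index on Body.
-- FORMSOF(THESAURUS, ...) applies the thesaurus for the given language,
-- so the misspelt term is swapped for its <sub> before matching.
SELECT ArticleId, Title
FROM dbo.Articles
WHERE CONTAINS(Body, N'FORMSOF(THESAURUS, "mircosoft")', LANGUAGE 1033);

-- FREETEXT applies the thesaurus automatically, no FORMSOF needed.
SELECT ArticleId, Title
FROM dbo.Articles
WHERE FREETEXT(Body, N'mircosoft', LANGUAGE 1033);
```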
If you are not capturing what people are searching on and the results they
get back then you should be; the information, whilst often overwhelming, can be
like gold dust. If you’re running a music website, how many people know how to
spell Anastacia? If you reviewed misspelt searches you could return results
based on a corrected spelling or, like Live does, append those results to the
results for the badly spelt word, as often a misspelt word can still be a valid
word even in the context of the search. In the case of Anastacia, if I entered
Anastasia this is still valid as there are film scores for this; however, based
on popularity you could assume people wanted Anastacia and so include those
results by use of a thesaurus.
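Capturing searches does not need to be complicated. The following is just one way you might do it; dbo.SearchLog and its columns are names I have made up for illustration, and your application would need to insert a row for every search it runs.

```sql
-- Hypothetical log table: one row per search executed by the site
CREATE TABLE dbo.SearchLog (
    SearchLogId int IDENTITY(1,1) PRIMARY KEY,
    SearchTerm  nvarchar(200) NOT NULL,
    ResultCount int NOT NULL,
    SearchedAt  datetime NOT NULL DEFAULT (GETDATE())
);

-- The terms that most often return nothing are good candidates
-- for new thesaurus entries
SELECT TOP (20) SearchTerm, COUNT(*) AS Searches
FROM dbo.SearchLog
WHERE ResultCount = 0
GROUP BY SearchTerm
ORDER BY Searches DESC;
```

Zero-result searches will not catch a case like Anastasia, which is a valid term in its own right, so it is worth reviewing the most popular low-result terms by hand as well.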
If you want to try iFTS you can download SQL Server 2008 from here: http://www.microsoft.com/sql/2008/prodinfo/download.mspx