SQL Server 2008 – iFTS Transparency – dm_fts_parser

In the next post in this series on Integrated Full Text Search (iFTS) in SQL Server 2008, we look at the new dynamic management function dm_fts_parser.

Wow, that's a cool function name. What does it do, Simon?

Well, in my first post I talked about the processes involved in full text search, which until now have been black boxes. This function makes some of them more transparent from a querying perspective.

dm_fts_parser takes a full text query, breaks it up using the word breaker rules, and applies the stop list (more on stop lists later) and any configured thesaurus. This is an essential first step in diagnosing why users are complaining that their queries aren't working. Often the cause is a word not breaking as expected, the use of noise words that exist in the stop list, or the thesaurus replacing or substituting words.

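You call the function with the same query string you would normally use in a CONTAINS predicate, along with a language (LCID), a stop list ID, and a flag for whether the search should be accent sensitive. In the examples here, 2057 is the LCID for British English, a stop list ID of 0 means the system stop list, and an accent sensitivity of 0 means accent insensitive.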

SELECT *
FROM
sys.dm_fts_parser ('FORMSOF( THESAURUS, "Internet Explorer")', 2057, 0, 0)

This returns the following:

You can see that in my thesaurus I have added Firefox and Netscape as substitutions for Internet Explorer.
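Because SELECT * returns quite a few columns, it can help to pick out only the ones that explain what the thesaurus did. The following is a minimal sketch, assuming the documented column names of sys.dm_fts_parser (source_term, display_term, expansion_type, special_term, group_id, phrase_id, occurrence) and assuming that an expansion_type of 4 marks a thesaurus expansion or replacement. Your output depends entirely on what is in your own thesaurus file.

```sql
-- Show only the columns that explain how the thesaurus rewrote the query.
-- 2057 = British English LCID, 0 = system stop list, 0 = accent insensitive.
SELECT source_term,     -- the term as it appeared in the query
       display_term,    -- readable form of the keyword that will be searched
       expansion_type,  -- assumed: 0 = as typed, 2 = inflectional, 4 = thesaurus
       special_term     -- e.g. exact match or noise word
FROM sys.dm_fts_parser ('FORMSOF( THESAURUS, "Internet Explorer")', 2057, 0, 0)
ORDER BY group_id, phrase_id, occurrence;
```

Rows whose display_term differs from source_term are the ones the thesaurus introduced.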

The following query,

SELECT *
FROM
sys.dm_fts_parser ('multi-million', 2057, 0, 0)

returns the following, showing how the word breaker has broken the word up but also kept the combined word.

Finally

SELECT *
FROM
sys.dm_fts_parser ('SQL OR Server OR 2008 OR is OR the OR best', 2057, 0, 0)

Returns the following, which nicely indicates which words are noise words, and also that numbers are searched as both numbers and text. Note the nn prefix.
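If you only want to see which words were thrown away, you can filter on the special_term column. This is a rough sketch, assuming that the function reports stop words with the value 'Noise Word' in special_term, and that passing NULL for the stop list ID means no stop list is applied at all.

```sql
-- List only the terms that the system stop list treats as noise words.
SELECT display_term, source_term
FROM sys.dm_fts_parser ('SQL OR Server OR 2008 OR is OR the OR best', 2057, 0, 0)
WHERE special_term = 'Noise Word';   -- assumed literal value

-- Same query with no stop list (NULL), for comparison:
-- the words flagged above should no longer be marked as noise.
SELECT display_term, special_term
FROM sys.dm_fts_parser ('SQL OR Server OR 2008 OR is OR the OR best', 2057, NULL, 0);
```

Comparing the two result sets is a quick way to confirm whether a failing query is down to the stop list rather than the word breaker.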

And finally finally, the query about c++, c# etc.

SELECT *
FROM sys.dm_fts_parser ('C or c or C++ or c++ or C# or c#', 2057, 0, 0)

Returns the following, which shows what you need to put in to get an exact search on C++ or C#: capitalise the C. What's also interesting is that C and C++ both relate to C as well, but C# doesn't, which means that if C is removed from the noise words then C++ would return any document containing the word C.
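To see what would happen with C removed from the noise words, you could test against a custom stop list before changing anything on a real index. This is only a sketch under a few assumptions: ProgrammingTerms is a made-up stop list name, 'c' is present in the system stop list for LCID 2057 (the DROP will fail if it is not), and you are running it in a scratch database from a tool that understands the GO batch separator.

```sql
-- Hypothetical stop list: a copy of the system stop list in the current database.
CREATE FULLTEXT STOPLIST ProgrammingTerms FROM SYSTEM STOPLIST;
GO

-- Remove 'c' for British English so it is no longer treated as a noise word.
ALTER FULLTEXT STOPLIST ProgrammingTerms DROP 'c' LANGUAGE 2057;
GO

-- Look up the ID of the new stop list and pass it as the third argument.
DECLARE @stoplist_id int =
    (SELECT stoplist_id FROM sys.fulltext_stoplists WHERE name = N'ProgrammingTerms');

SELECT source_term, display_term, special_term
FROM sys.dm_fts_parser ('C or c or C++ or c++ or C# or c#', 2057, @stoplist_id, 0);
```

Nothing here touches an existing full text index, so it is a safe way to preview the effect before assigning the stop list to an index.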

The following are the other posts in the series

If you want to try iFTS you can download SQL Server 2008 from here http://www.microsoft.com/sql/2008/prodinfo/download.mspx



Published Wednesday, February 20, 2008 9:40 AM by simonsabin

Comments

Tuesday, October 7, 2008 8:50 AM by SQL Server, BizTalk Server, le 64 bits et au-delà !...

# SQL Server 2008 - Integrated Full Text Search (iFTS) ...

Here is a series of 6 very interesting articles on the new features of search in

Saturday, November 8, 2008 8:31 PM by Marco Scheel aka GeekDotNet

# Microsoft SQL Server 2008 – Interated Fulltext Service ist cool

In many of our projects a SQL Server is part of the solution. In some projects the data in the tables has to be searched. For smaller projects with small amounts of data, the built-in T-SQL LIKE operator is usually enough. One project has

Wednesday, February 10, 2010 8:58 AM by Simon Sabin UK SQL Consultant's Blog

# Microsoft ditches Unix in the search market

Wednesday, February 10, 2010 9:42 AM by SimonS Blog on SQL Server Stuff

# Microsoft ditches Unix in the search market

Search is one of my interests, where ever you go, whatever business you are in search is the like a holy

# Microsoft SQL Server 2008 – Interated Fulltext Service ist cool | Marco Scheel aka GeekDotNet

Pingback from  Microsoft SQL Server 2008 – Interated Fulltext Service ist cool | Marco Scheel aka GeekDotNet