Understanding Full Text Searching Compared with LIKE
One of the most common questions that comes up on the
newsgroups is around searching for terms containing punctuation. The questions
are normally something like
"I'm trying to use Full Text to search my product catalog, when someone
enters a product code the results I get back are often wrong. Why is that?"
This is normally folllwed by a response
"What format are your product codes?"
With an answer of,
"Something like A-BXDRF/1234"
And that is when the problem is highlighted. the "A-BXDRF/1234" in full text
world is more like "A BXDRF 1234" let me explain
Full text works with terms (words), it has an index of these terms that have
been extracted from the data in the column indexed. Full text uses a word
breaker to achieve this, these have rules built in to them that define how a
block of text is broken down into terms. The word breaker is language specific
as they are different rules for breaking up a block of text for different
languages. Some languages don't have word breakers and use a neutral word
breaker whilst others inherit word breakers from a master language i.e. French
Canadian inherits from French. Word breaking applies to both indexing of
documents as well your search text.
At a simple level words are broken up at neutral characters like spaces
and punctuation. This should explain the breaking up of the product code into
the individual terms. In the example the killer aspect is that one of the terms
is a single letter. By default full text treats individual letters as noise
words. This means they are not stored in the index. To resolve this you can
remove all the single letters (and anything else that you might want to search
on) from the noise word file and then repopulate your index.
Whilst this will result in a better set of results it may still not be 100%
accurate and thats because you will be searching for a broken down set of terms.
and not a precise term.
They other word of warning is that in removing words from the noise list will
result in your full text index being much larger.
Many people will also reccommend the use of LIKE to solve this problem.
Whilst this is a solution, be aware that SQL will have to scan every record to
perform the search rather than use an index, if you text column is large i.e. a
product description of 4000 characters then this will be a slow process. There
is performance trade off between having a large full text index or using
LIKE, generally full text will perform better.
If you must have 100% accurate searches then I would suggest holding your
product code in a seperate field and performing a search on that field, in that
situation a LIKE search may perform adequately.
Which ever option you choose make sure that you thoroughly test your solution
and also test for concurrency.
-