On Fri, Aug 16, 2002 at 10:53:17PM +0200, Axel Boldt wrote:
> Andre writes:
>> Could the search feature be changed such that common words, rather
>> than blocking the whole search, are removed from it
> I would like that very much. I just searched for "leap second" and
> found nothing, because "second" is a stop word...
> The internal mysql search engine already does the right thing and
> omits stop words silently, but we are using it separately for every
> word, which causes the problem.

The problem with that was that MySQL's scoring was based more on OR'ing the
search words. So a search for "world war" would give you a result that is
similar to what you now get if you do "world OR war".
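
To make the "leap second" case concrete, here is a rough sketch in Python of
what goes wrong with a per-word AND search, and of the behaviour Andre asks
for. Everything in it is made up for illustration: the tiny stop word list,
the toy index, and the function names are assumptions, not our actual code
or MySQL's real stop word list.

```python
# Hypothetical miniature stop word list; MySQL's built-in list is much longer.
STOP_WORDS = {"second", "the", "of"}

# Toy inverted index: word -> set of article titles containing it.
INDEX = {
    "leap": {"Leap second", "Leap year"},
    "second": {"Leap second", "Second"},
    "year": {"Leap year"},
}


def search_word(word):
    """Look up one word; a stop word matches nothing, as in the per-word setup."""
    if word in STOP_WORDS:
        return set()
    return INDEX.get(word, set())


def search_all_words(query):
    """Per-word search combined with AND: one stop word empties the result."""
    results = [search_word(w) for w in query.lower().split()]
    return set.intersection(*results) if results else set()


def search_ignoring_stop_words(query):
    """Drop stop words from the query first, then AND the remaining words."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    results = [search_word(w) for w in words]
    return set.intersection(*results) if results else set()
```

With this toy data, search_all_words("leap second") comes back empty, while
search_ignoring_stop_words("leap second") returns everything that matches
"leap". Note that this only works because the script knows the stop word
list itself.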
It surprises me a bit that "second" is a stopword, btw. That reminds me. As
far as I could tell there is only an English stopword list in MySQL. So
what do we do for the non-English Wikipedias? If we want to give each of
them its own stopword list then we need to recompile MySQL for each of them
and give them all their own MySQL server. Or are we going to have one server
with an empty stopword list and see whether that lets the fulltext index
explode in size?
Another thing is that if a word appears in more than 50% of the documents
then its search result will also be empty. We cannot filter those out in the
search text because we don't know which words these are.
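
Finding those words ourselves would mean counting, over the whole article
table, in how many documents each word occurs. A rough Python sketch of that
count, with a made-up function name and a plain list of strings standing in
for the articles:

```python
from collections import Counter


def words_over_threshold(documents, threshold=0.5):
    """Return the words that occur in more than `threshold` of the documents."""
    doc_freq = Counter()
    for text in documents:
        # Count each word once per document, however often it occurs in it.
        doc_freq.update(set(text.lower().split()))
    limit = threshold * len(documents)
    return {word for word, count in doc_freq.items() if count > limit}
```

The point is only that the answer depends on the full corpus and changes as
articles are edited, so the search script cannot know it from the query alone.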
Perhaps it's time to think about rolling our own fulltext indexing
mechanism?
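
Just to give an idea of the scale of such a thing, below is a minimal sketch
of an inverted index in Python with a stop word list per instance, so that
each language could get its own list without recompiling anything. The class
name, the method names and the tokenizing rule are all assumptions for the
sake of the example; a real version would have to live in the database and
deal with ranking, updates and size.

```python
import re
from collections import defaultdict


class FulltextIndex:
    """Toy inverted index with AND semantics and a configurable stop word list."""

    def __init__(self, stop_words=()):
        self.stop_words = set(stop_words)
        self.postings = defaultdict(set)  # word -> ids of documents containing it

    def _tokens(self, text):
        # Lowercase, split on non-word characters, silently drop stop words.
        words = re.findall(r"\w+", text.lower())
        return [w for w in words if w not in self.stop_words]

    def add(self, doc_id, text):
        for word in self._tokens(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        words = self._tokens(query)
        if not words:
            return set()  # query was empty or consisted of stop words only
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

For example, an index created as FulltextIndex(stop_words={"de", "het"}) for
the Dutch Wikipedia would ignore "de" in the query "de wereldoorlog" and
still require "wereldoorlog" to match.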
-- Jan Hidders