Dear fellow programmers,
Boolean search has been implemented and committed to CVS. You can now use
the keywords "and", "or" and "not" in your queries, as in "Harry and not
Potter". The "and" operator is implicit, so you can also omit it, as in
"Harry not
Potter". So the query "group theory" is interpreted as "group AND theory".
The "not" operator has the highest priority, then "and", and finally
"or". You can use brackets to alter this, for example as in "Harry and (
Potter or Sally )".
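
To make the precedence rules concrete, here is a minimal sketch of a
recursive-descent parser that follows them: "not" binds tightest, then
"and" (explicit or implicit), then "or", with brackets for grouping. This
is an illustration only, not the committed code; the names parse, tokenize
and the tuple-based tree are assumptions made for this example.

```python
import re

# Hypothetical tokenizer: brackets are their own tokens, everything else
# is split on whitespace.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def tokenize(query):
    return TOKEN.findall(query)


def parse(query):
    """Parse a boolean query into nested tuples such as ("AND", a, b)."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # Lowest priority: a chain of "and" groups joined by "or".
        node = parse_and()
        while peek() is not None and peek().lower() == "or":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        # "and" may be written out or left implicit between two terms.
        node = parse_not()
        while True:
            tok = peek()
            if tok is None or tok == ")" or tok.lower() == "or":
                return node
            if tok.lower() == "and":
                advance()
            node = ("AND", node, parse_not())

    def parse_not():
        # Highest priority: "not" applies to the term right after it.
        tok = peek()
        if tok is not None and tok.lower() == "not":
            advance()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        if tok == "(":
            advance()
            node = parse_or()
            if peek() != ")":
                raise SyntaxError("missing closing bracket")
            advance()
            return node
        if tok == ")" or tok.lower() in ("and", "or"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return advance()

    tree = parse_or()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With this sketch, parse("Harry not Potter") and parse("Harry and not
Potter") produce the same tree, and the brackets in "Harry and ( Potter or
Sally )" group the "or" below the "and".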
A few problems remain:
1. The index only covers words with more than three characters, so searches
for shorter words will always give an empty result. This means that a query
like "World war", which is interpreted as "World AND war", always gives an
empty result. I've made this a syntax error to warn the innocent user.
However, the minimum index word size can be changed to 2, but we would have
to recompile MySQL for that and rebuild the indexes.
2. The index has a fixed list of stopwords that also always give an empty
query result. This can also be changed in the source code. The MySQL
matching algorithm also takes into account that certain words are more
critical than others. While this is more a feature than a bug, it also means
that searches for very common words actually result in a very small number
of pages, or even none at all. This behavior can also be changed by setting
some constants in the source and recompiling.
3. If you have a complex boolean query, the search box seems a bit small. As
this is a layout matter, I leave this to Magnus to change. :-)
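
The short-word check from problem 1 could look roughly like the sketch
below. It is an illustration under assumptions, not the committed code: the
names MIN_WORD_LEN, short_words and check_query are made up here, and the
limit of 4 simply encodes "more than three characters".

```python
import re

# Assumption: "more than three characters" means a minimum length of 4.
MIN_WORD_LEN = 4
OPERATORS = {"and", "or", "not"}


def short_words(query, min_len=MIN_WORD_LEN):
    """Return the search words that are too short to be in the index."""
    words = re.findall(r"[^\s()]+", query)
    return [w for w in words
            if w.lower() not in OPERATORS and len(w) < min_len]


def check_query(query, min_len=MIN_WORD_LEN):
    """Raise SyntaxError if the query contains words the index ignores."""
    bad = short_words(query, min_len)
    if bad:
        raise SyntaxError("words too short to be indexed: " + ", ".join(bad))
```

For example, short_words("World war") reports "war", while the same call
with min_len=2 reports nothing, which mirrors what lowering the minimum
index word size would achieve.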
Finally, I noticed that the search performs better if the title contains no
"_"s, because the index considers "Larry_Sanger" as one word that does not
match very well with either "Larry" or "Sanger". To remedy this I
changed the code a bit so that new titles are always stored in the indexed
column without underscores. However, for the old pages you still need to
update this by hand. I have included the statement in 'updSchema.sql'.
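
The idea behind the underscore change can be sketched as follows. This is
only an illustration; the function name indexed_title is hypothetical and
the actual code and the statement in 'updSchema.sql' may differ.

```python
def indexed_title(title):
    """Return the form of a page title to store in the indexed column."""
    # Replacing underscores with spaces lets the index see separate words.
    return title.replace("_", " ")


# "Larry_Sanger" is one token to the index; the converted form has two words.
words = indexed_title("Larry_Sanger").split()
```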
-- Jan Hidders
Wikitech-l mailing list