Boolean searching would be cool. If not that, then at least "search within results."
Larry
Dear fellow programmers,
Boolean search has been implemented and committed to CVS. You can now use the keywords "and", "or" and "not" in your queries, as in "Harry and not Potter". The "and" operator is implicit, so you can also omit it, as in "Harry not Potter"; likewise, the query "group theory" is interpreted as "group AND theory". "not" has the highest precedence, then "and", and finally "or". You can use brackets to override this, for example "Harry and ( Potter or Sally )".
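To make the precedence rules above concrete, here is a minimal sketch of a recursive-descent parser for such queries. It is written in Python purely for illustration and is not the code that was committed; the names `parse`, `tokenize` and `QuerySyntaxError` and the tuple-based tree are assumptions of this sketch.

```python
import re

KEYWORDS = {"and", "or", "not"}


class QuerySyntaxError(ValueError):
    """Raised for malformed queries (hypothetical name, for illustration)."""


def tokenize(query):
    # Brackets are tokens of their own; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", query)


def parse(query):
    """Parse a boolean query into nested tuples.

    Precedence, highest first: not, and (explicit or implicit), or.
    """
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def peek_kw():
        # The next token as a lower-cased keyword, or None if it is not one.
        tok = peek()
        return tok.lower() if tok is not None and tok.lower() in KEYWORDS else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek_kw() == "or":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while True:
            tok = peek()
            if tok is None or tok == ")" or peek_kw() == "or":
                return node
            if peek_kw() == "and":
                advance()  # explicit "and"; otherwise the "and" is implicit
            node = ("AND", node, parse_not())

    def parse_not():
        if peek_kw() == "not":
            advance()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise QuerySyntaxError("unexpected end of query")
        if tok == "(":
            advance()
            node = parse_or()
            if peek() != ")":
                raise QuerySyntaxError("missing closing bracket")
            advance()
            return node
        if tok == ")" or tok.lower() in KEYWORDS:
            raise QuerySyntaxError(f"unexpected token {tok!r}")
        return ("WORD", advance())

    tree = parse_or()
    if peek() is not None:
        raise QuerySyntaxError(f"unexpected token {peek()!r}")
    return tree
```

With this sketch, "Harry and not Potter" and "Harry not Potter" yield the same tree, and "a or b and c" groups as "a or ( b and c )".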
A few problems remain:
1. The index only contains words with more than three characters, so searches for shorter words always give an empty result. This means that a query like "World war", which is interpreted as "World AND war", always gives an empty result. I've made this a syntax error to warn the innocent user. The minimum index word size can be changed to 2, but we would have to recompile MySQL for that and rebuild the indexes.
2. The index has a fixed list of stopwords that also always give an empty query result. This can also be changed in the source code. The MySQL matching algorithm also takes into account that certain words are more critical than others. While this is more a feature than a bug, it also means that searches for very common words actually return a very small number of pages, or even none at all. This behavior can also be changed by setting some constants in the source and recompiling.
3. If you have a complex boolean query the search box seems a bit small. As this is a layout matter I leave this to Magnus to change. :-)
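The short-word check from the first problem could look roughly like the following. This is only a sketch in Python, not the committed code: the function names, the error class and the error wording are assumptions, and only the threshold (words with more than three characters) comes from the description above.

```python
import re

MIN_WORD_LEN = 4  # the index only holds words with more than three characters
OPERATORS = {"and", "or", "not"}


class QuerySyntaxError(ValueError):
    """Raised when a query contains words the index cannot match (hypothetical name)."""


def unsearchable_words(query, min_len=MIN_WORD_LEN):
    # Every non-operator, non-bracket token shorter than the index minimum.
    words = re.findall(r"[^\s()]+", query)
    return [w for w in words if w.lower() not in OPERATORS and len(w) < min_len]


def check_query(query):
    # Reject the query up front instead of silently returning no results.
    short = unsearchable_words(query)
    if short:
        raise QuerySyntaxError(
            "words too short to be indexed: " + ", ".join(short))
    return query
```

For "World war" this reports "war" as too short, while "Harry and not Potter" passes because the operators themselves are not counted as search words.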
Finally, I noticed that the search performs better if the title contains no "_"s, because the index considers "Larry_Sanger" as one word that does not match very well with either "Larry" or "Sanger". To remedy this I have changed the code a bit so that new titles are always stored in the indexed column without underscores. However, for the old pages you still need to update this by hand. I have included the statement in 'updSchema.sql'.
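A small Python illustration of the underscore problem. The regular expression `\w+` is only a stand-in for the index's word splitting (it happens to treat "_" as part of a word too), and `indexed_title` is a hypothetical helper name, not the actual function in the code.

```python
import re


def index_words(text):
    # Stand-in tokenizer: "\w" matches letters, digits and the underscore,
    # so an underscore never splits a word.
    return re.findall(r"\w+", text)


def indexed_title(title):
    # Store the title with spaces so that each part is indexed separately.
    return title.replace("_", " ")


print(index_words("Larry_Sanger"))                 # ['Larry_Sanger']
print(index_words(indexed_title("Larry_Sanger")))  # ['Larry', 'Sanger']
```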
Enjoy,
-- Jan Hidders
Wow, thanks Jan!
Larry
On Sun, 17 Feb 2002, Jan Hidders wrote:
[...]
Wikitech-l mailing list Wikitech-l@ross.bomis.com http://ross.bomis.com/mailman/listinfo/wikitech-l