Search engine - Wikitech-l

29 Jan 2003


Our search engine desperately needs retooling. If there's no objection
from those in the know, I'd like to migrate us to MySQL 4. The fulltext
search in 4 has boolean capabilities built right in, meaning we could
remove our hackish and buggy parser, and wouldn't need to stack so many
MATCHes together in a query when some poor sap types in "chemical
composition of the earth's atmosphere oxygen nitrogen" or something.
(Our search queries are also frequently *dog slow*. This is exacerbated
because, being a MyISAM table, it locks when someone tries to write to it
and another read is pending. I don't _think_ this lock virulently
spreads to other tables joined with it, but it's annoying anyway.)
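
To make the boolean-mode idea concrete, here is a rough sketch of how the
long atmosphere query could collapse into a single MATCH. It is only an
illustration: the table, column, and field names (searchindex, si_text,
page_id) and the helper build_boolean_query are made up for the example
and are not our actual schema or code. It assumes the IN BOOLEAN MODE
syntax of MySQL 4's fulltext search and a DB-API style driver that uses
%s placeholders.

```python
import re


def build_boolean_query(user_input, table="searchindex", column="si_text"):
    """Turn free-form search input into one boolean-mode fulltext query.

    Returns (sql, params) for a DB-API cursor.execute() call.
    The table, column, and page_id names are hypothetical placeholders.
    """
    # Keep only words (allowing inner apostrophes, as in "earth's") so
    # that boolean operators typed by the user cannot alter the query.
    words = re.findall(r"\w+(?:'\w+)*", user_input.lower())
    # Prefix each word with '+' to mark it as required.
    expression = " ".join("+" + word for word in words)
    sql = (
        "SELECT page_id FROM {table} "
        "WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE)"
    ).format(table=table, column=column)
    return sql, (expression,)
```

Calling build_boolean_query("chemical composition of the earth's
atmosphere") yields one MATCH clause no matter how many words were typed.
How stopwords and very short words behave when marked as required depends
on the server's fulltext settings, so that part would need testing.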
Other things to think about:
* Stopwords. Can we just get rid of the damn stopwords and search
anything?
* "Title results" vs "Text results" - this two-prong approach is, I
think, rather confusing. We could have a single search index field with
the title text weighted more heavily (by repetition?), and just give a
single set of results.
* Text extracts: these show the raw wikicode, and often include language
links, HTML code, etc. Yuck! If we can strip these, that might be good.
* Character entities: should be folded to their raw equivalents in the
search index, so that a search for "Schrödinger" finds both a page
containing "Schrödinger" and one containing "Schr&ouml;dinger".
* 'Power search' is perhaps a little confusing, and there's currently no
way to get to it short of doing two searches.
* 'Search' and 'go' buttons are not clearly demarcated; several people
have noted confusion. Better labeling or better arrangement is needed.
* Redirects. We generally want to filter out redirects that seem
duplicative of other things already listed, but *must* show them for
alternate names. Clearer labeling of redirects would help as well.
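
As a rough illustration of two of the points above (entity folding and
weighting the title by repetition), here is a minimal sketch of how a
single index field might be assembled. The function build_index_text and
the title_weight parameter are invented for the example; the right weight
is an open question, and this ignores stripping wikicode entirely.

```python
import html


def build_index_text(title, body, title_weight=3):
    """Build one searchable string from a page's title and body.

    Entities such as &ouml; are folded to their raw characters, and the
    title is repeated title_weight times so that title matches count more.
    title_weight is an illustrative guess, not a tuned value.
    """
    folded_title = html.unescape(title).lower()
    folded_body = html.unescape(body).lower()
    return " ".join([folded_title] * title_weight + [folded_body])
```

With this, build_index_text("Schr&ouml;dinger", "...") and
build_index_text("Schrödinger", "...") produce the same index text, so
both pages would be found by the same searches.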
-- brion vibber (brion @ pobox.com)