On 11/8/06, Tim Starling <tstarling(a)wikimedia.org> wrote:
> We've now dedicated a server to search index updates. I've got them
> running in two threads, one for enwiki and one for everything else. Each
> one is a loop, so we should get a complete index update once every 30
> hours or so.
Would now be a good time to ask questions about what the search index
algorithm is and possible improvements to it? It works okay, but
sometimes the bleedingly obvious match isn't even on the first page.
Example: Search for "tchaikovsky's piano concerto" (before I put the
redirect in). The correct match "Piano Concerto No. 1 (Tchaikovsky)"
is miles down the page, even though it has the three search terms
(minus an 's) in the title. Search still seems to miss pages where the
title is the *exact* search term, too.
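To illustrate the kind of ranking I mean, here is a minimal sketch. It is not how the current search backend works. It assumes a ranker that gives a large bonus for an exact title match and a smaller one when every query term appears in the title, and it strips a trailing possessive 's. The function names (`tokens`, `score`, `rank`) and the weights are hypothetical, chosen only for the example.

```python
import re

# Hypothetical weights, for illustration only: an exact title match should
# beat everything, and a title containing every query term should beat
# pages that only mention the terms in their body text.
EXACT_TITLE_BONUS = 100.0
ALL_TERMS_IN_TITLE_BONUS = 10.0
BODY_TERM_WEIGHT = 0.1


def tokens(text):
    """Lowercase word tokens, with a trailing possessive 's removed."""
    words = re.findall(r"[\w']+", text.lower())
    stripped = (re.sub(r"'s$", "", w).strip("'") for w in words)
    return [w for w in stripped if w]


def score(query, title, body):
    q, t, b = tokens(query), tokens(title), tokens(body)
    s = 0.0
    if q == t:
        s += EXACT_TITLE_BONUS
    if set(q) <= set(t):
        s += ALL_TERMS_IN_TITLE_BONUS
    # Weak baseline signal: how often the query terms occur in the body.
    s += BODY_TERM_WEIGHT * sum(b.count(term) for term in q)
    return s


def rank(query, pages):
    """Return titles sorted by descending score; pages are (title, body) pairs."""
    ordered = sorted(pages, key=lambda p: score(query, p[0], p[1]), reverse=True)
    return [title for title, _body in ordered]
```

With these made-up weights, ranking `("Pyotr Ilyich Tchaikovsky", "tchaikovsky wrote a piano concerto")` against `("Piano Concerto No. 1 (Tchaikovsky)", "a concerto in b flat minor")` for "tchaikovsky's piano concerto" puts the concerto article first (score 10.1 vs. about 0.3), even though the biography mentions all three terms in its body.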
"Sounds-like" searches would be nice too: very often you look up
stuff that you don't know how to spell, but you may get the
pronunciation close.
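One classic technique for this is Soundex, which maps words that sound alike to the same four-character code (first letter plus three digits). The sketch below is a simplified American Soundex, offered only as an illustration; the `soundex` function is hypothetical and nothing here reflects what the search engine actually supports. Non-ASCII letters are simply skipped.

```python
# Soundex digit groups; vowels (and y) map to nothing and act as separators,
# while h and w are skipped without separating equal codes.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the American Soundex code of word, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not reset prev
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel sets prev to "", so a repeated code is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")
```

So "Tchaikovsky" and "Tchaikovski" both come out as T221. Soundex always keeps the first letter, though, so "Chaikovsky" gets C212 and would not match; more elaborate phonetic algorithms such as Metaphone exist for that reason.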
Accentless searches would be great, too. It gets really tedious making
redirects like "Spisska Kapitula -> Spišská Kapitula".
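A common way to get accentless matching is to fold both the indexed titles and the query to a diacritic-free key: decompose to Unicode NFD, drop the combining marks, and recompose. The sketch below uses Python's standard `unicodedata` module; `fold_accents` and `search_key` are hypothetical names for illustration. Letters with no Unicode decomposition, such as "ø" or "ł", pass through unchanged and would need an extra mapping table.

```python
import unicodedata


def fold_accents(text):
    """Remove combining diacritics: decompose (NFD), drop marks, recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


def search_key(text):
    """Hypothetical key applied to both indexed titles and user queries."""
    return fold_accents(text).casefold()
```

If the index stored `search_key(title)` and looked up `search_key(query)`, a search for "Spisska Kapitula" would find "Spišská Kapitula" without anyone having to create the redirect by hand.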
And of course, resolving this whole distinction between "searches" and
"gotos" would be nice :)
Steve