On 11/8/06, Tim Starling tstarling@wikimedia.org wrote:
We've now dedicated a server to search index updates. I've got them running in two threads, one for enwiki and one for everything else. Each one is a loop, so we should get a complete index update once every 30 hours or so.
Would now be a good time to ask how the search index algorithm works and how it might be improved? It works okay, but sometimes the bleedingly obvious match isn't even on the first page of results.
Example: Search for "tchaikovsky's piano concerto" (before I put the redirect in). The correct match "Piano Concerto No. 1 (Tchaikovsky)" is miles down the page, even though it has the three search terms (minus an 's) in the title. Search still seems to miss pages where the title is the *exact* search term, too.
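To make the complaint concrete, here is a rough sketch of the kind of title weighting I have in mind. It is not a description of what the current search does (I don't know that); the function names `tokens`, `title_score` and `rank_titles` and the scoring formula are made up for illustration. It scores a title by the fraction of query terms it contains, ignoring case and a trailing 's, and adds a bonus when the title is exactly the query.

```python
import re

def tokens(text):
    """Lowercase word tokens with a trailing possessive 's removed."""
    out = []
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        word = word.strip("'")
        if word.endswith("'s"):
            word = word[:-2]
        if word:
            out.append(word)
    return out

def title_score(query, title):
    """Hypothetical score: query-term coverage of the title, plus 1 for an exact match."""
    q = tokens(query)
    t = tokens(title)
    if not q:
        return 0.0
    coverage = sum(1 for term in q if term in set(t)) / len(q)
    exact_bonus = 1.0 if q == t else 0.0
    return coverage + exact_bonus

def rank_titles(query, titles):
    """Sort titles from best to worst title score (stable for ties)."""
    return sorted(titles, key=lambda title: title_score(query, title), reverse=True)

titles = [
    "Piano concerto",
    "Pyotr Ilyich Tchaikovsky",
    "Piano Concerto No. 1 (Tchaikovsky)",
]
print(rank_titles("tchaikovsky's piano concerto", titles)[0])
```

In a real ranking this would presumably be one signal combined with the body-text score rather than a replacement for it.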
"Sounds-like" searches would be nice too - very often you look up something you don't know how to spell, but can get close to the pronunciation.
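One well-known way to do this is a phonetic key such as Soundex, so here is a minimal sketch of it. This is only an illustration of the idea, not a proposal for a specific implementation; the function name `soundex` is my own, and it only handles ASCII letters. Two spellings that produce the same key would be treated as "sounding alike".

```python
# Digit codes for the consonant groups used by American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex key for a word (ASCII letters only)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]

print(soundex("Tchaikovsky"), soundex("Tchaikofsky"))
```

A known weakness is that the first letter is kept as-is, so "Chaikovsky" gets a different key from "Tchaikovsky"; a more forgiving scheme would be needed for that sort of misspelling.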
Accentless searches would be great, too. It gets really tedious making redirects like "Spisska Kapitula -> Spišská Kapitula".
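For what it's worth, the usual trick is to fold accents at both index time and query time, so that no redirect is needed. A minimal sketch using Python's standard `unicodedata` module (the function name `fold_accents` is my own, and I'm not claiming this is how the current indexer would have to do it):

```python
import unicodedata

def fold_accents(text):
    """Decompose characters (NFD), then drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_accents("Spišská Kapitula"))  # Spisska Kapitula
```

This only covers letters that decompose into a base letter plus a combining mark. Letters such as "ł" or "ø" have no such decomposition and would need an explicit mapping table.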
And of course, resolving this whole distinction between "searches" and "gotos" would be nice :)
Steve