Robert Stojnic wrote:
After much delay, I've completed a new release candidate for our internal search engine. The testing site where you can see it in action is the same as before [1], with indexes rebuilt from the latest dumps.
Here are some highlights:
- spell checking (aka did you mean...)
- ajax prefix suggestions (reimplemented Julien's engine)
- nicer highlighting
- improved scoring
- fuzzy queries, e.g. sarah~ thomson~ will give you all the variations of both words
- suffix wildcards (works on title words only), e.g. *stan will give you all the -stan countries of Central Asia; for performance reasons it won't work nicely on huge sets of words
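For readers wondering how a suffix wildcard like *stan can be answered cheaply, one common approach is to index each title word reversed, so that a suffix lookup becomes a prefix lookup on a sorted list. The sketch below is only an illustration under that assumption; it is not taken from the release candidate, and the names `build_reversed_index`, `suffix_matches` and the `limit` parameter are hypothetical. The cap on results reflects the note above that huge sets of matching words are a performance problem.

```python
import bisect


def build_reversed_index(title_words):
    """Return a sorted list of reversed, lowercased title words (hypothetical helper)."""
    return sorted({word.lower()[::-1] for word in title_words})


def suffix_matches(reversed_index, suffix, limit=50):
    """Return up to `limit` indexed words that end in `suffix`, sorted alphabetically."""
    key = suffix.lower()[::-1]
    # All reversed words starting with `key` sit in one contiguous run.
    start = bisect.bisect_left(reversed_index, key)
    matches = []
    for position in range(start, len(reversed_index)):
        reversed_word = reversed_index[position]
        if not reversed_word.startswith(key) or len(matches) >= limit:
            break
        matches.append(reversed_word[::-1])
    return sorted(matches)


words = ["Kazakhstan", "Uzbekistan", "Tajikistan", "Stanford", "Pakistan", "Kyrgyzstan"]
index = build_reversed_index(words)
print(suffix_matches(index, "stan"))
```

Note that "Stanford" is not returned: it starts with "stan" but does not end with it, so its reversed form falls outside the matching run.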
Sweeeet! :)
Search is a bit slowish, especially on enwiki, since I've crammed all of its revision text, spellcheck indexes, search indexes and other stuff onto a single host. According to my tests, a typical search should be in the 150-180ms range (of CPU time), which is much slower than the current one (25-30ms). Most of the overhead comes from spell checking and highlighting. I was thinking of trying to use some of the 8-cpu boxes...
Yeah, we might need to dedicate more hardware to handle that.
The ajax suggestions (when properly cached in RAM) are pretty fast (0.2-0.4ms), so we could probably enable them site-wide on search boxes and such. Initially the index would be updated once a day, but we could cut that down, depending on the number of servers and the actual number of requests.
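To illustrate why prefix suggestions held in RAM can be answered in well under a millisecond, here is a minimal sketch that keeps titles in a sorted in-memory list and finds the matching run with a binary search. It is an assumption-laden illustration, not the reimplemented engine: the class `PrefixSuggester`, its `suggest` method, and the per-title scores are all hypothetical.

```python
import bisect


class PrefixSuggester:
    """Hypothetical in-memory prefix suggester over (title, score) pairs."""

    def __init__(self, titles_with_scores):
        # Each entry is (lowercased title, original title, score), sorted by lowercased title.
        self._entries = sorted((title.lower(), title, score)
                               for title, score in titles_with_scores)
        self._keys = [entry[0] for entry in self._entries]

    def suggest(self, prefix, limit=10):
        """Return up to `limit` titles starting with `prefix`, highest score first."""
        wanted = prefix.lower()
        # Binary search for the first key that could match, then walk the run.
        low = bisect.bisect_left(self._keys, wanted)
        high = low
        while high < len(self._keys) and self._keys[high].startswith(wanted):
            high += 1
        candidates = sorted(self._entries[low:high],
                            key=lambda entry: (-entry[2], entry[0]))
        return [entry[1] for entry in candidates[:limit]]


suggester = PrefixSuggester([
    ("Sarah Thomson", 5),
    ("Sarajevo", 90),
    ("Saratov", 40),
    ("Search engine", 70),
])
print(suggester.suggest("sara"))
```

Rebuilding such a structure from a fresh dump is a full replace of the list, which fits the once-a-day update described above.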
Cooooool!
-- brion