Hi all,
I'm working on the internal Lucene search engine. I've set up a web interface (with the kind help of Tim :) for the new engine, visit it here:
Most changes are in the internals (e.g. making searching/indexing distributed, incremental updates...), but I also tried to improve the scoring, added some new search syntax, and enabled stemming for another ten or so major languages.

Highlights:
- Prefix searches. E.g. entering help:images in the search box will search only the help namespace.
- Category search. You can limit a search by category, e.g. clarinet incategory:"woodwind instruments"
- Improved scoring. Default Lucene scoring favors short articles; I tried to make the scoring as relevant to Wikipedia as possible. A good test is entering "commodity" into the search box. The top two articles have almost the same score: the first, Commodity (Marxism), is a long article about the usage of the word in Marxism, and the second, Commodity, is a much shorter article whose title matches the query more closely.
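For anyone curious how the new query syntax breaks down, here is a rough sketch in Python of splitting a query into a namespace prefix, incategory: filters, and the remaining search terms. This is only an illustration and not the engine's actual parser: parse_query, KNOWN_NAMESPACES, and the returned dictionary layout are all made-up names, and the namespace list is a small hypothetical sample.

```python
import re

# Hypothetical sample of namespaces; the real engine presumably knows
# every namespace defined on the wiki being searched.
KNOWN_NAMESPACES = {"help", "talk", "user", "category", "template"}

# Matches incategory:"quoted name" or incategory:singleword
INCATEGORY_RE = re.compile(r'incategory:(?:"([^"]+)"|(\S+))', re.IGNORECASE)


def parse_query(query):
    """Split a raw query into namespace, category filters, and free terms."""
    categories = []

    def _collect(match):
        # Group 1 is the quoted form, group 2 the unquoted form.
        categories.append(match.group(1) or match.group(2))
        return " "

    # Pull out every incategory: filter, then normalize whitespace.
    rest = " ".join(INCATEGORY_RE.sub(_collect, query).split())

    # Treat a leading "name:" as a namespace prefix only if it is known.
    namespace = None
    prefix, sep, tail = rest.partition(":")
    if sep and prefix.lower() in KNOWN_NAMESPACES:
        namespace = prefix.lower()
        rest = tail.strip()

    return {"namespace": namespace, "categories": categories, "terms": rest}


if __name__ == "__main__":
    print(parse_query("help:images"))
    print(parse_query('clarinet incategory:"woodwind instruments"'))
```

Only recognized prefixes are treated as namespaces in this sketch, so a query that merely contains a colon is left alone as ordinary search terms.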
The test index is based on the latest dumps for the 15 largest wikis, with updates from the last 4-5 days.
Any feedback will be appreciated :)
Robert