This is working really nicely. Are you documenting the implementation? <G>
DSig David Tod Sigafoos | SANMAR Corporation PICK Guy 206-770-5585 davesigafoos@sanmar.com
-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Robert Stojnic
Sent: Tuesday, May 22, 2007 9:00
To: wikitech-l@lists.wikimedia.org
Subject: [Wikitech-l] lucene search 2.0 test webinterface
Hi all,
I'm working on the internal Lucene search engine. I've set up a web interface (with the kind help of Tim :) for the new engine; visit it here:
Most changes are in the internals (i.e. distributed searching and indexing, incremental updates...), but I also tried to improve the scoring, added some new search syntax, and enabled stemming for another ten or so major languages.
Highlights:
- Prefix searches. E.g. entering help:images in the search box will search only the help namespace.
- Category search. You can limit a search by category, e.g. clarinet incategory:"woodwind instruments"
- Improved scoring. Default Lucene scoring favors short articles; I tried to make scoring as relevant to Wikipedia as possible. A good test is entering "commodity" into search. The top two articles have almost the same score: the first, Commodity (Marxism), is a long article about the usage of the word in Marxism, and the other, Commodity, is a much shorter article whose title fits the query more accurately.
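To make the new syntax concrete, here is a rough Python sketch of how a query string with a namespace prefix and an incategory: filter might be split into its parts. It is only an illustration, not the engine's actual parser: the names parse_query, ParsedQuery and KNOWN_NAMESPACES are made up for this example, and the namespace list is a small hypothetical sample.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical sample of namespace names; a real wiki defines its own set.
KNOWN_NAMESPACES = {"help", "talk", "user", "category", "template"}

# Matches incategory:"quoted name" or incategory:single_word
_INCATEGORY = re.compile(r'incategory:(?:"([^"]*)"|(\S+))', re.IGNORECASE)


@dataclass
class ParsedQuery:
    namespace: Optional[str] = None          # e.g. "help" for help:images
    categories: List[str] = field(default_factory=list)
    terms: List[str] = field(default_factory=list)


def parse_query(raw: str) -> ParsedQuery:
    """Split a raw search string into namespace, category filters and terms."""
    parsed = ParsedQuery()

    def _collect(match: "re.Match[str]") -> str:
        # Group 1 is the quoted form, group 2 the unquoted form.
        quoted, bare = match.group(1), match.group(2)
        parsed.categories.append(quoted if quoted is not None else bare)
        return " "  # remove the filter from the remaining text

    rest = _INCATEGORY.sub(_collect, raw).strip()

    # Treat a leading "name:" as a namespace only if the name is known.
    prefix, sep, tail = rest.partition(":")
    if sep and prefix.lower() in KNOWN_NAMESPACES:
        parsed.namespace = prefix.lower()
        rest = tail

    parsed.terms = rest.split()
    return parsed
```

With this sketch, parse_query("help:images") yields the namespace "help" with the single term "images", and the clarinet example yields one term plus the category "woodwind instruments". Checking the prefix against a known list is a design choice made here so that an ordinary colon in a query is not mistaken for a namespace.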
The test index is based on the latest dumps for the 15 largest wikis, with updates from the last 4-5 days.
Any feedback will be appreciated :)
Robert
http://en.wikipedia.org/wiki/Commodity_%28Marxism%29