That is exactly how it works in the new engine: only the contents field is stemmed, and it is indexed as stemmed/original pairs. What you quoted is the output of the current search engine; please check the results from the new engine instead: http://ls2.wikimedia.org/search?dbname=enwiki&query=commodity&ns0=1
r.
On 5/22/07, Tian-Jian Barabbas Jiang@Gmail barabbas@gmail.com wrote:
Hi all,
The search results for "commodity" have now changed:

* Commodities <http://en.wikipedia.org/wiki/Commodities> Relevance: 100.0%
* Commodity <http://en.wikipedia.org/wiki/Commodity> Relevance: 95.4%
* Commodate <http://en.wikipedia.org/wiki/Commodate> Relevance: 94.7%
* Commode <http://en.wikipedia.org/wiki/Commode> Relevance: 94.6%
I suggest indexing "Title" with StandardAnalyzer and "Content" with SnowballAnalyzer, since Wikipedia titles are almost all named entities that should not be modified at all. IMHO, mixing original words with stemmed forms is a good heuristic, but it is only suitable for the content field.
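To make the per-field idea concrete, here is a minimal sketch in plain Python. It is not Lucene code and not how either search engine is actually implemented: `toy_stem` is a crude stand-in for a real Snowball stemmer, and every function name here (`analyze_title`, `analyze_content`, `build_index`, `search`) is hypothetical. Titles are indexed unstemmed, while content is indexed with both the original and the stemmed form of each word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_stem(token):
    """Crude suffix stripper; an assumed stand-in for a Snowball stemmer."""
    for suffix in ("ies", "es", "s", "y"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze_title(text):
    """Title field: keep tokens unmodified (beyond lowercasing)."""
    return tokenize(text)


def analyze_content(text):
    """Content field: emit each original token plus its stem if different."""
    terms = []
    for token in tokenize(text):
        terms.append(token)
        stem = toy_stem(token)
        if stem != token:
            terms.append(stem)
    return terms


def build_index(docs):
    """Build one inverted index per field: term -> set of document ids."""
    index = {"title": defaultdict(set), "content": defaultdict(set)}
    for doc_id, doc in docs.items():
        for term in analyze_title(doc["title"]):
            index["title"][term].add(doc_id)
        for term in analyze_content(doc["content"]):
            index["content"][term].add(doc_id)
    return index


def search(index, query):
    """Return (title_hits, content_hits), analyzing the query per field."""
    title_hits = set()
    for term in analyze_title(query):
        title_hits |= index["title"].get(term, set())
    content_hits = set()
    for term in analyze_content(query):
        content_hits |= index["content"].get(term, set())
    return title_hits, content_hits
```

With this setup, a query for "commodity" matches only the exact title "Commodity" in the title field, but still finds pages whose content mentions "commodities", because both words share a stem in the content field.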
Sincerely,
/Mike "b6s" Jiang/
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l