Re: [Wikitech-l] lucene search 2.0 test webinterface

22 May 2007

That is exactly how it works in the new engine: only the content is stemmed,
and it is indexed as stemmed/original pairs. What you quoted is the output of
the current search engine; see the results from the new engine:
http://ls2.wikimedia.org/search?dbname=enwiki&query=commodity&ns0=1
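
To make the stemmed/original pairing concrete, here is a minimal sketch in
Python. It is not the engine's actual code: the toy suffix stripper stands in
for a real Snowball stemmer, and every name in it (toy_stem, analyze_title,
analyze_content, build_index, search) is hypothetical. Titles are indexed
as-is, while each content word is indexed both in its original form and, when
it differs, in its stemmed form at the same position.

```python
import re
from collections import defaultdict

# Hypothetical toy stemmer: strips a few English suffixes. It only stands in
# for a real Snowball stemmer so that the sketch stays self-contained.
SUFFIXES = ("ies", "es", "s", "y")


def toy_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze_title(text):
    # Titles are mostly named entities, so the tokens are left unstemmed.
    return tokenize(text)


def analyze_content(text):
    # Emit (position, term) pairs: the original word, plus its stem at the
    # same position whenever the stem differs from the original.
    terms = []
    for position, word in enumerate(tokenize(text)):
        terms.append((position, word))
        stem = toy_stem(word)
        if stem != word:
            terms.append((position, stem))
    return terms


def build_index(docs):
    # One inverted index per field: term -> set of document ids.
    index = {"title": defaultdict(set), "content": defaultdict(set)}
    for doc_id, (title, content) in docs.items():
        for term in analyze_title(title):
            index["title"][term].add(doc_id)
        for _, term in analyze_content(content):
            index["content"][term].add(doc_id)
    return index


def search(index, query):
    # Single-word query: exact match on titles, original-or-stem on content.
    word = tokenize(query)[0]
    content_hits = index["content"].get(word, set()) | index["content"].get(
        toy_stem(word), set()
    )
    return {
        "title": sorted(index["title"].get(word, set())),
        "content": sorted(content_hits),
    }


docs = {
    1: ("Commodities", "Commodities are traded goods."),
    2: ("Commodity", "A commodity is a basic good."),
    3: ("Commode", "A commode is a piece of furniture."),
}
index = build_index(docs)
result = search(index, "commodity")
print(result)
```

With these three made-up documents, the query "commodity" matches only the
"Commodity" title, while the content match also picks up the document that
mentions "commodities", because both words reduce to the same toy stem.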

r.

On 5/22/07, Tian-Jian Barabbas Jiang@Gmail <barabbas(a)gmail.com> wrote:
...

 Hi all,

     The search results for "commodity" have now changed:

     * Commodities <http://en.wikipedia.org/wiki/Commodities>
       Relevance: 100.0% - -
     * Commodity <http://en.wikipedia.org/wiki/Commodity>
       Relevance: 95.4% - -
     * Commodate <http://en.wikipedia.org/wiki/Commodate>
       Relevance: 94.7% - -
     * Commode <http://en.wikipedia.org/wiki/Commode>
       Relevance: 94.6% - -

 I suggest indexing "Title" with StandardAnalyzer and "Content" with
 SnowballAnalyzer, since Wikipedia titles are almost all named entities
 that should not be modified at all.
 IMHO, mixing original words and stemmed forms is a good heuristic, but
 it is only suitable for the content field.

     Sincerely,
 /Mike "b6s" Jiang/

