Re: [Wikitech-l] lucene search 2.0 test webinterface

22 May 2007


      Mohamed Magdy said:
...
Robert Stojnic wrote:
...
...
Limiting the search to a certain category sounds nice.. nice work! But
what does this mean? "and stemmed words are penalized."
The stemming issue is reported in bug 2511 [*]. The bug is caused by the
indexer not indexing the original word, but only its root (i.e. the stemmed
word). Now both are indexed, and original words are preferred, i.e. they get
larger scores.
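
To illustrate the idea of indexing both forms and preferring the original, here is a minimal Python sketch. It is not the lucene-search code: `toy_stem`, `build_index`, `search` and the two score constants are hypothetical names and values chosen only for this example, and the stemmer is a crude stand-in.

```python
from collections import defaultdict

# Assumed weights for illustration only: exact matches outrank stem-only matches.
EXACT_SCORE = 1.0
STEM_SCORE = 0.5


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip one common English suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Index every word twice: once as written, once stemmed."""
    exact = defaultdict(set)
    stemmed = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            exact[word].add(doc_id)
            stemmed[toy_stem(word)].add(doc_id)
    return exact, stemmed


def search(exact, stemmed, query):
    """Return (doc_id, score) pairs, best first; stem-only hits are penalized."""
    q = query.lower()
    scores = {}
    for doc_id in stemmed.get(toy_stem(q), ()):
        scores[doc_id] = STEM_SCORE
    for doc_id in exact.get(q, ()):
        scores[doc_id] = EXACT_SCORE  # the original word overrides the stem score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {"a": "searching the index", "b": "search engines"}
exact, stemmed = build_index(docs)
print(search(exact, stemmed, "searching"))
```

In this sketch a query for "searching" finds both documents through the shared stem, but document "a", which contains the word as typed, is ranked first.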
I may be wrong .. but isn't it right that before the program can get
the root of a word it has to know it? I mean.. shouldn't it have a big
list of words and their roots? And that is not for English only.. wouldn't
you need lists for each language? Otherwise, how will the program know
where to strip the words?
The Porter stemmer only cuts off inflected endings and known suffixes; it does
not convert words back to lemmas.
This heuristic algorithm saves the space a dictionary would take, and experience
shows that it is usually good enough, for English at least.
It would certainly hurt if Wikipedia needs accurate results.
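
As a small illustration of rule-based suffix stripping without any dictionary, the sketch below implements only the plural rules of step 1a of the Porter algorithm (SSES -> SS, IES -> I, SS -> SS, S -> nothing). The full stemmer has several more steps and conditions; `strip_plural` and `STEP_1A_RULES` are names made up for this example.

```python
# Step 1a of the Porter algorithm: the first matching (longest) suffix wins.
STEP_1A_RULES = (
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),
    ("s", ""),
)


def strip_plural(word):
    """Apply the first matching suffix rule; no word list is consulted."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ("caresses", "ponies", "caress", "cats"):
    print(w, "->", strip_plural(w))
# caresses -> caress
# ponies -> poni
# caress -> caress
# cats -> cat
```

Note that "ponies" becomes "poni", not the lemma "pony": the rules only cut and rewrite endings, which is why no per-language word list is needed, and also why the output is not always a real word.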
Cheers,
/Mike "b6s" Jiang/
