[Mediawiki-l] Search for similar or alternative words in MW?

rainman at sbb.co.yu
Tue Jul 24 11:23:20 UTC 2007


I think there are two ways to do this: an easy, hackish way and a harder but better way.

The first way is to rewrite queries. You would write your own extension,
or modify an existing one, to rewrite queries like "Bob Jr" into "Bob (Jr
OR Junior)"; a simple regexp search/replace will do. However, if you
add a feature like this to an existing extension, it might break with
future versions of that extension.
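
A minimal sketch of the query-rewriting approach, assuming a hard-coded abbreviation table. QueryRewriter and the entries in its table are hypothetical and not part of MediaWiki. The sketch matches whole words only and wraps each abbreviation in an OR group, producing the "Bob (Jr OR Junior)" form described above. Whether OR and parentheses are honoured depends on the search backend the wiki uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical query rewriter: expands known abbreviations into an OR group,
// e.g. "Bob Jr" -> "Bob (Jr OR Junior)". The table is an illustrative assumption.
final class QueryRewriter {
    private static final Map<String, String> SYNONYMS = new LinkedHashMap<>();
    static {
        SYNONYMS.put("Jr", "Junior");
        SYNONYMS.put("Sr", "Senior");
        SYNONYMS.put("St", "Saint");
    }

    // One alternation of all abbreviations, anchored on word boundaries so
    // that only whole words match (case-sensitive). Declared after the static
    // block so SYNONYMS is already filled when this initializer runs.
    private static final Pattern ABBREVIATIONS = buildPattern();

    private static Pattern buildPattern() {
        StringBuilder alt = new StringBuilder();
        for (String abbr : SYNONYMS.keySet()) {
            if (alt.length() > 0) alt.append('|');
            alt.append(Pattern.quote(abbr));
        }
        return Pattern.compile("\\b(" + alt + ")\\b");
    }

    // Single pass over the query, so text produced by one replacement is
    // never rewritten a second time.
    static String rewrite(String query) {
        Matcher m = ABBREVIATIONS.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String abbr = m.group(1);
            String group = "(" + abbr + " OR " + SYNONYMS.get(abbr) + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(group));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, QueryRewriter.rewrite("Billy Bob Jr") returns "Billy Bob (Jr OR Junior)". "Juniper Street" is left unchanged because only whole words are matched.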

The other way is to use the Lucene extension and add a custom filter that
injects synonyms during indexing. "Bob Jr" would then be indexed as
"Bob Jr Junior", with Jr and Junior at the same position in the text, so
Lucene treats them as aliases. You can do this by writing your own
filter and then registering it for your language in the init() method of
FilterFactory.java.
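
The position trick can be shown without the Lucene library itself. The sketch below is a plain-Java stand-in and does not use Lucene's API. Tok, SynonymInjector and the synonym table are hypothetical names. In Lucene, each token carries a position increment, and an increment of 0 places a token at the same position as the one before it. The stand-in copies that behaviour so the effect on stored positions is visible. A real filter would instead extend Lucene's TokenFilter and be registered in FilterFactory.java as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-in for a Lucene token: term text plus its position increment
// relative to the previous token (1 = next word, 0 = same position).
final class Tok {
    final String term;
    final int posInc;
    Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    @Override public String toString() { return term + "/" + posInc; }
}

// Hypothetical synonym filter: after each token that has a synonym, emit the
// synonym with position increment 0 so both terms share one position.
final class SynonymInjector {
    private final Map<String, String> synonyms;
    SynonymInjector(Map<String, String> synonyms) { this.synonyms = synonyms; }

    List<Tok> filter(List<Tok> input) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : input) {
            out.add(t);
            // Only original tokens are looked up, so a two-way table
            // (Jr <-> Junior) cannot cause endless injection.
            String syn = synonyms.get(t.term);
            if (syn != null) out.add(new Tok(syn, 0));
        }
        return out;
    }

    // Absolute positions computed from the increments, as an indexer would.
    static List<Integer> positions(List<Tok> tokens) {
        List<Integer> pos = new ArrayList<>();
        int p = -1;
        for (Tok t : tokens) { p += t.posInc; pos.add(p); }
        return pos;
    }

    // Naive whitespace tokenizer: every word advances the position by 1.
    static List<Tok> tokenize(String text) {
        List<Tok> toks = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) toks.add(new Tok(w, 1));
        return toks;
    }
}
```

Indexing "Billy Bob Junior" with the two-way table {Jr: Junior, Junior: Jr} yields Billy/1, Bob/1, Junior/1, Jr/0, which are positions 0, 1, 2, 2. Because Jr sits at the same position as Junior, a search for "Billy Bob Jr" can match that title.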

In any case, I don't think either approach would slow down search much,
especially if you don't have many synonyms and they are not extremely
frequent. Performance-wise, though, the second solution is better, since the
synonyms are expanded once at index time rather than on every query.

robert

> I am looking to enhance search in my MW install such that someone searching
> for some of the common abbreviations below would find a page with the full
> word in its title.

> E.g. I might have a page called 'Billy Bob Junior' but my users might type
> 'Billy Bob Jr'. I would want the Jr to match as well as the other words.

> Is there any way to allow this type of search?

> I am starting also to look at Lucene and wondered if anyone has experience
> of the performance differential I might expect.



