On Fri, 28 Apr 2006 02:29:13 -0700 In article 4451E069.6000501@pobox.com [Re: [Wikitech-l] Hyper Estraier extension] Brion Vibber brion@pobox.com wrote:
Tietew wrote: [snip]
Look'n'feel is derived from LuceneSearch.php.
I'd recommend instead using the search plugin system built into 1.5 and later; see extensions/MWSearch for the Lucene interface for that. It might need a little more tweaking, but it will be easier to swap things out in the future than with the old hacked-up LuceneSearch.php.
I could not customize summary generation with MWSearch. Hyper Estraier can generate good summaries by itself.
Class SearchEngine (or SpecialSearch?) should have a summary generator hook or an overridable method.
Next week I'll be testing out Sphinx (http://sphinxsearch.com/), a GPL'd fulltext search engine which, at least according to its authors, is faster than Lucene -- much faster at indexing! However, they don't currently have appropriate tokenizing for CJK. We could see about adding that, but I'd certainly love to compare it to something else like Estraier if it's available.
Hyper Estraier uses the N-gram method for CJK languages. Additionally, it can use the "MeCab" morphological analyzer (a dictionary-based Japanese tokenizer; http://mecab.sourceforge.jp/) for keyword search.
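To illustrate what the N-gram method means for text without word boundaries, here is a minimal Python sketch. It is not Hyper Estraier's actual implementation; the function names (char_ngrams, build_index, search) and the choice of N=2 are assumptions made only for this example.

```python
def char_ngrams(text, n=2):
    """Return overlapping character n-grams of text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return []
    if len(chars) < n:
        # Text shorter than n: keep it as a single short gram.
        return ["".join(chars)]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_index(docs, n=2):
    """Build an inverted index mapping each n-gram to the set of doc ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents that contain every n-gram of the query."""
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    result = None
    for gram in grams:
        ids = index.get(gram, set())
        result = set(ids) if result is None else result & ids
    return result


# Hypothetical sample documents, for illustration only.
docs = {1: "東京都庁", 2: "京都大学"}
index = build_index(docs)
hits_kyoto = search(index, "京都")   # matches both documents
hits_daigaku = search(index, "大学")  # matches only document 2
```

Note that the query "京都" (Kyoto) also matches "東京都庁", because the bigram "京都" appears inside "東京都" (Tokyo Metropolis). This kind of noise is the usual trade-off of N-gram indexing, and it is one reason a dictionary-based tokenizer can help for keyword search.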
Mr. Hirabayashi, the author of Hyper Estraier, says he will support and cooperate with me and with Wikipedia.
Hyper Estraier has its own HTTP-based P2P protocol, so we can construct a distributed search cluster very easily.
For example, my demo site is composed of two machines: an Apache server with MediaWiki on Linux, and a Hyper Estraier search server (node master) on Windows XP.
== Request ==
I want to test and demo real-time index updates from live Wikipedia. Could you please give me permission to access OAIRepository?
I'll set this up for you this weekend. Make sure you've got a fast connection and plenty of space. :)
Thank you!