On Fri, 28 Apr 2006 02:29:13 -0700
In article <4451E069.6000501(a)pobox.com>
[Re: [Wikitech-l] Hyper Estraier extension]
Brion Vibber <brion(a)pobox.com> wrote:
Tietew wrote:
[snip]
The look and feel is derived from LuceneSearch.php.
I'd recommend instead using the search plugin system built into 1.5 and later;
see extensions/MWSearch for the Lucene interface for that. It might need a
little more tweaking, but will be smoother to replace things with in future than
the old hacked-up LuceneSearch.php.
I could not customize summary generation with MWSearch.
Hyper Estraier can generate good summaries by itself.
The SearchEngine class (or SpecialSearch?) should have a summary
generator hook or an overridable method.
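
A rough sketch of the kind of override I mean, written in Python only for
illustration. Every name here (SearchEngine, make_summary,
BackendSummarySearch) is hypothetical and is not MediaWiki's actual API; the
idea is just that the base class supplies a default summary and an engine
whose backend already returns a snippet can replace it.

```python
class SearchEngine:
    """Hypothetical base class; NOT MediaWiki's real SearchEngine."""

    def make_summary(self, doc_id, text, term, width=20):
        # Default: a keyword-in-context window around the first match.
        pos = text.find(term)
        if pos < 0:
            # Term not found: fall back to the start of the text.
            return text[:2 * width]
        start = max(0, pos - width)
        return text[start:pos + len(term) + width]


class BackendSummarySearch(SearchEngine):
    """Hypothetical engine whose search backend supplies its own snippets."""

    def __init__(self, backend_snippets):
        # Stand-in for snippets returned by the backend, keyed by document id.
        self.backend_snippets = backend_snippets

    def make_summary(self, doc_id, text, term, width=20):
        snippet = self.backend_snippets.get(doc_id)
        if snippet is not None:
            return snippet
        # No backend snippet available: use the default behaviour.
        return super().make_summary(doc_id, text, term, width)
```

With something like this, the search page would only ever call
make_summary() and would not need to know which engine produced the text.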
Next week I'll be testing out Sphinx (http://sphinxsearch.com/), a GPL'd
fulltext search engine which, at least according to its authors, is faster than
Lucene -- much faster at indexing! However, they don't currently have appropriate
tokenizing for CJK. We could see about adding that, but I'd certainly
love to compare it to something else like Estraier if it's available.
Hyper Estraier uses the N-gram method for CJK languages.
Additionally, it can use the "MeCab" morphological analyzer
(a dictionary-based Japanese tokenizer; http://mecab.sourceforge.jp/)
for keyword search.
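
To illustrate the N-gram idea, here is a minimal Python sketch of character
bigram tokenization. It is a generic illustration and not Hyper Estraier's
actual implementation; the function name char_ngrams and the choice of n=2
are assumptions.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, in order."""
    if len(text) < n:
        # Too short to form a full n-gram: keep the text as one token.
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("東京都庁"))
# ['東京', '京都', '都庁']
```

No dictionary is needed for this, but a query for 京都 would also hit the
bigram inside 東京都庁. A dictionary-based tokenizer such as MeCab is one way
to avoid that kind of match.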
Mr. Hirabayashi, the author of Hyper Estraier, says he will support
and cooperate with me and with Wikipedia.
Hyper Estraier has its own HTTP-based P2P protocol,
so we can construct a distributed search cluster very easily.
For example, my demo site is composed of two machines: an Apache
server with MediaWiki on Linux, and a Hyper Estraier search server (node
master) on Windows XP.
== Request ==
I want to test and demo real-time index updates from the live
Wikipedia. Could you please give me permission to access
OAIRepository?
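
For reference, a sketch of how an incremental harvest request might be built,
assuming the repository speaks OAI-PMH (the ListRecords verb with
metadataPrefix, from, and resumptionToken arguments). The base URL and the
helper name list_records_url are hypothetical; the real endpoint and the
supported metadata formats depend on how access is set up.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    if resumption_token is not None:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Only records changed on or after this UTC datestamp.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
print(list_records_url("http://example.org/oai", "oai_dc", "2006-04-28"))
```

An indexer would request everything changed since its last run, then keep
following the resumption token until the repository stops returning one.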
I'll set this up for you this weekend. Make sure you've got a fast connection
and plenty of space. :)
Thank you!
--
[[User:Tietew]] <tietew(a)tietew.net>
Wiki: http://meta.wikimedia.org/wiki/User:Tietew
PGP/GPG: 26CB 71BB B595 09C4 0153 81C4 773C 963A D51B 8CAA