I have committed an unfinished and experimental, but basically working external search daemon and extension to use it, based on the Lucene search engine [1].
This moves searching out of MySQL so that it is handled separately: search queries and updates no longer bog down the database server, and we get the full power of a complete search engine instead of MySQL's fairly limited one. I suspect it will also be a bit faster; indexing en.wikibooks on a 1.1GHz Athlon with an IDE disk takes 180 seconds (index size on disk: 53MB).
Other interesting possibilities include much shorter delays for search updates, incremental updates, various new ways to search other than title/body content, and so on.
To use it, check out the `lucene-search' module from CVS and see the README.txt file for more information. Note that it ONLY works with MediaWiki 1.5; I may backport it to 1.4 at some future point.
Kate.
Kate Turner wrote:
> I have committed an unfinished and experimental, but basically working external search daemon and extension to use it, based on the Lucene search engine [1].
Whatever happened to the already wikipedia-adopted and specifically for us GPL-released high-throughput search engine from Jochen Magnus (not related;-)?
It is currently at http://ioda.sourceforge.net/. See also the old thread on this list:
http://mail.wikipedia.org/pipermail/wikitech-l/2004-September/024975.html
Magnus (Manske)
On Wed, 22 Dec 2004 22:23:47 +0100, Magnus Manske <magnus.manske@web.de> wrote:
> Kate Turner wrote:
>> I have committed an unfinished and experimental, but basically working external search daemon and extension to use it, based on the Lucene search engine [1].
> Whatever happened to the already wikipedia-adopted and specifically for us GPL-released high-throughput search engine from Jochen Magnus (not related;-)?
Well, according to their SourceForge page, it doesn't support UTF-8, so it's not yet really useful as a general replacement for MediaWiki's search. However, the MediaWiki part of my search system can be used with any external search engine, including Ioda, as long as it supports queries via a socket; the Lucene search daemon can simply be considered a reference implementation. Maybe I should rename LuceneSearch to ExternalSearch ;) In theory it could even be used with the MySQL search system, if people wanted the new interface features without needing to run an external daemon...
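To make "queries via a socket" a little more concrete, here is a minimal client sketch in Python. It is not the actual protocol spoken by the Lucene daemon or by Ioda; the wire format is assumed purely for illustration (the client sends one UTF-8 query line, the daemon answers with one title per line and closes the connection), and the function name `search` is hypothetical.

```python
import socket


def search(host, port, query, timeout=5.0):
    """Send one query to a search daemon and return its reply lines.

    ASSUMED wire format (illustration only, not the real daemon protocol):
    the client sends the UTF-8 encoded query followed by a newline; the
    daemon replies with one matching page title per line and then closes
    the connection.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(query.encode("utf-8") + b"\n")
        conn.shutdown(socket.SHUT_WR)  # tell the daemon the query is complete
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the daemon closed the connection
                break
            chunks.append(data)
    text = b"".join(chunks).decode("utf-8")
    # Drop blank lines so a trailing newline does not yield an empty result.
    return [line for line in text.splitlines() if line]
```

Any engine that can sit behind a listener like this could be swapped in without the wiki side noticing, which is the point of treating the Lucene daemon as just one implementation.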
(Additionally, the MWDaemon implementing Lucene is under a more permissive license, MIT, and Lucene itself is under the Apache Software License, so it is probably more suitable for those wishing to re-use it. For those who don't like Java, there's always CLucene or whatever it's called... not that I think it would be worth the effort myself.)
As for progress, the LuceneSearch extension (minus the find-as-you-type part, which is new and likely to contain bugs) should be pretty usable now; the only major component left to implement is on-the-fly updating of pages in the search daemon. Stare in awe and wonder at fancy new features like 'did you mean?' spelling corrections and close title match suggestions ;)
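For anyone wondering what a close title match suggestion amounts to, the idea can be sketched with Python's standard `difflib` module. This is not how the search daemon does it; `suggest_titles` and the sample titles are hypothetical, and a real engine would work from its index rather than a plain list.

```python
import difflib


def suggest_titles(query, titles, limit=3):
    """Return up to `limit` known titles that closely resemble `query`.

    Illustration only: uses difflib's similarity ratio, not the search
    daemon's own matching.
    """
    # Compare case-insensitively, but return titles in their original form.
    by_lower = {title.lower(): title for title in titles}
    hits = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=0.6
    )
    return [by_lower[hit] for hit in hits]


if __name__ == "__main__":
    # Hypothetical titles; a misspelt query should surface the nearest one.
    print(suggest_titles("Lucen", ["Lucene", "MySQL", "MediaWiki"]))
```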
Kate.
wikitech-l@lists.wikimedia.org