Minty wrote in gmane.science.linguistics.wikipedia.technical:
I plan to be playing with Plucene a bit over the next couple of months:
one initial avenue of interest is some rough and ready benchmarks on
speed/resource requirements. I was planning to use a local copy of
the wikimedia text as a corpus for this testing.
What I don't want to do is duplicate any existing work...
look at the "lucene-search" module in CVS
(http://cvs.sourceforge.net/viewcvs.py/wikipedia/lucene-search/). this is
a (mostly) complete and functional Lucene (Java version) based search
server for MediaWiki. i'm not sure how similar the Java version is to
other versions, but you may be able to port the relevant bits without too
much effort.
an experimental test of this on the live site showed that it was able to
handle our search load on a single 3.0GHz P4 with very minimal CPU usage,
as long as the typo suggestion feature isn't enabled (because that uses
several slow searches to produce the result; it could almost certainly be
reimplemented in a much more efficient manner).
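
for the rough-and-ready benchmarks mentioned above, a minimal timing harness might look something like the sketch below. it is only an illustration: `benchmark` and `toy_search` are hypothetical names, `toy_search` is a stand-in linear scan rather than anything from Plucene or lucene-search, and the three-document corpus is made up. swap in a real search call and a real query log to get meaningful numbers.

```python
import statistics
import time
from typing import Callable, Iterable


def benchmark(search: Callable[[str], object],
              queries: Iterable[str],
              repeats: int = 3) -> dict:
    """Run each query `repeats` times and summarise wall-clock latency in seconds."""
    timings = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            search(query)
            timings.append(time.perf_counter() - start)
    return {
        "runs": len(timings),
        "mean": statistics.mean(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def toy_search(query: str) -> list:
    # Hypothetical stand-in for a real search backend:
    # a substring scan over a tiny made-up in-memory corpus.
    corpus = ["lucene search server", "mediawiki search", "plucene port"]
    return [doc for doc in corpus if query in doc]


if __name__ == "__main__":
    print(benchmark(toy_search, ["lucene", "search", "wiki"]))
```

this measures wall-clock time only; to look at CPU usage rather than latency, `time.process_time()` could be used in place of `time.perf_counter()`.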
kate.