Minty wrote in gmane.science.linguistics.wikipedia.technical:
I plan to be playing with Plucene a bit over the next couple of months: one initial avenue of interest is some rough-and-ready benchmarks on speed and resource requirements. I was planning to use a local copy of the Wikimedia text as a corpus for this testing.
What I don't want to do is duplicate any existing work...
look at the "lucene-search" module in CVS (http://cvs.sourceforge.net/viewcvs.py/wikipedia/lucene-search/). this is a (mostly) complete and functional Lucene-based (Java version) search server for MediaWiki. i'm not sure how similar the Java version is to the other versions, but you may be able to port the relevant bits without too much effort.
an experimental test of this on the live site showed that it could handle our search load on a single 3.0GHz P4 with very little CPU usage, as long as the typo suggestion feature isn't enabled. that feature runs several slow searches to produce its result; it could almost certainly be reimplemented much more efficiently.
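For illustration only, here is one way a typo suggestion could avoid running several extra searches. It is a minimal sketch, not how lucene-search does it. It assumes you already have the index's term list with document frequencies in memory (the `term_freqs` dict is hypothetical). It makes a single pass computing Levenshtein edit distance and picks the closest term, breaking ties by frequency. A real index would probably also want a candidate filter (e.g. n-grams) instead of scanning every term.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(word: str, term_freqs: Dict[str, int], max_dist: int = 2) -> Optional[str]:
    """Return the closest known term (smallest distance, then most frequent), or None.

    term_freqs is a hypothetical term -> document-frequency map taken from the index.
    """
    if word in term_freqs:
        return None  # already a known term, nothing to suggest
    best, best_key = None, None
    for term, freq in term_freqs.items():
        # Cheap filter: the edit distance is at least the difference in length.
        if abs(len(term) - len(word)) > max_dist:
            continue
        d = edit_distance(word, term)
        if d > max_dist:
            continue
        key = (d, -freq, term)  # prefer closer, then more frequent, then alphabetical
        if best_key is None or key < best_key:
            best, best_key = term, key
    return best


terms = {"wikipedia": 120, "wikimedia": 40, "lucene": 30, "plucene": 5}
print(suggest("wikipedai", terms))  # → wikipedia
```

The cost is one scan of the term list per query, with no extra index searches. That is the kind of saving the slow multi-search approach leaves on the table.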
kate.