Instead of Lucene (or one of its clones), we could consider using JODA
[1]. Jochen wrote JODA specifically for Wikipedia and MediaWiki
and has announced it several times on this mailing list.
His indexer can be seen live at the Rhein-Zeitung, where it indexes
Wikipedia data; see [2].
I also propose setting up a Meta-Wiki page,
http://meta.wikipedia.org/wiki/FulltextSearchEngines, to discuss all
aspects and variants separately from this mailing list, so that list
messages stay "KISS", i.e. keep it short and simple.
Tom
[1]
http://sourceforge.net/projects/ioda/
[2]
http://wikipedia.rhein-zeitung.de/index.php/Hauptseite (this page
only demonstrates the indexer and is not intended as a mirror of Wikipedia)