David Gerard wrote:
On 29/02/2008, Domas Mituzas <midom.lists@gmail.com> wrote:
That gave performance similar to the MySQL fulltext index *BUT* when I queried the same index with Luke (which is Java), the query was *fast*. Sorry, I can't find the mailing list posts about that.
Zend Lucene is 100x slower than Java Lucene.
We were running a Mono version of Lucene for a while, weren't we? How did that compare?
It was moderately slower than the Java version, but within the same order of magnitude for most stuff. The performance differences came mainly from the Mono VM being a bit slower (at least at the time) and, in some cases, from its regex library being much less efficient, which mostly hurt index generation.
The reasons for using Mono at the time over Sun Java or GCJ were:
* Sun Java - fast, but not open source enough
* GCJ - fast, open source, but mystery memory leaks
* Mono - a bit slower, open source, no mystery memory leaks
Of course over time, mystery memory leaks crept into the system. ;)
Eventually, Sun Java became more and more open to the point where we don't really care anymore (if we get real pissy about it again we could start running an OpenJDK-based VM such as IcedTea), and the guy who picked up development on our Lucene server again preferred to work with the Java version instead of the C# one. (Among other things, this gives you access to the latest Lucene version instead of an older port.)
There's no real reason to choose Mono for this sort of task if we were starting again. Hypothetically, if we wanted to ship a Lucene-based tool by default, we could attempt to have backends supporting both the PHP Zend Lucene and the Java one... assuming you can get even vaguely useful performance out of the PHP one. :)
-- brion vibber (brion @ wikimedia.org)