Brion Vibber wrote:
Lucene cannot be included as it is a program, and requires either precomputed binaries (every operating system needs different ones) or the source code and a compiler.
Well, it's Java at least, so only one binary needed on most systems. :)
Still, that's an external dependency, and running a Java daemon is not an out-of-the-box task on your standard LAMP server.
I would like to see some improvements to the built-in search, though; interface improvements and some better category tagging would help a lot.
I've been playing with zend_search_lucene (for category intersections... still) and it might make sense to think about writing an extension in that for general search. I've been avoiding the Java version of Lucene for exactly the reason mentioned ("... running a Java daemon is not an out-of-the-box task...").
Search on my dataset (4 million records, just categories and page ids) is not exactly impressive: about 10 seconds for the worst-case scenario of "Living_People +some other big category". Luke gives much faster results, but I've been emailing off and on with Alexander at Zend, and he says this is important to Zend and that they're putting effort into improving the code. I'm also trying CLucene, which should be pretty easy to just compile and run, if I knew what I was doing.
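To make the category-intersection query concrete, here is a minimal Python sketch. It is not the Zend_Search_Lucene or CLucene API; it only assumes that each category maps to a set of page ids. The function names `build_index` and `intersect` and the sample records are hypothetical. Intersecting the smallest set first is one common way to keep the cost tied to the rarer category rather than to a huge one like Living_People.

```python
from collections import defaultdict


def build_index(records):
    """Map each category name to the set of page ids tagged with it."""
    index = defaultdict(set)
    for page_id, category in records:
        index[category].add(page_id)
    return index


def intersect(index, *categories):
    """Return the sorted page ids that appear in every given category."""
    if not categories:
        return []
    # Order the posting sets by size so we start from the rarest category.
    sets = sorted((index.get(c, set()) for c in categories), key=len)
    result = set(sets[0])
    for other in sets[1:]:
        result &= other
        if not result:
            break  # nothing left to match, stop early
    return sorted(result)


# Hypothetical sample data: (page_id, category) pairs.
records = [
    (1, "Living_People"), (2, "Living_People"), (3, "Living_People"),
    (2, "Physicists"), (3, "Physicists"), (4, "Physicists"),
    (3, "Nobel_laureates"),
]
index = build_index(records)
print(intersect(index, "Living_People", "Physicists"))
```

A real engine would keep the posting lists on disk and stream them, so this only illustrates the shape of the query, not its performance.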
So, to make a long story short: I think there are options for Lucene-based but non-Java search. If anyone else is working in these areas, I'd love to hear from them.
Aerik
From my experience, the biggest issue with lucene is index maintenance.
Doing both searching and indexing on the same index does not scale up very well, so you want to keep them separate. Further, if you want incremental indexing you need additional logic. In the case of category intersections, for example: what if a category is assigned by a template, and the template changes? Many pages can change categories with a single edit.
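The template problem can be sketched as follows. This is an illustration under assumptions, not how the LuceneSearch extension does it: `transclusions` is a hypothetical in-memory mapping from a page or template title to the pages that include it, and `pages_to_reindex` is a made-up helper name. Because templates can include other templates, the walk has to be transitive.

```python
def pages_to_reindex(edited_title, transclusions):
    """Return every title whose categories may have changed after an edit.

    transclusions maps a title to the set of titles that include it directly.
    The edited title itself is always part of the result.
    """
    dirty = {edited_title}
    pending = [edited_title]
    while pending:
        current = pending.pop()
        for dependant in transclusions.get(current, ()):
            if dependant not in dirty:
                dirty.add(dependant)
                pending.append(dependant)  # follow nested inclusion
    return dirty


# Hypothetical data: Template:Bio-stub itself includes Template:Physicist-stub.
transclusions = {
    "Template:Physicist-stub": {"Page_A", "Page_B", "Template:Bio-stub"},
    "Template:Bio-stub": {"Page_C"},
}
print(sorted(pages_to_reindex("Template:Physicist-stub", transclusions)))
```

Editing an ordinary page yields just that page, while editing a widely used template can mark a large part of the index as stale in one step, which is the case an incremental indexer has to handle.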
Anyway, setting up the LuceneSearch extension in the simplest scenario is not very hard: you need a jar file, some paths and database names tuned in the config files, and a cron job for the index rebuild. I guess one could write a simple install script for that...
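One generic way to keep a periodic rebuild separate from searching is to build into a fresh directory and then switch a symlink. The sketch below is an assumption about how such a job could look, not the LuceneSearch extension's actual mechanism; `publish_index`, the `current` link name, and `fake_build` are all hypothetical, and it relies on POSIX rename semantics.

```python
import os
import tempfile


def publish_index(base_dir, build):
    """Build a new index in its own directory, then point 'current' at it."""
    new_dir = tempfile.mkdtemp(prefix="index-", dir=base_dir)
    build(new_dir)  # the indexer writes only here, never into the live index

    current = os.path.join(base_dir, "current")
    tmp_link = current + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover link from an interrupted run
    os.symlink(new_dir, tmp_link)
    os.replace(tmp_link, current)  # rename over the old link in one step
    return new_dir


def fake_build(path):
    """Stand-in for the real indexer: just writes a marker file."""
    with open(os.path.join(path, "segments"), "w") as handle:
        handle.write("placeholder index data")


base = tempfile.mkdtemp()
first = publish_index(base, fake_build)
second = publish_index(base, fake_build)
print(os.path.realpath(os.path.join(base, "current")) == os.path.realpath(second))
```

Searchers open whatever `current` points to, so they never see a half-written index. Old index directories are left in place here and would need separate cleanup.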
Robert
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l