On Sat, Feb 23, 2008, Tim Starling tstarling@wikimedia.org wrote:
But it seems to me, if you look at data storage software already in use, Lucene is much better suited for computing intersections than MySQL.
Tim, aren't you kind of the point guy for the lucene search? Would you be up for setting up a categories index? I don't know how the update works (I think, from what I've read, that it does a big index regeneration on some kind of schedule, but I really don't know).
I think it could be implemented as either a separate index, or as a new field on the current index.
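A minimal sketch of why an inverted index suits this job, assuming each category name is treated as a term whose posting list is a sorted list of page ids. This is roughly how Lucene-style indexes are organised, but it is plain Python and not Lucene's API. The function names and sample data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_category_index(page_categories):
    """Map each category to a sorted list of page ids (a posting list)."""
    postings = defaultdict(set)
    for page_id, categories in page_categories.items():
        for category in categories:
            postings[category].add(page_id)
    return {category: sorted(ids) for category, ids in postings.items()}


def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def category_intersection(index, categories):
    """Return page ids that belong to every one of the given categories."""
    # Start with the shortest posting list so intermediate results stay small.
    lists = sorted((index.get(c, []) for c in categories), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
        if not result:
            break
    return result


# Hypothetical sample data: page id -> categories.
page_categories = {
    1: ["Physicists", "Living people"],
    2: ["Physicists", "1879 births"],
    3: ["Living people", "Chemists"],
    4: ["Physicists", "Living people", "Nobel laureates"],
}
index = build_category_index(page_categories)
print(category_intersection(index, ["Physicists", "Living people"]))  # [1, 4]
```

Whether the categories live in a separate index or in an extra field on the existing one, the lookup has the same shape: fetch one posting list per category and intersect them.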
I'd be happy to help, but I'm totally unfamiliar with the code, and don't really want to set up Java on my server for testing... I've created Lucene indexes on the categories table before, but not in anything approaching a production-type environment. Maybe that still leaves some opportunity to help, though.
Best Regards, Aerik
Tim, aren't you kind of the point guy for the lucene search? Would you be up for setting up a categories index? I don't know how the update works (I think, from what I've read, that it does a big index regeneration on some kind of schedule, but I really don't know).
We use incremental updates nowadays, since the complete rebuilds were unstable and would fail or hang from time to time. A dump-then-rebuild category intersection index would likely suffer from the same problems. Keeping the index incrementally updated might not be as straightforward, though, as it would need to be integrated with the job queue in some way.
My guess is that doing category intersections through any other external service would suffer from the same problems, and that writing interfaces for Lucene or something else should be easy enough once a stable update scheme is figured out.
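To illustrate the difference between a full rebuild and an incremental update, here is a rough sketch, assuming each page edit arrives as a (page id, new category list) pair from some queue. The `CategoryIndex` class and `apply_updates` helper are hypothetical names, and nothing here models the actual MediaWiki job queue or the Lucene plugin. An update touches only the categories that changed for that page.

```python
from collections import defaultdict


class CategoryIndex:
    """In-memory category index that is updated one page edit at a time."""

    def __init__(self):
        self.pages_by_category = defaultdict(set)
        self.categories_by_page = {}

    def update_page(self, page_id, new_categories):
        """Apply one page edit by touching only the categories that changed."""
        old = self.categories_by_page.get(page_id, set())
        new = set(new_categories)
        for category in old - new:
            self.pages_by_category[category].discard(page_id)
            if not self.pages_by_category[category]:
                del self.pages_by_category[category]
        for category in new - old:
            self.pages_by_category[category].add(page_id)
        if new:
            self.categories_by_page[page_id] = new
        else:
            # A page with no categories left (or a deleted page) is dropped.
            self.categories_by_page.pop(page_id, None)

    def intersect(self, *categories):
        """Return sorted page ids that belong to all given categories."""
        sets = [self.pages_by_category.get(c, set()) for c in categories]
        return sorted(set.intersection(*sets)) if sets else []


def apply_updates(index, queue):
    """Drain a queue of (page_id, categories) edits in arrival order."""
    for page_id, categories in queue:
        index.update_page(page_id, categories)


live_index = CategoryIndex()
apply_updates(live_index, [
    (1, ["Physicists", "Living people"]),
    (2, ["Physicists"]),
    (1, ["Physicists"]),  # page 1 is later removed from "Living people"
])
print(live_index.intersect("Physicists"))  # [1, 2]
```

The hard part the sketch leaves out is the one raised above: making sure every category change reliably reaches the queue, and in order.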
r.
On Sat, Feb 23, 2008 at 3:55 PM, Aerik Sylvan aerik@thesylvans.com wrote:
Tim, aren't you kind of the point guy for the lucene search?
Robert Stojnic (rainman) is the one who maintains our Lucene plugin, and as far as I know the one who wrote it in the first place.
Hello,
Robert Stojnic (rainman) is the one who maintains our Lucene plugin, and as far as I know the one who wrote it in the first place.
Well, the original Lucene work was done by River (ex-Kate). Back when we were all in Berlin (2004), Kate deployed it, and it caused some surprise :)
BR,
wikitech-l@lists.wikimedia.org