Simetrical wrote:
See http://bugs.wikimedia.org/show_bug.cgi?id=5244 and the various bugs marked as duplicates of it. I'm pretty sure performance would be a major issue here. Finding the first 200 pages in a category requires iterating over at most 200 members of that category, and the same bound holds for every other operation categories currently support (unions included). Finding the first 200 pages in the intersection of two categories has no such upper bound: if the two categories share fewer than 200 pages and neither is a subset of the other, you have to go through every page in each category.
Has anyone written code that can handle this efficiently? Is such code even possible?
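A minimal sketch of the cost problem described above, assuming (hypothetically) that each category's members are available as a sorted list of page IDs. A merge-style scan stops as soon as it has `limit` shared pages, but when few pages are shared it walks both lists nearly to the end. The function name `first_n_common` is illustrative, not MediaWiki code.

```python
def first_n_common(a, b, limit):
    """Return (up to `limit` items present in both sorted lists, steps taken).

    Hypothetical helper: `a` and `b` stand in for the sorted page IDs of two
    categories. Each step advances at least one pointer.
    """
    i = j = steps = 0
    result = []
    while i < len(a) and j < len(b) and len(result) < limit:
        steps += 1
        if a[i] == b[j]:
            # Shared member: keep it and advance both lists.
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result, steps


# Many shared members: the scan stops early once `limit` matches are found.
print(first_n_common(list(range(0, 10000, 2)), list(range(0, 10000, 3)), 5))
# No shared members: the scan touches almost every member of both categories.
print(first_n_common(list(range(0, 10000, 2)), list(range(1, 10000, 2)), 200))
```

In the second call the step count grows with the combined size of the two categories, not with the requested limit of 200.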
I've floated the possibility of using a full-text search engine (say... Lucene) for this, since intersecting per-term indexes is basically what it does.
Use an appropriate indexer for your tag words, and presto.
The tricky part might be getting arbitrary sorting out of it, though. :) Easy for small results, but perhaps hard for large ones.
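A toy sketch of the search-engine idea, assuming each category name is treated as an indexed "tag word" and each page as a document. This is not Lucene's API; the class and method names are hypothetical, and real engines use far more elaborate on-disk structures than Python sets. It also shows why arbitrary sorting is the tricky part: to return the first `n` matches by an arbitrary key, every match has to be examined.

```python
import heapq
from collections import defaultdict


class CategoryIndex:
    """Hypothetical inverted index: category name -> set of pages (postings)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, page, categories):
        for cat in categories:
            self.postings[cat].add(page)

    def intersect(self, *categories):
        # Start from the smallest postings set and probe the others,
        # which is the usual trick for keeping intersections cheap.
        sets = sorted((self.postings.get(c, set()) for c in categories), key=len)
        if not sets:
            return set()
        smallest, rest = sets[0], sets[1:]
        return {p for p in smallest if all(p in other for other in rest)}

    def first_n_sorted(self, categories, n, key):
        # An arbitrary sort key forces a pass over *all* matches, which is
        # cheap for small results but expensive for large ones.
        return heapq.nsmallest(n, self.intersect(*categories), key=key)


idx = CategoryIndex()
idx.add_page("Paris", ["Capitals", "Cities in France"])
idx.add_page("Lyon", ["Cities in France"])
idx.add_page("Marseille", ["Cities in France", "Ports"])
idx.add_page("Berlin", ["Capitals", "Cities in Germany"])
print(idx.intersect("Capitals", "Cities in France"))
print(idx.first_n_sorted(["Cities in France"], 2, key=str))
```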
-- brion vibber (brion @ pobox.com)