On 9/4/06, Samuel Wantman <wantman@earthlink.net> wrote:
I'm writing to ask a developer to take a look at a proposal being developed at English Wikipedia at:
http://en.wikipedia.org/wiki/Wikipedia:Category_intersection
There are some long-festering basic conflicts in how people use categories that we think could be solved if a category intersection feature were implemented. We are aware that this has been discussed quite a bit in the past, and that code to do category intersections has even been written (DynamicPageList2).
Rick Block and I have been working on a design and policy proposal for a MediaWiki category intersection feature that would allow categories to be defined as the intersection of other categories and would also provide a simple interface for creating "on the fly" intersections. We think this would solve many categorization headaches, while also providing a generally useful new feature. Rick and I are both admins and software designers who have been very involved in categorization policy for a couple of years. We've shown the proposal to select members of the community who have been very involved with categorization and have gotten a favorable response.
Before we go any further with this proposal we'd like feedback from developers. Do you think what we are proposing is feasible? If so, do you have any suggestions for improving it? If not, what makes it unfeasible, and do you have any ideas about how to make it feasible? Please reply on the talk page:
http://en.wikipedia.org/wiki/Wikipedia_talk:Category_intersection
Thanks very much for your time.
See http://bugs.wikimedia.org/show_bug.cgi?id=5244 and the various bugs duped to it. I'm pretty sure performance would be a major issue here. Finding the first 200 pages in a category only requires iterating over 200 members of that category, and the same bound holds for all other operations currently supported by categories (as well as for unions). Finding the first 200 pages in the intersection of two categories, on the other hand, has no upper bound on the number of iterations required: if the two categories have fewer than 200 shared pages and neither is a subset of the other, you have to go through every page in each category.
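To make the difference concrete, here is a rough sketch in Python. It is not MediaWiki code and says nothing about how the real schema or MySQL would execute the query. It assumes each category is just a sorted list of page titles, and the function names and the `examined` counter are made up for illustration.

```python
from typing import List, Tuple


def first_n_in_category(members: List[str], n: int) -> Tuple[List[str], int]:
    """Return the first n members of one category and how many rows were examined."""
    page = members[:n]
    return page, len(page)


def first_n_in_intersection(
    a: List[str], b: List[str], n: int
) -> Tuple[List[str], int]:
    """Merge-walk two sorted member lists.

    Returns up to n shared pages and the number of comparison steps taken.
    """
    i = j = examined = 0
    shared: List[str] = []
    while i < len(a) and j < len(b) and len(shared) < n:
        examined += 1
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared, examined


# Hypothetical data: two large categories with no pages in common.
cat_a = [f"Page{k:06d}" for k in range(0, 20000, 2)]
cat_b = [f"Page{k:06d}" for k in range(1, 20000, 2)]

single_page, single_cost = first_n_in_category(cat_a, 200)
shared_pages, shared_cost = first_n_in_intersection(cat_a, cat_b, 200)
```

With this data the single-category listing looks at 200 rows and stops, while the intersection walks to the end of a list and still returns nothing, so the work grows with the size of the categories and not with the 200 results asked for.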
Has anyone written code that can handle this efficiently? Is such code even possible? Storing and updating expensive and often-used intersections as sort of "virtual" categories would probably be a good idea to begin with, but I'm not exactly knowledgeable on either databases or caching. When I mentioned it on IRC, Domas (database person, works for MySQL) was pessimistic. In addition to what I noted above about an unbounded number of checks, he also pointed out that intersection tends to make categories larger, which also affects performance.
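For what it's worth, the "virtual category" idea could look something like the sketch below. This is only an in-memory toy under my own assumptions: the class and method names are hypothetical, it ignores sort keys and paging, and a real version would have to live in the database and cope with the write load.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


class CategoryStore:
    """Toy store that keeps chosen intersections up to date on every write."""

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = defaultdict(set)    # category -> pages
        self.page_cats: Dict[str, Set[str]] = defaultdict(set)  # page -> categories
        self.virtual: Dict[FrozenSet[str], Set[str]] = {}       # stored intersections

    def define_virtual(self, *cats: str) -> FrozenSet[str]:
        """Pay the full intersection cost once, when the virtual category is defined."""
        if not cats:
            raise ValueError("need at least one category")
        key = frozenset(cats)
        self.virtual[key] = set.intersection(*(self.members[c] for c in key))
        return key

    def add(self, page: str, cat: str) -> None:
        self.members[cat].add(page)
        self.page_cats[page].add(cat)
        # Only virtual categories that mention `cat` can gain this page.
        for key, pages in self.virtual.items():
            if cat in key and key <= self.page_cats[page]:
                pages.add(page)

    def remove(self, page: str, cat: str) -> None:
        self.members[cat].discard(page)
        self.page_cats[page].discard(cat)
        # Leaving any one category drops the page from every intersection using it.
        for key, pages in self.virtual.items():
            if cat in key:
                pages.discard(page)
```

The trade-off is that listing a stored intersection becomes as cheap as listing an ordinary category, but every category edit has to check the stored intersections, so it would only make sense for intersections that are used often.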
So, as someone who has little personal knowledge of the issue, I'd hazard a guess that if one of the few devs who are knowledgeable enough about efficiency and databases and the MediaWiki schema (most likely Tim, I'd imagine) were willing to write the code, it could maybe be good enough to be acceptable, but otherwise I doubt this will be implemented.