On Sat, Feb 23, 2008 at 5:21 AM, Samuel Wantman <wantman@earthlink.net> wrote:
Since we've also been discussing the problems of implementing "Category Intersection", an interim solution could be repopulating parent categories and "hiding" intersection categories. Fully populated parent categories are the norm in some projects like German Wikipedia and they also appear sometimes in English Wikipedia (e.g. Category:Operas). I have a proposal posted currently about fully populating "Index" categories at en:Wikipedia talk:Categorization, and it would be much improved if the intersection categories could be hidden. The primary reason we have been deleting intersection categories is that they clutter articles. If they didn't clutter articles, they wouldn't be a problem.
A very reasonable idea, that can be implemented Right Now with little effort on the user level. Important or interesting intersections can be manually populated, or populated in batches with bots. It's not as good as a real solution, but it should work well enough.
Perhaps the non-hidden categories could be expanded with a [+] the same way subcategories are expanded. For example, if someone is listed under "Methodist", clicking on the plus might add the hidden categories "American methodist" or "Methodist presidents".
A very specific feature. I might even call it kind of weird in its narrowness: it assumes a *very* particular use of hidden categories. I think it's okay to have people just click on the general category, and then navigate their way down to intersection categories if they're so inclined.
If making the query were to create a hidden category and automatically categorize all the articles that result from the query, the next time the request is made it could just display the results, just like any other category. There might be a timer that resets (every week?) that would force another query to update the category. This way each intersection query would happen fairly infrequently -- as infrequently as need be to keep from overloading the servers.
Well, this is just caching. We do that anyway (query cache, etc.). It's still not really acceptable, especially because your proposed caching method would require not just a scan of a large chunk of an index, but insertion of potentially thousands or tens of thousands of rows.
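To make the cost concrete, here is a minimal sketch of the cached-intersection scheme described above. Everything in it is hypothetical: the IntersectionCache class, the in-memory dict standing in for the category links index, and the one-week refresh interval are assumptions for illustration, not MediaWiki code. It only counts the rows a refresh would have to write; deleting the stale rows first is not modeled.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60  # assumed refresh interval ("every week?")


class IntersectionCache:
    """Hypothetical cache of materialized category intersections."""

    def __init__(self, category_members, ttl=WEEK_SECONDS, clock=time.time):
        self.category_members = category_members  # name -> set of article titles
        self.ttl = ttl
        self.clock = clock
        self.cached = {}       # (cat_a, cat_b) -> (timestamp, frozenset of titles)
        self.rows_written = 0  # rows that would be inserted for hidden categories

    def query(self, cat_a, cat_b):
        key = tuple(sorted((cat_a, cat_b)))
        now = self.clock()
        entry = self.cached.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: served like any other category
        # Missing or expired: scan both member sets and rewrite the whole result.
        result = frozenset(
            self.category_members[key[0]] & self.category_members[key[1]]
        )
        self.rows_written += len(result)  # one row per article in the result
        self.cached[key] = (now, result)
        return result
```

Every expiry repeats both the scan and the full set of inserts, so for a large intersection the write cost recurs each week, whether or not anything changed.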
There would need to be a naming convention for the automatically generated categories, perhaps using a double colon -- so the intersection of Category:Mozart and Category:Operas would generate Category:Mozart::Operas. . . .
By this point, the feature you describe would be more difficult to implement than just implementing real and properly efficient category intersection with Lucene or something. When you realize that to get your hack working properly, you need to implement so many workarounds that the real feature would be easier, it's time to discard the idea of a hack.
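For comparison, the core of an index-based intersection is small. The sketch below assumes each category keeps an ascending list of page IDs (a posting list, as a search index would); the function name and data layout are made up for illustration, and this is not Lucene's API.

```python
def intersect_sorted(ids_a, ids_b):
    """Intersect two ascending lists of page IDs with a linear merge."""
    i = j = 0
    out = []
    while i < len(ids_a) and j < len(ids_b):
        if ids_a[i] == ids_b[j]:
            out.append(ids_a[i])  # page is in both categories
            i += 1
            j += 1
        elif ids_a[i] < ids_b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return out
```

Nothing is written back: the result is computed at query time from the two lists, so no hidden categories, naming convention, or refresh timer is needed.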