On 6/4/06, Roger Luethi collector@hellgate.ch wrote:
On Sat, 03 Jun 2006 17:27:59 -0400, Anthony DiPierro wrote:
Categories based on such intersections of attributes are conceptually bad. Look at the categories for an article like [[Marie Curie]]: She's French three times, female four times, Polish four times (not counting "Natives of Warsaw"), etc. Why not create [[Category:Polish women who were born in 1867 and died in 1934 and won a Nobel Prize in Chemistry and in Physics]]?
Because there would only be one person in that category.
That's why nobody made it, but not why it shouldn't be done.
I'd say it's both. There shouldn't be categories with only one article in them. IMO that's just common sense.
In the current system categories should have a fair number of articles in them. If there are too many, they should be broken up. If there are too few, they should be combined. There isn't a crystal-clear line between what constitutes too many and what constitutes too few, but a category with only one article in it clearly has too few.
The problem of categories having too many articles in them wouldn't really be a problem if the software allowed you to automatically compute category intersections. But the software doesn't do this, so people make do with what they've got.
It would be nigh impossible to do well because once we start combining attributes to create new categories, we are looking at maintaining links between articles and an exploding number of subcategories.
But even if we maintained a complete and up-to-date system of subcats, we'd still make it hard for people to find articles using categories. For some fairly sensible reasons, the rule is to include articles only in the subcategory, not in the parent. There is no way to list articles based on a subset of criteria (the articles in subcategories are effectively hidden on separate pages, which is only helpful if you know which one to pick).
If the category system could effectively build these intersection categories on the fly, I'd agree. But the category system can't currently do that. (And it's been around a reasonably long time, with that as an obvious flaw, and no one has fixed it.)
You are right: we can't effectively build these intersection categories on the fly at the moment, but we _could_ automatically create or update such intersection categories if the categories weren't the mess that Steve and you describe. Kind of like the search index.
You're right. And that's what my simple rule that "All subcategories of attributes must be a subset of the parent attribute" is meant to address. If that were the case, it would be possible to automatically recursively descend a parent category to find *all* the articles to which it applies. And then computing the intersection of any two parent categories would be possible. I actually had software which did this, but it doesn't work right because the subcategory rule isn't being followed.
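A minimal sketch of the recursive descent and intersection described above, assuming the "every subcategory is a subset of its parent attribute" rule holds. The in-memory dictionaries, the category names, and the functions `all_articles` and `intersect` are hypothetical illustrations, not MediaWiki's actual storage or API:

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data, not MediaWiki's real storage: each category maps to the
# articles filed directly in it and to its direct subcategories.
ARTICLES: Dict[str, Set[str]] = {
    "Women": set(),
    "Women scientists": {"Marie Curie", "Lise Meitner"},
    "Nobel laureates": set(),
    "Nobel laureates in Chemistry": {"Marie Curie", "Linus Pauling"},
    "Nobel laureates in Physics": {"Marie Curie", "Albert Einstein"},
}
SUBCATS: Dict[str, List[str]] = {
    "Women": ["Women scientists"],
    "Nobel laureates": ["Nobel laureates in Chemistry", "Nobel laureates in Physics"],
}


def all_articles(category: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Articles filed in `category` or, recursively, in any of its subcategories.

    Only meaningful if every subcategory is a subset of its parent attribute.
    """
    if seen is None:
        seen = set()
    if category in seen:  # skip categories already visited (also guards against cycles)
        return set()
    seen.add(category)
    found = set(ARTICLES.get(category, set()))
    for sub in SUBCATS.get(category, []):
        found |= all_articles(sub, seen)
    return found


def intersect(*categories: str) -> Set[str]:
    """Articles that belong, directly or via subcategories, to every given category."""
    if not categories:
        return set()
    result = all_articles(categories[0])
    for cat in categories[1:]:
        result &= all_articles(cat)
    return result
```

With this toy data, `intersect("Women", "Nobel laureates")` yields `{"Marie Curie"}`. If a subcategory were not a true subset of its parent (the situation complained about above), the descent would silently pull in articles that don't belong, which is why the rule matters.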
Once the software is written to compute intersections of categories within the MediaWiki software, it would be relatively simple to recategorize the articles into their parent categories, such that no information was lost. The way this would be done is that all articles in a subcategory which had multiple parent attribute categories would be automatically moved into the parent categories. This would be repeated until no such situations remained. The ad-hoc structure could still be kept, but it could be calculated on the fly (along with new types of intersections which could be easily added).
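The "move articles up until nothing changes" procedure can be sketched as a fixed-point loop. This is a hypothetical illustration assuming plain dictionaries (category to articles, category to its parent attribute categories) and an acyclic hierarchy; `flatten` is an invented name, and the returned `derived` map records each emptied category as the intersection of its parents so it could later be computed on the fly:

```python
from typing import Dict, FrozenSet, Set, Tuple


def flatten(
    articles: Dict[str, Set[str]],
    parents: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, FrozenSet[str]]]:
    """Repeatedly move articles out of multi-parent categories into every parent.

    Returns the new article placement and, for each emptied category, the set of
    parent categories whose intersection reproduces it.
    """
    articles = {c: set(a) for c, a in articles.items()}  # work on a copy
    derived: Dict[str, FrozenSet[str]] = {}
    all_cats = set(articles) | set(parents)
    for ps in parents.values():
        all_cats |= ps
    # In an acyclic hierarchy the loop settles within len(all_cats) passes;
    # the bound only exists to catch cycles instead of looping forever.
    for _ in range(len(all_cats) + 1):
        changed = False
        for cat, ps in parents.items():
            if len(ps) >= 2 and articles.get(cat):
                for p in ps:
                    articles.setdefault(p, set()).update(articles[cat])
                articles[cat] = set()
                derived[cat] = frozenset(ps)
                changed = True
        if not changed:
            return articles, derived
    raise ValueError("category hierarchy appears to contain a cycle")
```

For example, with "Polish women chemists" under "Polish women" and "Chemists", and "Polish women" under "Polish people" and "Women", an article filed in "Polish women chemists" ends up in "Chemists", "Polish people" and "Women", while both intermediate categories become derived intersections. Repetition is needed because moving articles up can refill a category that itself has several parents.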
(Now that I try this on an example, I see that this algorithm would probably have to be tweaked to deal with subcategories of [[Category:Categories by topic]], but that's not too bad.)
Attributes: The category exists to denote some very specific small detail of a subject, such that it would be conceivable to have dozens or more such categories on an article. Examples: 1943 deaths, Living persons, Winners of Nobel Peace Prize, etc. These tend toward hierarchies that start out strict and end up fuzzy. E.g., "1943 deaths" is only in "1943" and "1940s deaths", and these have parent categories of "1940s", "Years" and so forth, eventually ending up in "History", whereupon things become chaos.
There is no way to make hierarchies not suck, especially if you have to maintain them manually (as we do now). Don't try to impose hierarchies unless they emerge quite naturally from the subject.
I made a proposal. All subcategories of attributes must be a subset of the parent attribute. Seems like a perfectly reasonable way to make hierarchies not suck.
The devil is in the details.
For instance, how do you connect the districts of Paris to the category Paris? What is a subset of the parent attribute "Paris": "Districts of Paris", or "Quartier Latin", or neither? Does it bother you if the article on a French district is now in a subcategory of "Capitals in Europe"?
[[Category:Paris]] is a theme, not an attribute, so [[Category:Paris]] should not be a subcategory of [[Category:Capitals in Europe]].
Or going back to [[Category:Women]]: You could declare that only articles on instances of women (i.e. biographies) can ever be under that category, and that only sets of such articles can ever be subcategories of the category women. -- You could even create a separate [[Category:Woman]], with subcategories like "female reproductive organs" containing articles like uterus. -- But how would you express the undisputed relationship between female human beings and your example [[Category:Feminine hygiene]]? How about [[Category:Women's rights]]? Add an umbrella cat "Somehow related to women" maybe?
Roger
[[Category:Women]] could be a subcategory of [[Category:Woman]]. Making an attribute a subcategory of a theme is allowed; it is the reverse that is not allowed.
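The "attributes may sit under themes, but not the reverse" rule is mechanical enough to check with a script, provided each category has first been labelled as an attribute or a theme. The labels, the `(child, parent)` edge list, and the function name below are hypothetical; this is only a sketch of such a check, not existing tooling:

```python
from typing import Dict, List, Tuple


def rule_violations(
    kind: Dict[str, str],          # hypothetical labels: "attribute" or "theme"
    edges: List[Tuple[str, str]],  # (subcategory, parent category) links
) -> List[Tuple[str, str]]:
    """Return every link where a theme has been placed under an attribute."""
    return [
        (child, parent)
        for child, parent in edges
        if kind.get(parent) == "attribute" and kind.get(child) == "theme"
    ]


kind = {
    "Paris": "theme",
    "Capitals in Europe": "attribute",
    "Woman": "theme",
    "Women": "attribute",
}
edges = [("Paris", "Capitals in Europe"), ("Women", "Woman")]
print(rule_violations(kind, edges))
```

Here `Women` under `Woman` (attribute under theme) passes, while `Paris` under `Capitals in Europe` (theme under attribute) is reported. Unlabelled categories are ignored, which mirrors the point that attribute categories would have to be identified before any such check is possible.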
In any event, things wouldn't be perfect. Ultimately the best solution would involve fixing the category system itself, a process which should be approached carefully so as to avoid making the same mistakes all over again. The advantage of my proposal to not allow themes as subcategories of attributes is that it can be implemented today, without much disruption, and without modifying any code. Plus, it allows for a relatively straightforward upgrade path when the category system is fixed. The proposal itself is not the fix, it's a temporary workaround.
As an alternative, it would probably be possible to do all of this even without enforcing the subcategory rule. But all purely attribute categories would have to be identified as such. I'll have to think about that.
Anthony