Anthony DiPierro wrote:
On 6/3/06, Roger Luethi <collector@hellgate.ch> wrote:
On Sat, 03 Jun 2006 19:54:27 +0200, Steve Bennett wrote:
I'm probably not the only one who envisages all the wonderful things that could be done with this massive collection of information that is Wikipedia, *if only* we could do something clever with the categories. And then you realise that you can't really do anything clever because "category" has all sorts of different meanings to different people.
Agreed. Still: can you give some specific examples of wonderful things that could be done but are not possible now? That would tell us what problem you are trying to solve.
I've personally run into this when trying to automatically create, for example, a list of all Wikipedia articles on people. You can't just start at [[Category:People]] and work your way down, because you wind up going to [[Category:Women]] (fine, all women are people) then [[Category:Feminine hygiene]] (bad).
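To make the failure concrete, here is a minimal sketch of that naive walk. The category graph is toy data invented for illustration (the article "Tampon" and the dict layout are assumptions, not taken from the real category tables); the point is only that a walk which trusts every subcategory link collects articles that are not about people.

```python
from collections import deque

# Hypothetical toy data, not the real Wikipedia category graph.
SUBCATS = {
    "People": ["Women"],
    "Women": ["Feminine hygiene"],
    "Feminine hygiene": [],
}
ARTICLES = {
    "People": [],
    "Women": ["Marie Curie"],
    "Feminine hygiene": ["Tampon"],
}

def collect_articles(root):
    """Breadth-first walk that assumes every subcategory is a subset of its parent."""
    seen = {root}
    queue = deque([root])
    found = []
    while queue:
        cat = queue.popleft()
        found.extend(ARTICLES.get(cat, []))
        for sub in SUBCATS.get(cat, []):
            if sub not in seen:  # guard against category loops
                seen.add(sub)
                queue.append(sub)
    return found

# The result mixes a person with an article that is not about a person.
print(collect_articles("People"))
```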
A high level category like people does not need to have direct elements. To be simplistic about it, it would do fine with only two sub-categories, men and women. An element of a sub-category is an element of the superset category.
Categories based on such intersections of attributes are conceptually bad. Look at the categories for an article like [[Marie Curie]]: She's French three times, female four times, Polish four times (not counting "Natives of Warsaw"), etc. Why not create [[Category:Polish women who were born in 1867 and died in 1934 and won a Nobel Prize in Chemistry and in Physics]]?
Because there would only be one person in that category.
Such a category would be theoretically acceptable but totally impractical. There is an element of art to the design of category hierarchies. A category that's too narrow (like your example) is unfindable; you simply never know which ones exist. At the other extreme, if the category is too broad it becomes more difficult to find things within it. In Wiktionary people have established [[Category:English nouns]] which now has numerous elements, but what user would ever look there to find something? The purpose of categories is to help the passive user to find things. It requires some idea of which Googling strategies work and which don't, and how to modify a strategy which initially doesn't work. Just think of what works when you are searching for something.
In my mind a category should not have more than 200 direct elements, since that is the number of items that appear on a single page by default when we ask for a category to be listed. Anything longer should be subdivided. Even so, a person listing the contents of a higher-level category should have an "include sub-categories" option that descends to a level of their choosing.
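A rough sketch of what such an "include sub-categories" option might look like, assuming the category graph is available as plain dictionaries. The names `list_members`, `needs_subdividing` and `max_depth`, and the toy data, are all made up for illustration; nothing here reflects how the software actually stores categories.

```python
from collections import deque

PAGE_SIZE = 200  # items shown per category page by default, per the text above

# Hypothetical toy data.
SUBCATS = {"Top": ["Mid"], "Mid": ["Leaf"], "Leaf": []}
ARTICLES = {"Top": ["A"], "Mid": ["B"], "Leaf": ["C"]}

def list_members(category, max_depth=0):
    """List articles in `category`, descending at most `max_depth` levels."""
    seen = {category}
    queue = deque([(category, 0)])
    found = set()
    while queue:
        cat, depth = queue.popleft()
        found.update(ARTICLES.get(cat, []))
        if depth < max_depth:
            for sub in SUBCATS.get(cat, []):
                if sub not in seen:  # breadth-first, so first visit is the shallowest
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return sorted(found)

def needs_subdividing(category):
    """True when the direct elements no longer fit on one default page."""
    return len(ARTICLES.get(category, [])) > PAGE_SIZE

print(list_members("Top"))               # direct elements only
print(list_members("Top", max_depth=1))  # one level of sub-categories
```

With `max_depth=0` the listing behaves like a plain category page; raising it is the reader's choice, which keeps the hierarchy itself free to stay narrow.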
If we don't have a term for (or an article about) it, there probably shouldn't be a category for it, either (I'm sure a determined mind could come up with an exception).
If the category system could effectively build these intersection categories on the fly, I'd agree. But the category system can't currently do that. (And it's been around a reasonably long time, with that as an obvious flaw, and no one has fixed it.)
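For what it's worth, building an intersection on the fly is not conceptually hard if each article carries only simple attribute categories. The sketch below assumes an in-memory mapping from article to its set of categories; the `TAGS` data, the category names, and the `intersect` helper are hypothetical, and a real implementation would have to do this against the database at Wikipedia's scale.

```python
# Hypothetical toy data: article -> set of simple attribute categories.
TAGS = {
    "Marie Curie": {
        "Women", "1867 births", "1934 deaths",
        "Nobel laureates in Physics", "Nobel laureates in Chemistry",
    },
    "Example article": {"Women", "1934 deaths"},
}

def build_index(tags):
    """Invert article -> categories into category -> set of articles."""
    index = {}
    for article, categories in tags.items():
        for category in categories:
            index.setdefault(category, set()).add(article)
    return index

def intersect(*categories):
    """Articles that belong to every one of the given categories."""
    if not categories:
        return set()
    index = build_index(TAGS)
    sets = [index.get(category, set()) for category in categories]
    return set.intersection(*sets)

print(intersect("Women", "1934 deaths"))
print(intersect("Women", "Nobel laureates in Physics", "Nobel laureates in Chemistry"))
```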
I suggested something of the sort before categories were implemented, but more from the searching end. The real problem is with the search function, which is remarkably unsophisticated for a project the size of Wikipedia.
Attributes: The category exists to denote some very specific small detail of a subject, such that it would be conceivable to have dozens or more such categories on an article. Examples: 1943 deaths, Living persons, Winners of Nobel Peace Prize, etc. These tend to form hierarchies that start strict and then end up fuzzy. E.g., "1943 deaths" is only in "1943" and "1940s deaths", and these have parent categories of "1940s", "Years" and so forth, eventually ending up in "History", whereupon things become chaos.
There is no way to make hierarchies not suck, especially if you have to maintain them manually (as we do now). Don't try to impose hierarchies unless they emerge quite naturally from the subject.
I made a proposal. All subcategories of attributes must be a subset of the parent attribute. Seems like a perfectly reasonable way to make hierarchies not suck.
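One way such a rule could be checked mechanically, sketched under a strong assumption: every article is tagged with every attribute that applies to it, so a category's member set is complete. The data, the article "Tampon", and the function name `subset_violations` are invented for illustration.

```python
# Hypothetical toy data. MEMBERS is assumed to hold the complete member set
# of each attribute, not just the articles filed there directly.
SUBCATS = {
    "People": ["Women"],
    "Women": ["Feminine hygiene"],
}
MEMBERS = {
    "People": {"Marie Curie"},
    "Women": {"Marie Curie"},
    "Feminine hygiene": {"Tampon"},
}

def subset_violations(subcats, members):
    """Return (parent, child, offending articles) for each link that is not a subset."""
    violations = []
    for parent, children in subcats.items():
        for child in children:
            extra = members.get(child, set()) - members.get(parent, set())
            if extra:
                violations.append((parent, child, sorted(extra)))
    return violations

# Flags the Women -> Feminine hygiene link and leaves People -> Women alone.
print(subset_violations(SUBCATS, MEMBERS))
```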
It's an idea that I have tried to implement for some time at Wiktionary. The difficulty with such hierarchies is that they require people to think logically, and to be able to trace a path back to a single top level hierarchy.
Ec