On Sat, 03 Jun 2006 19:54:27 +0200, Steve Bennett wrote:
I'm probably not the only one who envisages all the wonderful things that could be done with this massive collection of information that is Wikipedia, *if only* we could do something clever with the categories. And then you realise that you can't really do anything clever because "category" has all sorts of different meanings to different people.
Agreed. Still: can you give some specific examples of wonderful things that could be done but are not possible now? That would tell us what problem you are trying to solve.
So far I have identified four rough types of categories. I'll invent the notation a(X) to mean that article X is in category a. a(b(X)) means that a is a subcategory of b, and X is in b.
I think you mean "b is a subcategory of a".
Taxonomies: Tend to end in "s" and satisfy the rule that "if a(X), then X is an a" is a logical sentence. They tend to form strict hierarchies, where if a(X) and b(a), then it's perfectly natural and normal that b(a(X)). E.g., "Bridges in France" is a subcategory of "Bridges", and every entry in "Bridges in France" is definitely a bridge. It's rare for an article to be in more than two taxonomic categories at once.
"Bridges in France" may not be the best example. It is just an intersection of two attributes ("in France", "Bridges"), and their relative position in a hierarchy is undefined. Hence there is more than one hierarchy: you can drill down via "France" ... "Buildings and structures in France" or via "Bridges" ... "Bridges by country".
Compare with taxons in the classification of species: an actual hierarchy, and only one path from the top down to any species -- there you are dividing into subsets (and intersections make no sense).
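For a true taxonomy, the propagation rule (if a(X) and b(a), then b(X)) is just a transitive closure over subcategory links. A minimal Python sketch, assuming a hypothetical dictionary of parent links rather than anything MediaWiki actually stores; the category names are illustrative only:

```python
# Hypothetical toy data, not MediaWiki's storage format.
# category -> set of direct parent categories, i.e. b(a) is recorded as "a": {"b"}
PARENTS = {
    "Bridges in France": {"Bridges"},
    "Bridges": {"Structures"},
    "Structures": set(),
}
# article -> categories assigned directly on the article page (hypothetical)
ARTICLE_CATS = {"Pont Neuf": {"Bridges in France"}}


def ancestors(category, parents):
    """All categories reachable from `category` via parent links (transitive closure)."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def all_categories(article, article_cats, parents):
    """Direct categories plus everything inherited through subcategory links."""
    direct = article_cats.get(article, set())
    result = set(direct)
    for cat in direct:
        result |= ancestors(cat, parents)
    return result


print(sorted(all_categories("Pont Neuf", ARTICLE_CATS, PARENTS)))
# ['Bridges', 'Bridges in France', 'Structures']
```

In a genuine taxonomy every inherited label is still true of the article (the Pont Neuf really is a structure); the same closure applied to theme links is where the inference starts producing nonsense.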
Categories based on such intersections of attributes are conceptually bad. Look at the categories for an article like [[Marie Curie]]: She's French three times, female four times, Polish four times (not counting "Natives of Warsaw"), etc. Why not create [[Category:Polish women who were born in 1867 and died in 1934 and won a Nobel Prize in Chemistry and in Physics]]?
If we don't have a term for (or an article about) it, there probably shouldn't be a category for it, either (I'm sure a determined mind could come up with an exception).
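The alternative to maintaining intersection categories is to store each attribute once and intersect on demand. A minimal Python sketch, assuming each attribute is kept as a plain set of article titles (a hypothetical representation chosen for illustration):

```python
# Hypothetical attribute sets: attribute name -> titles of articles that have it.
ATTRIBUTES = {
    "Bridges": {"Pont Neuf", "Golden Gate Bridge", "Millau Viaduct"},
    "In France": {"Pont Neuf", "Millau Viaduct", "Eiffel Tower"},
}


def intersect(attributes, *names):
    """Articles carrying every named attribute; computed at query time, never stored."""
    sets = [attributes[name] for name in names]
    # set.intersection() needs at least one argument, so guard the empty query.
    return set.intersection(*sets) if sets else set()


print(sorted(intersect(ATTRIBUTES, "Bridges", "In France")))
# ['Millau Viaduct', 'Pont Neuf']
```

Any combination ("Bridges" and "In France", or the Marie Curie pile-up) becomes a query rather than another category someone has to keep in sync by hand.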
Themes: Tend not to be plurals, and tend not to form strict hierarchies. Often b looks like it belongs in a, but then a(b(X)) is nonsense for certain X. E.g., Paris might be in "European cities", and the film Amelie might be in "Paris", but it's silly to say that Amelie is in "European cities" (and there are many worse examples).
Well yes, Amelie _is_ related to European cities. It is relevant for a list of movies that are set in European cities. The real problem is that the initial relation is entirely unqualified: Amelie is neither a part nor a member of Paris.
You could conceivably create a category "set in Paris" for the film and have that be a subcategory of "set in European cities". Problem is, you need to propagate that modifier backwards all the way to the top or you will have the same situation you described.
The best solution I've seen is qualified relations (something like [[Semantic MediaWiki]]). For instance: Amelie is set in [[set in::Paris]].
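The difference between untyped and qualified links can be sketched in Python. The sketch stores hypothetical (subject, relation, object) triples, loosely modeled on an annotation such as [[set in::Paris]]; it is not Semantic MediaWiki's actual storage or query API. Following every link blindly reproduces the Amelie problem, while a query that composes "set in" with "member of" keeps the qualifier:

```python
# Hypothetical triples (subject, relation, object); not Semantic MediaWiki's API.
TRIPLES = [
    ("Amelie", "set in", "Paris"),
    ("Paris", "member of", "European cities"),
    ("Berlin", "member of", "European cities"),
]


def subjects(relation, obj, triples):
    """Everything linked to `obj` by exactly this relation type."""
    return {s for s, r, o in triples if r == relation and o == obj}


def untyped_closure(start, triples):
    """Follow every link regardless of type, as an unqualified category tree does."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def set_in_any(group, triples):
    """Typed query: things 'set in' some place that is a 'member of' group."""
    places = subjects("member of", group, triples)
    return {s for place in places for s in subjects("set in", place, triples)}


print(sorted(untyped_closure("Amelie", TRIPLES)))    # Amelie "in" European cities
print(sorted(set_in_any("European cities", TRIPLES)))  # set in one, not a member
```

The typed query can answer "films set in European cities" without ever claiming that Amelie is a European city, which is exactly the modifier that would otherwise have to be propagated up the tree by hand.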
Attributes: The category exists to denote some very specific small detail of a subject, such that it would be conceivable to have dozens or more such categories on an article. Examples: "1943 deaths", "Living persons", "Winners of Nobel Peace Prize", etc. These tend to form hierarchies that start strict and then become fuzzy. E.g., "1943 deaths" is only in "1943" and "1940s deaths", and these have parent categories of "1940s", "Years" and so forth, eventually ending up in "History", whereupon things become chaos.
There is no way to make hierarchies not suck, especially if you have to maintain them manually (as we do now). Don't try to impose hierarchies unless they emerge quite naturally from the subject.
Meta-attributes: These are categories about *articles* rather than article subjects. The most common examples are stubs ("France geography stubs"), sources ("1911 Encyclopaedia Britannica") and disputes of various kinds ("Articles lacking sources").
Actually, "France geography stubs" contains two attributes (France, geography). Only the "stub" part is not about the subject. But yeah, it's a problem.
Another type you didn't mention is articles that merge several concepts into one. This happens, for instance, if a biography is merged with the thing that made the person notable: you get articles that are in people and object categories at the same time (e.g. programmers, software).
To me, these types of categories are all fairly incompatible, and really get in the way of using categories to do anything useful. It's pointless trying to draw tree structures when you have attributes and meta-attributes involved, for example.
So the problem you are trying to solve is drawing tree structures? I'm afraid your problem may not be shortcomings in WP, but the real world.
So my questions are these:
* Can anyone think of other types of categories I might have missed?
Basically, you have identified:
1) is an intersection of [Bridges in France / in France & Bridges]
2) is a subset of [Bridges in France / Bridges]
3) is a member of [Paris / European Cities], plus all your attribute examples
4) is related to (or, more specifically, is set in) [Amelie (movie) / Paris]
5) information about the article
1) can be computed and shouldn't exist as categories. I'm not sure whether we care about the difference between 2) and 3). 5) can quite easily be dealt with using namespaces (depending on the problem, of course). The meat is in 4): you can add any number of named relations there, and that is where most of the current ugliness lies.
* How could Wikipedia be better if this general problem were addressed?
What was the problem again?
Anyhow, I guess my main point is that hierarchies are overrated. They are most useful when you don't have a computer to sort things out for you.
Roger