This is a mini-essay on a current problem in MediaWikiland: category policy. It's also available at http://en.wikipedia.org/wiki/User:Gracefool/What_is_a_category, but I'd rather have it discussed here first.
'''N.B.''' I'm aware of previous discussions at [[Wikipedia talk:Categorization]]. This essay is a more thorough and defensible treatment of the issue, and it highlights the fallacies of many previous arguments.
==Introduction==
What ''is'' a category? No-one knows. There isn't consensus on what a category is (see [[Wikipedia talk:Categorization]]). Is it a hierarchical tree, with all categorizations representing "[[is a]]" relationships? Or is it just a set, a group of related articles?
This is an important question - just look at [[Wikipedia:Categories for deletion]]. Changes to categories have more widespread effects than changes to articles, and have a greater possibility of annoying editors.
I believe that categories are, and should be, sets, not hierarchies:
==Categories are sets==
===Original purpose of categories===
What was the original purpose of the categorization system? Development of a taxonomy of worldly knowledge? I don't think the developers are really that stupid (I'll expand on this below). AFAIK it was meant as a kind of automatic list-generator for related articles, and lists of "related articles" are sets, not hierarchies.
===Current software===
The way that categories have been implemented in software supports the idea that categories are sets. The support is implicit: nothing stops anyone from using them that way, and none of the limits of a hierarchical system exist in the category software. Software limits would be the best way to enforce the idea of hierarchical categories, and would be easy to implement (e.g. don't allow arbitrary parenting of categories).
Until policy is decided on (and, preferably, software upgraded to support it), categories will continue to be used as sets. Since sets include hierarchies, while hierarchies don't include sets, the current categorization system is one of sets.
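The point that sets include hierarchies can be illustrated with a minimal sketch. It assumes, purely for illustration, that the category graph is stored as a mapping from each name to the categories it is placed in; this is not MediaWiki's actual data model, and the names and functions are hypothetical. A strict hierarchy is then just the special case in which nothing has more than one parent.

```python
# Hypothetical data: each category maps to the set of categories it is placed in.
parents = {
    "Writers": {"People", "Writing"},
    "Science fiction writers": {"Writers", "Science fiction"},
    "People": set(),
    "Writing": set(),
    "Science fiction": set(),
}

def is_strict_hierarchy(parents):
    """Return True if no entry has more than one parent."""
    return all(len(cats) <= 1 for cats in parents.values())

def multi_parent_entries(parents):
    """List the entries that a single-parent rule would forbid."""
    return sorted(name for name, cats in parents.items() if len(cats) > 1)
```

Under these assumptions, a rule like "don't allow arbitrary parenting" would amount to rejecting any edit that makes `is_strict_hierarchy` false.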
==Categories should be sets==
===Categories are inherently POV===
A categorization system is a worldview. Therefore it is very hard for categories to be [[WP:NPOV|NPOV]]. The following quote from [http://www.shirky.com/writings/semantic_syllogism.html#worldviews_differ_for_good_reasons Clay Shirky] expands on this:
<quote>:Many networked projects, including things like business-to-business markets and Web Services, have started with the unobjectionable hypothesis that communication would be easier if everyone described things the same way. From there, it is a short but fatal leap to conclude that a particular brand of unifying description will therefore be broadly and swiftly adopted (the "this will work because it would be good if it did" fallacy.)
:Any attempt at a global ontology is doomed to fail, because meta-data describes a worldview. The designers of the Soviet library's cataloging system were making an assertion about the world when they made the first category of books "Works of the classical authors of Marxism-Leninism." Melvyl Dewey was making an assertion about the world when he lumped all books about non-Christian religions into a single category, listed last among books about religion. It is not possible to neatly map these two systems onto one another, or onto other classification schemes -- they describe different kinds of worlds.
:Because meta-data describes a worldview, incompatibility is an inevitable by-product of vigorous argument. It would be relatively easy, for example, to encode a description of genes in XML, but it would be impossible to get a universal standard for such a description, because biologists are still arguing about what a gene actually is. There are several competing standards for describing genetic information, and the semantic divergence is an artifact of a real conversation among biologists. You can't get a standard til you have an agreement, and you can't force an agreement to exist where none actually does.
:Furthermore, when we see attempts to enforce semantics on human situations, it ends up debasing the semantics, rather then making the connection more informative. Social networking services like Friendster and LinkedIn assume that people will treat links to one another as external signals of deep association, so that the social mesh as represented by the software will be an accurate model of the real world. In fact, the concept of friend, or even the type and depth of connection required to say you know someone, is quite slippery, and as a result, links between people on Friendster have been drained of much of their intended meaning. Trying to express implicit and fuzzy relationships in ways that are explicit and sharp doesn't clarify the meaning, it destroys it.</quote>
The whole concept of an all-encompassing hierarchical category system is against the spirit of Wikipedia. It imposes an all-encompassing worldview, or attribution of value, on the marked-up (categorized) articles.
The "categories are hierarchies" idea presumes that it is even possible for a large group of people to agree on an all-encompassing belief-system, a ridiculous notion totally bereft of realism, and one that has been shown wrong by experience in many IT metadata projects.
Categories, especially hierarchical categories, are about the followers of one particular worldview implicitly saying "our way is right, everyone should follow it". Note that the proportion of people who follow one particular worldview in every aspect is very small.
===Sets are much less POV===
Categorization by set is obviously less POV. An article can belong to as many sets as the community thinks it should belong to, whether directly or via multiple parenthood of the article's category (or ancestors).
==Conclusion==
The benefits of hierarchical categorization
#decreased redundancy
#easier navigation (for a minority who have the "right" worldview)
are outweighed by its costs
#the community will never be in agreement over the system
#harder navigation (for the majority who don't find articles where they expect them to be)
#decreased accuracy (the real world is not in a big hierarchy, it merely has sets of metadata applied to it by different people)
-- Chris Wood
On 09/01/04 06:11, Chris Wood wrote:
> ==Conclusion==
> The benefits of hierarchical categorization
> #decreased redundancy
> #easier navigation (for a minority who have the "right" worldview)
> are outweighed by its costs
> #the community will never be in agreement over the system
> #harder navigation (for the majority who don't find articles where they expect them to be)
> #decreased accuracy (the real world is not in a big hierarchy, it merely has sets of metadata applied to it by different people)
So what would you suggest? We could stop categories from being able to show up in other categories - that would neatly abolish the hierarchy. OTOH, many articles would end up with a ridiculous number of cats at the top (currently kept in check by the rough rule that if it's in a subcat, you don't put it in the cat as well).
- d.
> So what would you suggest? We could stop categories from being able to show up in other categories - that would neatly abolish the hierarchy. OTOH, many articles would end up with a ridiculous number of cats at the top (currently kept in check by the rough rule that if it's in a subcat, you don't put it in the cat as well).
No, what I meant is allow any arbitrary categorization scheme - sets can, after all, be inside other sets (or more than one set). A "related to" scheme rather than "is a". See "Continued" below
-- Chris Wood
Chris Wood wrote:
>> So what would you suggest? We could stop categories from being able to show up in other categories - that would neatly abolish the hierarchy. OTOH, many articles would end up with a ridiculous number of cats at the top (currently kept in check by the rough rule that if it's in a subcat, you don't put it in the cat as well).
> No, what I meant is allow any arbitrary categorization scheme - sets can, after all, be inside other sets (or more than one set). A "related to" scheme rather than "is a". See "Continued" below
So henceforth, writers are related to people?
At 11:06 PM 9/1/2004 +0100, Timwi wrote:
>> No, what I meant is allow any arbitrary categorization scheme - sets can, after all, be inside other sets (or more than one set). A "related to" scheme rather than "is a". See "Continued" below
> So henceforth, writers are related to people?
The solution I've occasionally suggested is to set up some sort of system whereby the meaning of categorization could be encoded right into the category link and understood by the software. That would allow all the different meanings of categorization to coexist. So for example the article [[Io (moon)]] could be [[Category:is-a:moons]] and [[Category:related-to:Earth]]. Or Category:Writers could be [[Category:is-a:people]] and [[Category:related-to:writing]].
To transition over, all that needs to be done is to come up with some sort of syntax that lets all the current category tags continue working with some default relationship type. Then, just as images have gradually had their markup brought up to date by editors, the categories could get sorted out too.
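A rough sketch of how such typed category links might be read, assuming the `[[Category:relation:name]]` form from the examples above. Nothing here is existing MediaWiki behaviour: the function name, the list of known relations, and the choice of "related-to" as the default for untyped links are all assumptions made for illustration.

```python
import re

# Hypothetical syntax from the proposal: [[Category:relation:name]].
# Untyped links such as [[Category:name]] fall back to a default relation.
DEFAULT_RELATION = "related-to"  # assumption: the proposal does not fix a default
KNOWN_RELATIONS = {"is-a", "related-to"}

# Matches [[Category:target]] and [[Category:target|sortkey]].
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")

def parse_category_links(wikitext):
    """Return a list of (relation, category) pairs found in the wikitext."""
    results = []
    for match in CATEGORY_LINK.finditer(wikitext):
        target = match.group(1).strip()
        prefix, sep, rest = target.partition(":")
        if sep and prefix in KNOWN_RELATIONS:
            # Typed link: the part before the first colon names the relation.
            results.append((prefix, rest.strip()))
        else:
            # Existing untyped link: keep it working under the default relation.
            results.append((DEFAULT_RELATION, target))
    return results
```

Only prefixes listed in `KNOWN_RELATIONS` are treated as relation types, so an existing category whose name happens to contain a colon would still parse as an untyped link.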
What I'm arguing is, "is-a" is not going to work, it's too POV. What we have now is sets of related articles, which is all we need (is-a relationships *can* be represented this way, but sets are more flexible).
-- Chris Wood
"Chris Wood" standsongrace@hotmail.com wrote in message news:ch65tl$ose$1@sea.gmane.org...
> What I'm arguing is, "is-a" is not going to work, it's too POV. What we have now is sets of related articles, which is all we need (is-a relationships *can* be represented this way, but sets are more flexible).
Is-a relations work just fine for noncontroversial categories. Quartz is a type of mineral. Bill Clinton is (was) a U.S. President. Arnold Schwarzenegger is a governor of California. Canada is a country in North America. Then there are some areas that are a little more controversial, due to judgements about the classification system used: should a particular author be included in the category Fantasy writers, Science fiction writers, or Horror writers, or all three? But I think most such questions could be decided relatively amicably. (I realize that there are problems even with such simple classes as well, e.g., the discussions about which countries should be included in the Europe template: the line dividing Europe and Asia is not self-evident at all points.)
Where it gets much more difficult is where there are intense POV values built into the categorization (especially where the defining criteria depend on subjective determinations), such as whether something is a work of propaganda, whether a person is an alcoholic, or whether an incident was an act of terrorism or of patriotism.
But I do not see how your distinction between related-to and is-a makes any difference in such situations. I think the difficulties of NPOV categorization are not that different from those of NPOV language in an article in general. The language used to describe a person/thing/event inevitably carries some subjective value judgements about the topic. The criticism you make of categorization seems to be a variation on a critique of language in general. In your earlier post, you
I guess I simply do not understand how related-to is any better than or so very different from is-a categorization. If you are suggesting that we should have a very flat category structure with many more categories for each article, then I very strongly disagree. Hierarchical categorization is useful, despite the inherent difficulties with any categorization schema. I do not see any significant attempts to make categorization in Wikipedia into an endorsement of any single ontological worldview: there are multiple competing/complementary hierarchies. I think your assertion (in your earlier post) that "Sets are much less POV" is simply wrong. A set is just as POV as a hierarchy. Whether there is a list/set of alcoholics or of propaganda or of acts of terrorism, inclusion in such a list/set expresses a POV just as much as inclusion within a hierarchical categorization schema.
Bkonrad
I really wasn't clear at all in that first post. I mean that articles and categories should belong to *many* categories, rather than being in a strict hierarchy. So in your example, put the author in all three categories - Fantasy writers, SF writers and Horror writers.
-- Chris Wood
At 11:11 AM 9/3/2004 +1200, Chris Wood wrote:
> I really wasn't clear at all in that first post. I mean that articles and categories should belong to *many* categories, rather than being in a strict hierarchy. So in your example, put the author in all three categories - Fantasy writers, SF writers and Horror writers.
In the current system, articles can belong to many categories _and_ be in hierarchies. There already is http://en.wikipedia.org/wiki/Category:Science_fiction_writers , http://en.wikipedia.org/wiki/Category:Fantasy_writers and http://en.wikipedia.org/wiki/Category:Horror_writers , and each of these categories belongs to both the "writers" hierarchy (part of the "people" hierarchy) and their respective genre hierarchies as well.
I'm not sure what calling these things "sets" and "subsets" would add that isn't already there.
I'm reacting to two things: 1) People who say that categories are strictly "is a", and something belongs in a category only if it is an example of that category, and 2) People who say an article or category cannot belong to too many categories.
-- Chris Wood
Chris Wood wrote:
> I'm reacting to two things: 1) People who say that categories are strictly "is a", and something belongs in a category only if it is an example of that category, and 2) People who say an article or category cannot belong to too many categories.
Just weighing in with my own opinions here.
1) "categories are strictly 'is a'" is too narrow a view and precludes us from doing many interesting things that we could and should be doing
2) While there may be some reasons from a technical or aesthetic point of view to limit the number of categories to which any particular article belongs, I can't see any need for a hard-and-fast rule. Probably good standards will evolve over time as we use categories and learn some things about them.
--Jimbo
"Timwi" timwi@gmx.net wrote in message news:ch5h73$jhb$1@sea.gmane.org...
> So henceforth, writers are related to people?
Category:People is a set of related articles (all are about people). Writers is a subset of people. The relation is subset.
-- Chris Wood
The above essay assumes that sets are taken advantage of fully by allowing multiple inheritance and possibly even inheritance loops, and encouraging articles and categories to be given many categories rather than just one or two.
-- Chris Wood
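For what it's worth, multiple parents and even loops need not cause trouble for software that walks the category graph. The following is a minimal sketch under assumed data: the `parents` mapping, the category names, and the deliberate loop between "Writing" and "Literature" are all invented for illustration and do not describe how MediaWiki stores or traverses categories.

```python
from collections import deque

# Hypothetical category graph: each name maps to the categories it is placed in.
# "Writing" and "Literature" deliberately contain each other to form a loop.
parents = {
    "Some author": {"Fantasy writers", "Horror writers"},
    "Fantasy writers": {"Writers", "Fantasy"},
    "Horror writers": {"Writers", "Horror"},
    "Writers": {"People", "Writing"},
    "Writing": {"Literature"},
    "Literature": {"Writing"},
}

def all_categories(page, parents):
    """Collect every category reachable from `page`, visiting each one once."""
    seen = set()
    queue = deque(parents.get(page, ()))
    while queue:
        category = queue.popleft()
        if category in seen:
            continue  # already visited, so a loop cannot cause endless traversal
        seen.add(category)
        queue.extend(parents.get(category, ()))
    return seen
```

Because each category is recorded in `seen` the first time it is reached, shared ancestors such as "Writers" are counted once and the loop terminates.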