First, let me say that, despite all appearances, I'm not trying to criticize Magnus (surely no one should be criticized for doing work for Wikipedia ;-) ). Or Simon! I just think we need to think harder about all of these issues.
Maybe Simon can answer the question Magnus couldn't (or at least not very well, as far as I could tell): what is the *purpose* of this feature? What is it supposed to accomplish?
On Sun, 6 Jan 2002, Simon Kissane wrote:
I like Magnus' idea a lot. Yes, we could just use ordinary old links. But I like this better for a couple of reasons:
- Software can understand the {{CATEGORY ...}} notation much better than it can understand plain old links. What if I want to extract all the articles on linguistics from Wikipedia? With categories like this, I could have a script that could do it easily. But with just plain old links, writing a script to do it would be difficult.
OK, this is helpful: categories could be used to sort articles into broad compilations (associated with academic fields like linguistics) that, for whatever reason (and who can predict what reasons they would be?), people might like to have.
I can see a use for that, but I think it needs to be clarified further: is the claim that it would be useful to have articles sorted only at the top-level categories (those on the HomePage, just for example) or at some expanded level? If at some expanded level, depending on *how* expanded, the purpose(s) of the feature might change considerably.
Moreover, it is *not* at all clear to me (particularly given, as Simon says, we might want to support multiple category schemes--though this too I might doubt, as I'll have to explain) that this particular implementation is the best. What *would* be the best way to sort articles into broad categories? Why not, for example, have special pages that simply list article titles, so that, e.g., [[linguistics category]] would contain nothing but links to articles that someone asserts belong to the category of linguistics? We could just as easily autogenerate lists of uncategorized pages, and for each article page we could have the script look at the category: pages to see whether the article is in a category. I'm not saying that this is a better idea; I'm just saying that there are multiple ways of going about doing something, and it would be a mistake not to consider them.
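To make the list-page alternative a little more concrete, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the data layout (a mapping from category page titles to the article titles they link to), the function names, and the sample titles. None of it is taken from the existing Wikipedia software.

```python
# Hypothetical data: each category page maps to the article titles it links to.
category_pages = {
    "linguistics category": ["Phonology", "Syntax"],
    "philosophy category": ["Epistemology", "Syntax"],
}
# Hypothetical list of every article title in the wiki.
all_articles = ["Phonology", "Syntax", "Epistemology", "Paris"]


def categories_of(title, category_pages):
    """Return the category pages that list the given article, sorted by name."""
    return sorted(cat for cat, titles in category_pages.items() if title in titles)


def uncategorized(all_articles, category_pages):
    """Return the articles that no category page links to, in original order."""
    listed = set()
    for titles in category_pages.values():
        listed.update(titles)
    return [title for title in all_articles if title not in listed]
```

With the made-up data above, `uncategorized(all_articles, category_pages)` picks out "Paris", and `categories_of("Syntax", category_pages)` finds both category pages. The point is only that the lookup works in either direction without any markup on the article pages themselves.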
- We can easily find "loose" pages that don't belong to a category. (On the other hand, we could mark the category pages as category pages, and then search for pages that aren't linked from a page marked as a category page.)
Larry suggested having a drop-down box, to limit the categories the users could choose from. I think Wikipedia should support multiple category schemes, and should allow anyone to add their own categories.
Well, we could have multiple drop-down boxes, eh? How else would we individuate our multiple category schemes? The whole point of a drop-down box is to disallow a category scheme from metastasizing for frivolous reasons (such as that somebody didn't know that an article really belongs under the existing category XYZ, and so creates a new category for it--just an example).
That way we can experiment, and see what works best.
How exactly would we experiment? And what would it be that we see "works best"?
I think at this point we need to be very clear on what we mean by "category scheme." On the one hand, there are schemes of the sort we have on the Home Page, or the Library of Congress catalog scheme. Those schemes (1) list subjects, (2) arrange those subjects under larger headings ("supercategories"), and even (3) provide an ordering of some sort within the "supercategories." So when you say we can experiment and see what works best, which of these (or what combination) do you mean? We already do experiment with multiple category schemes in the sense of a combination of (1) and (2). But this doesn't list all articles, of course.
But when Magnus proposes to allow us to list the category, or categories, of a particular article on an article's page itself (even within the body of the article itself), he provides us with no particular category scheme in *any* of the senses in (1), (2), or (3) (which is fine). What Simon now asserts is that Magnus' feature allows "multiple category schemes." I don't see this, though. What it allows us, rather, is multiple *categories*, at the whim of the user: so we could expand the list mentioned in (1) at will. That's the only sense in which I can see how Magnus' feature allows us "multiple category schemes."
By contrast, there's nothing about Magnus' feature that allows us to produce multiple category schemes in the sense in which we've *usually* talked about them on Wikipedia (and Nupedia), viz., the *combination* of (1) (a specific, delimited set of subject categories) and (2) (an arrangement of that delimited set into supercategories). As far as I can tell, Magnus' feature at present would give us *a single* infinitely expandable set of categories.
People with alternative views of how to categorise things can create their own category schemes (and categorising things is one area where there are often as many views as there are people, probably because there is no one right answer.)
Wouldn't that make categorization particularly pointless? No one person is going to categorize all our articles, I imagine--no one person is competent to do so, probably. That means we have to work together on this. Now, I can see multiple competing category schemes (maybe--but I'd like to know what the purpose of *that* would be). Say, two or three. More than that, and, again, we've got a veritable babel; in that case, I doubt any one scheme would succeed in categorizing all the articles. Even two or three is a little confusing: won't "philosophy" be a category in any plausible scheme? Similarly with other traditional subjects. So how will the competing category lists (not schemes, really) be distinguished?
On the other hand, if we can agree in advance on one set of categories, then, *for the purpose of sorting articles into broad academic fields* (which, as I said, seems like a clear, reasonably useful purpose), we can *work together* on sorting all the articles. That would be a good thing: it could be a useful, accurate piece of metadata.
I think it would be nice if we could have different "category namespaces", to support multiple category schemes. There should also be a way to lock category namespaces: so I can have my own category namespace, and only I am allowed to assign pages to categories within it; or so (like Larry seems to be suggesting) people can't create their own categories, but they can assign pages to pre-existing ones.
I'm not sure I understand, exactly, but is the idea here somewhat like the one I suggested above? Viz., we put the metainformation about categories not on article pages but on special categorization pages?
I also think that information on "what pages belong to this category" belongs in the category, not in the page (although I don't know how you were actually installing this.)
Aha, yes. Might be better, yes.
Finally, even if we don't want this sort of feature for Wikipedia, why not keep the code, but make it an administrator configurable option? So if Larry & Jimbo don't want to use it on the Wikipedia.com server, they can switch it off, but if other people want to take PHP Wikipedia's code to use for some other purpose they can turn it on if they want to?
I'd be 100% in favor of something like that.
I really don't like to sound contrary (really, I don't!), but I think that whenever we propose new features that could potentially complicate the process of building Wikipedia, and that could be abused or misused (resulting in confusion if nothing else), we should think more carefully about what we are doing and why we're doing it, exactly.
So far, along these lines, there are *two* actual reasons that I have spotted for why we might want to do *one specific* thing, viz., list (somewhere) the metadata about what *general categories* articles are classed under. To wit:
(1) It would help sort the Recent Changes page nicely, so that specialists can, if they want, focus just on articles in their areas. (Others could view all categories at once.) If this were all we needed to accomplish, then we might as well sort *edits* into categories, not *articles*.
(2) It would allow us to produce a list of all the articles in one broad area of study, which would no doubt be useful for a variety of purposes. For this, of course, we need to sort articles, not edits.
Now, there are a variety of ways we could accomplish both of these purposes. The best I think we've heard so far would work like this:
On each article page, there is a multiple-selection box that allows us to place articles into one or more categories from among a set of categories that is previously decided upon by Wikipedia members and probably Nupedia as well (it would be nice if the categories corresponded to Nupedia review groups). This particular datum is editable like anything else in the article. There is *also* a set of pages sorting existing articles into these categories based on the metadata found on the articles; these might or might not be editable. There is, as well, a page or several of unsorted articles; from that page one could visit different pages and sort them quickly.
This proposal would accomplish purposes (1) and (2) as follows: the metadata would allow us to sort the Recent Changes page so we can view only those categories of articles we're interested in; it would also allow us to generate (and further organize, perhaps) broad categories of articles. One can see an autogenerated Wikipedia Encyclopedia of Mathematics, for example!
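Purely as an illustration of how the pieces of this proposal might fit together, here is a minimal sketch in Python. All of it is hypothetical: the category names, the function names, the in-memory dictionary standing in for per-article metadata, and the sample articles are assumptions made for the example, not a description of how the software works or should be written.

```python
# Hypothetical agreed-upon category set; the real list would be decided by members.
ALLOWED_CATEGORIES = {"Mathematics", "Linguistics", "Philosophy"}


def set_categories(article_categories, title, chosen):
    """Record the multiple-selection box contents for one article.

    Rejects anything outside the agreed set, so the scheme cannot grow
    by accident.
    """
    unknown = set(chosen) - ALLOWED_CATEGORIES
    if unknown:
        raise ValueError("not in the agreed set: " + ", ".join(sorted(unknown)))
    article_categories[title] = set(chosen)


def category_listing(article_categories, category):
    """Autogenerated page: every article sorted into the given category."""
    return sorted(t for t, cats in article_categories.items() if category in cats)


def unsorted_articles(article_categories):
    """Autogenerated page: articles nobody has sorted yet."""
    return sorted(t for t, cats in article_categories.items() if not cats)


def filter_recent_changes(recent_changes, article_categories, wanted):
    """Keep only the recently changed titles that fall in a wanted category."""
    wanted = set(wanted)
    return [t for t in recent_changes if article_categories.get(t, set()) & wanted]


# Made-up sample data: three articles, two of them sorted.
article_categories = {"Group theory": set(), "Syntax": set(), "Paris": set()}
set_categories(article_categories, "Group theory", ["Mathematics"])
set_categories(article_categories, "Syntax", ["Linguistics", "Philosophy"])
recent_changes = ["Paris", "Syntax", "Group theory"]
```

In this sketch the category listing for "Mathematics" contains only "Group theory", the unsorted page contains only "Paris", and a linguist filtering Recent Changes would see only "Syntax". An attempt to file an article under a category outside the agreed set raises an error, which is the job the selection box is meant to do.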
Larry