--- Larry Sanger lsanger@nupedia.com wrote:
First, let me say that, despite all appearances, I'm not trying to criticize Magnus (surely no one should be criticized for doing work for Wikipedia ;-) ). Or Simon! I just think we need to think harder about all of these issues.
Maybe Simon can answer the question Magnus couldn't (or, not very well as far as I could tell): what is the *purpose* of this feature? What is it supposed to accomplish?
Well, as I see it, the purpose is to make the administration of lists of pages easier -- these lists could be categories (such as Philosophy, or Mathematics, or so on) -- but I am also thinking of other lists such as Biographies, Mathematics, U.S. Presidents, International Intergovernmental Organizations, and so on.
Consider for example if I were to put the article on Bill Clinton in the Biography listing, and the U.S. Presidents listing. At the moment I have to edit two or three pages -- the Biography and the U.S. Presidents listings to add a link to the article, and the article to add a backlink to the lists (if so desired). With Magnus' proposal, as I understand it, I'd just have to insert "{{{CATEGORY Biography, US_Presidents}}}" into the article and I'd be done.
Also, with these categories, it's easier to automatically extract the articles in Wikipedia on that category, since software can parse "{{{CATEGORY ...}}}" more easily than links on a category page (since it can't tell which links point to pages within the category, and which links point elsewhere -- unless it has AI, of course).
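To illustrate how little parsing that would take, here is a minimal sketch in Python. It assumes the tag looks exactly like the example above (a comma-separated list inside "{{{CATEGORY ...}}}"); the function name extract_categories is made up for illustration and is not part of any existing Wikipedia script.

```python
import re

# Assumed tag syntax, taken from the example above: {{{CATEGORY A, B}}}
CATEGORY_TAG = re.compile(r"\{\{\{CATEGORY\s+([^}]*)\}\}\}")

def extract_categories(article_text):
    """Return the category names found in all CATEGORY tags, without duplicates."""
    found = []
    for match in CATEGORY_TAG.finditer(article_text):
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in found:
                found.append(name)
    return found

text = "Article text about Bill Clinton. {{{CATEGORY Biography, US_Presidents}}}"
print(extract_categories(text))  # ['Biography', 'US_Presidents']
```

Ordinary links in the article body are ignored, so the software never has to guess which links are category links.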
And supposing we want to divide articles up by traditional academic discipline, using this lets us easily see which articles have not yet been assigned. (We'd simply have to do a query to see which articles are not in one of the categories which make up our category scheme.)
[snip]
OK, this is helpful: categories could be used to sort articles into broad compilations (associated with academic fields like linguistics) that, for whatever reason (and who can predict what reasons they would be?), people might like to have.
I can see a use for that, but I think it needs to be clarified further: is the claim that it would be useful to have articles sorted only at the top-level categories (those on the HomePage, just for example) or at some expanded level? If at some expanded level, depending on *how* expanded, the purpose(s) of the feature might change considerably.
Well, I see two uses for this. First, putting articles into broad categorisations, such as those used on the home page. Second, maintaining lists like U.S. Presidents, and so on.
I'm not proposing we try to design a complex hierarchical classification scheme. For starters, we can put all the Maths pages into a "Mathematics" category. But, if someone wants to go to the trouble of creating additional categories, "Mathematics--Analysis" and "Mathematics--Topology" and "Mathematics--Geometry" and so on, why not let them? Let's create a basic category scheme to start with, and let it grow finer over time as (and if) people see the need.
We should also allow people to create low-level categories without fitting them into a hierarchy -- I should be able to go ahead and create the category U.S. Presidents, without having to decide whether it belongs under History or Politics or what, or where it should belong in some finer subclassification of those topics. Later on, if we feel the need, and once the category system is sufficiently evolved, we can worry about how to fit these standalone low-level categories into broader ones.
Moreover, it is *not* at all clear to me (particularly given, as Simon says, we might want to support multiple category schemes--though this too I might doubt, as I'll have to explain) that this particular implementation is the best. What *would* be the best way to sort articles into broad categories? Why not, for example, have special pages that simply list article titles, so that, e.g., [[linguistics category]] would contain nothing but links to articles that someone asserts belong to the category of linguistics? We could just as easily autogenerate lists of uncategorized pages, and for each article page we could have the script look at the category: pages to see whether the article is in a category. I'm not saying that this is a better idea; I'm just saying that there are multiple ways of going about doing something, and it would be a mistake not to consider them.
I think the approach you suggest would be inferior to Magnus' for a number of reasons. Firstly, when looking at an article, you would not be able to see which categories it belonged to, unless someone added a backlink to that category. Secondly, if I want to add an article to three different categories, I have three different pages I have to edit -- four if I want to include a backlink in the article. Thirdly, by using the "{{{CATEGORY ...}}}" notation, we can store the categories directly in the database, if we want, as a CATEGORY table -- which will mean faster access.
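As a rough sketch of what such a CATEGORY table could look like, here is a small Python example using an in-memory SQLite database. The table layout and the function names are assumptions made up for illustration; the actual Wikipedia database may well be organised differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per (article, category) pair.
conn.execute(
    "CREATE TABLE category (article TEXT, name TEXT, PRIMARY KEY (article, name))"
)
conn.executemany(
    "INSERT INTO category (article, name) VALUES (?, ?)",
    [
        ("Bill Clinton", "Biography"),
        ("Bill Clinton", "US_Presidents"),
        ("George Washington", "US_Presidents"),
    ],
)

def categories_of(article):
    """Categories to show when viewing one article."""
    rows = conn.execute(
        "SELECT name FROM category WHERE article = ? ORDER BY name", (article,)
    )
    return [row[0] for row in rows]

def articles_in(name):
    """Articles to show on an autogenerated category listing."""
    rows = conn.execute(
        "SELECT article FROM category WHERE name = ? ORDER BY article", (name,)
    )
    return [row[0] for row in rows]

print(categories_of("Bill Clinton"))  # ['Biography', 'US_Presidents']
print(articles_in("US_Presidents"))   # ['Bill Clinton', 'George Washington']
```

The same table answers both questions (which categories an article is in, and which articles a category contains), so neither backlinks nor listing pages need to be edited by hand.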
[snip]
Larry suggested having a drop-down box, to limit the categories the users could choose from. I think Wikipedia should support multiple category schemes, and should allow anyone to add their own categories.
Well, we could have multiple drop-down boxes, eh? How else would we individuate our multiple category schemes? The whole point of a drop-down box is to disallow a category scheme from metastasizing for frivolous reasons (such as that somebody didn't know that an article that goes under XYZ really belongs under XYZ, and so creates its own category--just an example).
I agree that we need to find a way to stop frivolous or accidental creation of categories. But at the same time, I think we should create tools that can be used for a wide variety of purposes, rather than putting things in a straitjacket.
The main point behind having multiple category schemes is that I can ask the software the question "Does this page belong to any category in this scheme?" Suppose we decide we want to place every Wikipedia article in certain broad categories (Mathematics, Physics, Chemistry, Biology, Philosophy, History, etc.), then we want a list of all articles which do not currently belong to one of those categories. Suppose "Bill Clinton" belongs to the category "U.S. Presidents" -- that article should still come up in our list, since although it belongs to a category, it does not belong to the broad category scheme. So at the least, we'd want a way of distinguishing subject categories (such as History, Philosophy, etc.) from other categories (like U.S. Presidents, Treaties, Roman Emperors, Popes, etc.).
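Here is a minimal sketch of that question in Python, assuming the software keeps a set of category names per article and knows which categories make up the broad subject scheme. The data and the name not_in_scheme are invented for illustration.

```python
# Assumed data: the categories that make up the broad subject scheme,
# and the categories each article has been tagged with.
SUBJECT_SCHEME = {"Mathematics", "Physics", "Chemistry", "Biology", "Philosophy", "History"}

article_categories = {
    "Bill Clinton": {"US_Presidents"},
    "Topology": {"Mathematics"},
    "Plato": {"Philosophy", "Biography"},
}

def not_in_scheme(article_categories, scheme):
    """Titles of articles that belong to no category in the given scheme."""
    return sorted(
        title
        for title, cats in article_categories.items()
        if scheme.isdisjoint(cats)
    )

print(not_in_scheme(article_categories, SUBJECT_SCHEME))  # ['Bill Clinton']
```

"Bill Clinton" still comes up, even though it has a category, because "US_Presidents" is not one of the subject categories.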
Secondly, suppose someone wants to create a subclassification of a pre-existing category. Say they create categories "History--Ancient", "History--Medieval" and "History--Modern", or "History--Europe", "History--North America", "History--Asia" and so on. Then they'd want to ask "show me all the articles in category History which haven't been assigned to categories History--Ancient, History--Modern or History--Europe". This would involve some way of marking categories as subcategories of a larger category. However, these aren't just simple subcategories within the same category scheme -- these are two orthogonal categorisations. We want to be able to generate separate "not yet assigned" lists for each.
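One way to picture the two orthogonal subdivisions is the sketch below (Python). The facet names "period" and "region", the sample articles, and the function unassigned_in_facet are all assumptions for illustration, not an existing feature.

```python
# Assumed structure: each parent category has named facets, and each facet
# is an independent set of subcategories with its own "not yet assigned" list.
FACETS = {
    "History": {
        "period": {"History--Ancient", "History--Medieval", "History--Modern"},
        "region": {"History--Europe", "History--North America", "History--Asia"},
    }
}

article_categories = {
    "Roman Empire": {"History", "History--Ancient", "History--Europe"},
    "Ming Dynasty": {"History", "History--Asia"},
    "Magna Carta": {"History"},
}

def unassigned_in_facet(parent, facet):
    """Articles in `parent` that have no subcategory from the given facet."""
    subcats = FACETS[parent][facet]
    return sorted(
        title
        for title, cats in article_categories.items()
        if parent in cats and subcats.isdisjoint(cats)
    )

print(unassigned_in_facet("History", "period"))  # ['Magna Carta', 'Ming Dynasty']
print(unassigned_in_facet("History", "region"))  # ['Magna Carta']
```

An article can be fully assigned along one facet and still show up as unassigned along the other.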
To stop people accidentally or frivolously creating categories, we could make it so that people have to execute some special procedure (e.g. a create_category action on the script) to create a category (so they don't accidentally create one automatically, by misspelling it, say). That procedure should show them what categories already exist, warn them against creating them needlessly, explain Wikipedia policy on categories, and so on.
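A bare-bones sketch of that rule in Python: categories exist only once someone has explicitly created them, and assigning an article to an unknown name is rejected. The class and method names are hypothetical, and the warnings and policy text are left out.

```python
class CategoryRegistry:
    """Categories must be created explicitly before an article can use them."""

    def __init__(self):
        self.existing = set()

    def create_category(self, name):
        # Hypothetical action; a real one would first show the existing
        # categories and the policy text, then ask for confirmation.
        if name in self.existing:
            raise ValueError(f"category already exists: {name}")
        self.existing.add(name)

    def assign(self, article_categories, article, name):
        # A misspelled name fails here instead of silently creating a category.
        if name not in self.existing:
            raise KeyError(f"unknown category: {name}")
        article_categories.setdefault(article, set()).add(name)

registry = CategoryRegistry()
registry.create_category("US_Presidents")

articles = {}
registry.assign(articles, "Bill Clinton", "US_Presidents")
try:
    registry.assign(articles, "Bill Clinton", "US_Presdents")  # typo
except KeyError as err:
    print("rejected:", err)
```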
We should also allow categories to be deleted if they are created in error. Only admins should do that, after consultation -- but we could permit anyone to delete them, if we had an "undelete" feature (i.e. the category disappears from the list of categories, but its data is retained, so it can be undeleted).
As to drop-down boxes, they have their advantages and disadvantages. The advantage is that they are easier to use than {{{CATEGORY ...}}}. The disadvantage is that if we ended up with a lot of categories, they'd become unwieldy. They should be multiple-selection combo boxes, not drop-down boxes, if we are going to allow the one article to belong to multiple categories. Finally, the problem with combo boxes is that it's easy to accidentally add or remove an article from a category -- one misplaced click is all it takes. At least with "{{{CATEGORY ...}}}", they have to type or delete something.
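The "undelete" idea amounts to a soft delete. A small Python sketch, with made-up class and method names, might look like this:

```python
class CategoryStore:
    """Soft delete: a deleted category is hidden from listings, but its data is kept."""

    def __init__(self):
        self.members = {}     # category name -> set of article titles
        self.deleted = set()  # names currently hidden from the category list

    def add(self, name, article):
        self.members.setdefault(name, set()).add(article)

    def delete(self, name):
        if name in self.members:
            self.deleted.add(name)

    def undelete(self, name):
        self.deleted.discard(name)

    def visible(self):
        return sorted(n for n in self.members if n not in self.deleted)

store = CategoryStore()
store.add("US_Presidents", "Bill Clinton")
store.add("US_Presdents", "Bill Clinton")  # created in error
store.delete("US_Presdents")
print(store.visible())   # ['US_Presidents']
store.undelete("US_Presdents")
print(store.visible())   # ['US_Presdents', 'US_Presidents']
```

Because nothing is actually thrown away, a mistaken deletion costs no more than a mistaken edit to an article.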
But whatever user interface we choose, we can still provide the same backend implementation.
That way we can experiment, and see what works best.
How exactly would we experiment? What would we be seeing "works best"?
Well, as I said, create little categories for things like "U.S. Presidents", "Kings of France", "Bible"... and if someone wants to subcategorize a broader category (i.e. create "History--Asia" and "History--United States"), let them. Let the system evolve (just like how we let Wikipedia articles evolve). Remove categories that are unnecessary or stupid. Every now and then, look back over what is there, and try to move things into a more coherent system.
I think at this point we need to be very clear on what we mean by "category scheme." On the one hand, there are schemes of the sort we have on the Home Page, or the Library of Congress catalog scheme. Those schemes (1) list subjects, (2) arrange those subjects under large headings ("supercategories"), and even (3) provide an ordering of some sort within the "supercategories." So when you say we can experiment and see what works best, which of these (or what combination) do you mean?
Well, now that I've thought about it more, what I'm really talking about is groups of categories which fit together -- e.g. all the different subdivisions of the one broader subject along one aspect. (Kind of like a faceted library classification, à la the Colon Classification of S. R. Ranganathan.) So we'd really only have one category scheme; it would just be, in part, hierarchical and faceted.
We already do experiment with multiple category schemes in the sense of a combination of (1) and (2). But this doesn't list all articles, of course.
But when Magnus proposes to allow us to list the category, or categories, of a particular article on an article's page itself (even within the body of the article itself), he provides us no particular category scheme in *any* of the senses in (1), (2), or (3) (which is fine). What Simon asserts now is that Magnus' feature allows "multiple category schemes."
Well, it doesn't at the moment allow that, but it could be extended to do so, which is I suppose what I am proposing.
[snip]
People with alternative views of how to categorise things can create their own category schemes (and categorising things is one area where there are often as many views as there are people, probably because there is no one right answer.)
Wouldn't that make categorization particularly pointless? No one person is going to categorize all our articles, I imagine--no one person is competent to do so, probably. That means we have to work together on this. Now, I can see multiple competing category schemes (maybe--but I'd like to know what the purpose of *that* would be). Say, two or three. More than that, and, again, we've got a veritable babel; in that case, I doubt any one scheme would succeed in categorizing all the articles. Even two or three is a little confusing: won't "philosophy" be a category in any plausible scheme? Similar with other traditional subjects. So how will the competing category lists (not schemes, really) be distinguished?
Now I think about it, I agree with you, so I withdraw that aspect of what I'm proposing.
On the other hand, if we can agree in advance on one set of categories, then, *for the purpose of sorting articles into broad academic fields* (which, as I said, seems like a clear, reasonably useful purpose), we can *work together* on sorting all the articles. That would be a good thing: it could be a useful, accurate piece of metadata.
I think that would be useful also. But I also think we should permit the creation of many finer categories, such as U.S. Presidents, or Kings of England, or Treaties, or Thailand, or 12th century... and also subdivisions of subjects, so we can have "Philosophy--Philosophy of Religion" and "Mathematics--Analysis" and "Law--criminal law"... I'm not suggesting we design a whole detailed category scheme from the top-down, but rather let one grow from the bottom up...
I think it would be nice if we could have different "category namespaces", to support multiple category schemes. There should also be a way to lock category namespaces: so I can have my own category namespace, and only I am allowed to assign pages to categories within it; or so (like Larry seems to be suggesting) people can't create their own categories, but they can assign pages to pre-existing ones.
I'm not sure I understand, exactly, but is the idea here somewhat like the one I suggested above? Viz., we put the metainformation about categories not on article pages but on special categorization pages?
It was a badly thought out suggestion, so I withdraw it.
[snip]
I really don't like to sound contrary (really, I don't!), but I think that whenever we propose new features that could potentially complicate the process of building Wikipedia, and that could be abused or misused (resulting in confusion if nothing else), we should think more carefully about what we are doing and why we're doing it, exactly.
My attitude is different -- build versatile tools, that can be used for many things, and then see what useful things people can do with them. I agree though we should be careful to avoid abuse or misuse, or having some wizzy new software feature get in the way of the primary purpose of Wikipedia, which is writing articles (categorising them is only secondary).
[snip]
(1) It would help sort the Recent Changes page nicely, so that specialists can, if they want, focus just on articles in their areas. (Others could view all categories at once.) If this were all we needed to accomplish, then we might as well sort *edits* into categories, not *articles*.
(2) It would allow us to produce a list of all the articles in one broad area of study, which would no doubt be useful for a variety of purposes. For this, of course, we need to sort articles, not edits.
Let me add: (3) We need a heap of categories to make it easier to maintain and manipulate lists of Presidents, Philosophers, Mathematicians, Countries, International Organizations, and so on. (One category per list.)
(4) We need categories to group articles on some broad topic, such as all the articles on the Bible, or articles on Hinduism, or all articles on the U.S. Government, or so on. When dealing with a broad topic like these, it would be nice to see a list of all the articles on the topic, to help improve the coherency between the different articles on the topic. However, although these are broad topics, they are a lot narrower than the disciplines you suggest.
(5) We need categories to help progressively develop a more structured category scheme for Wikipedia. (The bigger we get, the more essential organizing is going to be, or else everything will just turn into a mess...)
Now, there are a variety of ways we could accomplish both of these purposes. The best I think we've heard so far would work like this:
On each article page, there is a multiple-selection box that allows us to place articles into one or more categories from among a set of categories that is previously decided upon by Wikipedia members and probably Nupedia as well (it would be nice if the categories corresponded to Nupedia review groups). This particular datum is editable like anything else in the article. There is *also* a set of pages sorting existing articles into these categories based on the metadata found on the articles; these might or might not be editable. There is, as well, a page or several of unsorted articles; from that page one could visit different pages and sort them quickly.
The latter proposal would accomplish purposes (1) and (2) as follows: the metadata would allow us to sort the Recent Changes page so we can view only those categories of articles we're interested in; it would also allow us to generate (and further organize, perhaps) broad categories of articles. One can see an autogenerated Wikipedia Encyclopedia of Mathematics, for example!
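For purpose (1), the filtering itself could be as simple as the Python sketch below. It assumes Recent Changes is available as a list of records with a title, and that per-article category sets are at hand; the field names, editor names, and the function filter_recent_changes are invented for illustration.

```python
# Assumed inputs: a list of recent edits and the categories of each article.
recent_changes = [
    {"title": "Topology", "editor": "alice"},
    {"title": "Bill Clinton", "editor": "bob"},
    {"title": "Sandbox", "editor": "carol"},
]
article_categories = {
    "Topology": {"Mathematics"},
    "Bill Clinton": {"History", "Biography"},
}

def filter_recent_changes(changes, article_categories, wanted):
    """Keep only the changes whose article is in at least one wanted category."""
    wanted = set(wanted)
    return [
        change
        for change in changes
        if wanted & article_categories.get(change["title"], set())
    ]

for change in filter_recent_changes(recent_changes, article_categories, {"Mathematics"}):
    print(change["title"], change["editor"])  # Topology alice
```

Uncategorized pages (like "Sandbox" here) simply drop out of every filtered view, which is one more reason to keep a list of unsorted articles.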
We could have both your proposal and mine. We have a fixed set of broad categories for your purposes (1) and (2). And we have an expandable list of independent categories or subcategories for my purposes (3)-(5).
Larry
Simon.