[Wikipedia-l] Categories: An implementation
Ray Saintonge
saintonge at telus.net
Mon Jun 16 22:32:24 UTC 2003
Magnus Manske wrote:
>Filters only work if articles are assigned to categories. Setting aside whether we should use categories in wikipedia itself or only in some sifter project, categories have to be implemented in the software either way. So I hacked a barebone implementation at the test site. A list of current categories can be found at
>
>http://test.wikipedia.org/wiki/Special%3ACategories
>
For a variety of reasons, I've been waiting a few days to jump in on
this one. Like Erik I have been arguing for some kind of category
system, and have been dismayed when this important initiative gets lost
in a straw man argument over censorship or copyrights. Yes, a category
system can be used as a tool for such purposes, but its value for
Wikipedia goes well beyond such narrow confines.
>Currently, anyone can add and delete categories. I suggest that this will be restricted to sysops later, as it will prevent a "category inflation", as well as malicious deletion of a whole category.
>
This aspect has already had a lot of response, particularly from those
concerned about the openness of Wikipedia. I don't see "category
inflation" as a serious problem. Of course if the category list were to
remain in its current draft unordered state, its uselessness would
increase with its size. In applying restricted access of any kind,
system vulnerability, either to malice or to accident, becomes the
primary criterion. Even the most trustworthy among us can sometimes
fall prey to an "oops" moment. I can see a distinction between
integrated and non-integrated categories with only the former being
subject to restricted editing, and even integrated categories could have
their descriptions fully editable. Altering integrated categories could
have unforeseen and often far-reaching consequences that may not easily
be apparent.
>Anyone can assign any article to any category, and remove that assignment as well. Wiki, right? :-)
>
Of course!
>Currently, categories are *not* shown on the article page, although I have written that code (keep getting some weird effect, though, so I turned it off).
>
>To be done:
>* Personal category filters
>* Search for a category combination (in the example online, "Biology" and "Germany" should list "Anton de Bary")
>
All in good time.
>Thoughts? Comments? Bullets? ;-)
>
Seeing the jumbled nature of a category list that is only 19 items long
has only added to my conviction that we should have a codified
hierarchical system.
For a Wikipedia (I have very different approaches for a Wiktionary) I
would argue for a system that '''starts''' with the Library of Congress
Classification System. Before I get back a lot of comments about why
it's not a good system, let it be known that I am perfectly aware of its
shortcomings, notably its American slant on subject classification, and
the fact that its century-old structure may not be appropriate to many
modern subjects. In its favour is the simple fact that it is there and
in the public domain; it is hierarchical and easily subdivided when we
require it; it is well known and accessible at its most fundamental
structure and that gives us a coherent starting point. An alternative
system would be just as good, but it should have these characteristics
if we want to avoid a reinventing-the-wheel kind of situation. We need
to remember too that whatever system we choose will be modified as we
progress.
Some people complained before that they don't like the idea of having to
learn a long list of different codes. That's fair enough, but
it's important to remember that most contributors will work within a
limited range of subject areas, and that in itself will limit the codes
that they need to remember. Of course coding and classification is an
optional task. If a person feels comfortable writing text, but feels
lost with codification, he can leave that to somebody else, even as we aim
to make codification as easy as possible.
Any article should be classifiable in several categories. Thus the
Anton de Bary article can appear in CT for biography, DD for Germany and
QH for biology plus whatever else Wikipedians consider appropriate.
Unlike printed books there is no need to limit ourselves to a single
classification to enable us to find the book on a shelf somewhere in a
library.
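As a rough illustration of multiple classification, here is a minimal Python sketch. It is not Magnus's test-site code: the in-memory mapping, the function name articles_in_all and the second article are all invented for the example. Only the codes CT, DD and QH for Anton de Bary come from the paragraph above. Matching on a code prefix is an assumption about how a hierarchical code could double as a broad search element.

```python
# Hypothetical in-memory index: article title -> set of category codes.
ASSIGNMENTS = {
    "Anton de Bary": {"CT", "DD", "QH"},   # codes taken from the post
    "Example article": {"QH"},             # invented entry, for contrast only
}


def articles_in_all(prefixes, assignments=ASSIGNMENTS):
    """Return sorted titles that have, for every prefix given,
    at least one category code starting with that prefix."""
    return sorted(
        title
        for title, codes in assignments.items()
        if all(any(code.startswith(p) for code in codes) for p in prefixes)
    )


# Combination search: biology AND Germany.
print(articles_in_all(["QH", "DD"]))
# Broad search on the first letter only.
print(articles_in_all(["Q"]))
```

Because an article carries a set of codes rather than a single shelf mark, a combination search is just a test that every requested code is present.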
A category would have three elements: a code, a title, and a
description. The codes would be brief and hierarchical; they would also
be sufficient as broad search elements. The titles would function in a
manner similar to the present article titles. They could appear after a
code as a dumb descriptor for that code and linked directly only to the
third element. Like most articles these descriptions would be fully
editable, and if any edit wars were to arise out of the classification
system this is where they would happen.
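One possible shape for such a three-element category record, sketched in Python. The class name Category, its method names and the placeholder description are assumptions made for illustration. Deriving the parent by dropping the last letter of the code is likewise only one way to read "hierarchical".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    code: str         # brief, hierarchical code, e.g. "QH"
    title: str        # short label displayed after the code
    description: str  # freely editable text, like an article

    def parent_code(self) -> Optional[str]:
        """Code one level up the hierarchy, or None for a top-level class."""
        return self.code[:-1] or None

    def label(self) -> str:
        """The code followed by its title, as it might appear in a list."""
        return f"{self.code} {self.title}"


qh = Category("QH", "Biology", "Placeholder description text.")
print(qh.label())
print(qh.parent_code())
```

Keeping the description as a separate field means that it can stay open to everyone even if changes to the code itself were restricted.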
Magnus's idea of using drop down boxes for putting things into
categories should work well with a hierarchical scheme. This could be
expanded into a series of nested drop down boxes as required. For the
most part LC uses only 2 letters in its classifications, and even then
there are many unused classes. Only 3 letters would give us the
capacity for 17,576 codes. In the LC system the "E" category is about
the United States, and is not normally further developed into lettered
sub-classes (though it does have numerical subdivision). We could
choose to use "EL" for United States localities, and that would be one
of our second level drop down choices. A third level choice might
divide these localities by state, but since the drop down list of all
states is taller than most people's screens it might be limited to
states beginning with each letter, and we could wait until the fourth
level to sort out California, Colorado and Connecticut. Georgia, as the
only "G" state, would have been sufficiently identified at the third
level. There you have it -- all the RamBot articles have been
classified. :-) There's a great deal of flexibility there.
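The arithmetic and the letter-by-letter split of the states can be checked with a short Python sketch. The function name group_by_initial and the variable names are invented here, and grouping by first letter is simply the scheme described above written out as code, not an existing feature of the software.

```python
from collections import defaultdict
from string import ascii_uppercase

# Number of distinct three-letter codes over A-Z: 26 * 26 * 26.
three_letter_capacity = len(ascii_uppercase) ** 3

STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana",
    "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota",
    "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]


def group_by_initial(names):
    """Map each initial letter to the alphabetical list of names using it."""
    groups = defaultdict(list)
    for name in sorted(names):
        groups[name[0]].append(name)
    return dict(groups)


groups = group_by_initial(STATES)

# A letter with a single state is fully identified at the third level.
resolved_at_third_level = {k: v[0] for k, v in groups.items() if len(v) == 1}
# Letters shared by several states need a fourth-level drop-down.
needs_fourth_level = {k: v for k, v in groups.items() if len(v) > 1}

print(three_letter_capacity)
print(resolved_at_third_level["G"])
print(needs_fourth_level["C"])
```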
Eclecticology