[Wikipedia-l] Categories: An implementation

Ray Saintonge saintonge at telus.net
Mon Jun 16 22:32:24 UTC 2003


Magnus Manske wrote:

>Filters only work if articles are assigned to categories. Setting aside whether we should use categories in wikipedia itself or only in some sifter project, categories have to be implemented in the software either way. So I hacked a bare-bones implementation at the test site. A list of current categories can be found at
>
>http://test.wikipedia.org/wiki/Special%3ACategories
>
For a variety of reasons, I've been waiting a few days to jump in on 
this one.  Like Erik I have been arguing for some kind of category 
system, and have been dismayed when this important initiative gets lost 
in a straw man argument over censorship or copyrights.  Yes, a category 
system can be used as a tool for such purposes, but its value for 
Wikipedia goes well beyond such narrow confines.

>Currently, anyone can add and delete categories. I suggest that this be restricted to sysops later, as it will prevent "category inflation", as well as malicious deletion of a whole category.
>
This aspect has already had a lot of response, particularly from those 
concerned about the openness of Wikipedia.  I don't see "category 
inflation" as a serious problem.  Of course if the category list were to 
remain in its current draft unordered state, its uselessness would 
increase with its size.  In applying restricted access of any kind, 
system vulnerability, either to malice or to accident, becomes the 
primary criterion.  Even the most trustworthy among us can sometimes 
fall prey to an "oops" moment.  I can see a distinction between 
integrated and non-integrated categories with only the former being 
subject to restricted editing, and even integrated categories could have 
their descriptions fully editable.  Altering integrated categories could 
have unforeseen and often far-reaching consequences that may not be 
readily apparent.

>Anyone can assign any article to any category, and remove that assignment as well. Wiki, right? :-)
>
Of course!

>Currently, categories are *not* shown on the article page, although I have written that code (keep getting some weird effect, though, so I turned it off).
>
>To be done:
>* Personal category filters
>* Search for a category combination (in the example online, "Biology" and "Germany" should list "Anton de Bary")
>
All in good time.

>Thoughts? Comments? Bullets? ;-)
>
Seeing the jumbled nature of a category list that is only 19 items long 
has only added to my conviction that we should have a codified 
hierarchical system.

For a Wikipedia (I have very different approaches for a Wiktionary) I 
would argue for a system that '''starts''' with the Library of Congress 
Classification System.  Before I get back a lot of comments about why 
it's not a good system, let it be known that I am perfectly aware of its 
shortcomings, notably its American slant on subject classification, and 
the fact that its century-old structure may not be appropriate to many 
modern subjects.  In its favour is the simple fact that it is there and 
in the public domain; it is hierarchical and easily subdivided when we 
require it; it is well known and accessible at its most fundamental 
structure, and that gives us a coherent starting point.  An alternative 
system would be just as good, but it should have these characteristics 
if we want to avoid reinventing the wheel.  We need to remember too 
that whatever system we choose will be modified as we progress.

Some people complained before that they don't like the idea of having 
to learn a long list of different codes.  That's fair enough, but 
it's important to remember that most contributors will work within a 
limited range of subject areas, and that in itself will limit the codes 
that they need to remember.  Of course coding and classification is an 
optional task.  If a person feels comfortable writing text, but feels 
lost with codification he can leave that to somebody else even as we aim 
to make codification as easy as possible.

Any article should be classifiable in several categories.  Thus the 
Anton de Bary article can appear in CT for biography, DD for Germany and 
QH for biology, plus whatever else Wikipedians consider appropriate.  
Unlike printed books, there is no need to limit ourselves to a single 
classification to enable us to find the book on a shelf somewhere in a 
library.
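
The multiple filing described above, together with the category 
combination search on Magnus's to-do list, could be sketched roughly as 
follows.  This is only an illustration in Python, not anything from the 
test site's code: the in-memory index, the function names assign() and 
find(), and the second article are all my own assumptions.

```python
from collections import defaultdict

# Hypothetical in-memory index: category code -> set of article titles.
articles_by_code = defaultdict(set)


def assign(article, *codes):
    """File one article under any number of category codes."""
    for code in codes:
        articles_by_code[code].add(article)


def find(*codes):
    """Return, sorted, the articles filed under every one of the codes."""
    if not codes:
        return []
    # .get() avoids creating empty entries for codes nobody has used yet.
    sets = [articles_by_code.get(code, set()) for code in codes]
    return sorted(set.intersection(*sets))


# The example from the text: biography, Germany and biology.
assign("Anton de Bary", "CT", "DD", "QH")
# A made-up second entry, only to show that the filter excludes something.
assign("Charles Darwin", "CT", "QH")

print(find("QH", "DD"))
```

Asking for "QH" and "DD" together returns only the articles carrying 
both codes, which is the behaviour Magnus describes for "Biology" and 
"Germany".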

A category would have three elements: a code, a title, and a 
description.  The codes would be brief and hierarchical; they would also 
be sufficient as broad search elements.  The titles would function in a 
manner similar to the present article titles.  They could appear after a 
code as a dumb descriptor for that code and linked directly only to the 
third element.  Like most articles these descriptions would be fully 
editable, and if any edit wars were to arise out of the classification 
system this is where they would happen.
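
As a rough sketch of the three-element record, again in Python and 
purely as an assumption about how it might be modelled: the class name 
Category, the helper names, the shortened titles and the placeholder 
descriptions are mine and not part of any existing implementation.  It 
also shows how a short code can serve as a broad search element by 
simple prefix matching.

```python
from dataclasses import dataclass


@dataclass
class Category:
    code: str          # brief and hierarchical, e.g. "QH"
    title: str         # plain label shown after the code
    description: str   # the freely editable part

    def parent_code(self):
        """The code one level up, or None for a top-level class."""
        return self.code[:-1] or None

    def is_under(self, prefix):
        """True if this category falls within the broader class `prefix`."""
        return self.code.startswith(prefix)


# Illustrative entries only; titles are shortened, descriptions are dummies.
categories = [
    Category("Q", "Science", "placeholder description"),
    Category("QH", "Biology", "placeholder description"),
    Category("DD", "Germany", "placeholder description"),
]


def broad_search(prefix):
    """List the codes of every category at or below `prefix`."""
    return [c.code for c in categories if c.is_under(prefix)]


print(broad_search("Q"))
```

Deriving the parent from the code itself means the hierarchy needs no 
separate table, at the cost of making codes awkward to rename later.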

Magnus's idea of using drop-down boxes for putting things into 
categories should work well with a hierarchical scheme.  This could be 
expanded into a series of nested drop-down boxes as required.  For the 
most part LC uses only 2 letters in its classifications, and even then 
there are many unused classes.  Using 3 letters would give us the 
capacity for 17,576 codes (26 x 26 x 26).  In the LC system the "E" 
category is about the United States, and is not normally further 
developed into lettered sub-classes (though it does have numerical 
subdivision).  We could choose to use "EL" for United States 
localities, and that would be one of our second-level drop-down 
choices.  A third-level choice might divide these localities by state, 
but since a drop-down list of all the states is taller than most 
people's screens, it might instead offer only the initial letters of 
the state names, and we could wait until the fourth level to sort out 
California, Colorado and Connecticut.  Georgia, as the only "G" state, 
would already be sufficiently identified at the third level.  There you 
have it -- all the RamBot articles have been classified. :-)  There's a 
great deal of flexibility there.
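
To make the arithmetic and the letter-by-letter narrowing concrete, 
here is a small Python sketch.  It is only an assumption about how the 
nested choices might be computed: the function names are hypothetical, 
and the list holds just the four states named above rather than all 
fifty.

```python
from collections import defaultdict

# Three letters of 26 possibilities each.
three_letter_codes = 26 ** 3
assert three_letter_codes == 17_576

# Only the states mentioned in the text, not the full list.
STATES = ["California", "Colorado", "Connecticut", "Georgia"]


def group_by_initial(names):
    """Third-level choice: map each initial letter to its states."""
    groups = defaultdict(list)
    for name in sorted(names):
        groups[name[0]].append(name)
    return dict(groups)


def needs_fourth_level(groups, letter):
    """A further drop-down is needed only if the letter is ambiguous."""
    return len(groups.get(letter, [])) > 1


groups = group_by_initial(STATES)
print(groups["C"], needs_fourth_level(groups, "C"))
print(groups["G"], needs_fourth_level(groups, "G"))
```

With this subset, "C" still needs a fourth-level box while "G" is 
settled at the third level, as described above.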

Eclecticology





