Magnus Manske wrote:
So, it seems (if I interpret Jimbo's mail on wikitech and the
discussion here correctly) that most of us would like *some kind* of
category scheme in wikipedia. I do, too! But, we seem to differ on the
details (shocked silence!).
So far, I saw three concepts:
1. Simple categories like "Person", "Event", etc.; about a dozen
total.
2. Categories and subcategories, like
"Science/Biology/Biochemistry/Proteomics", which can be "scaled down"
to #1 as well ("Humankind/Person" or something)
3. Complex object structures with machine-readable meta-knowledge
encoded into the articles, which would allow for quite complex
queries/summaries, like "biologists born after 1860".
Pros:
1. Easy to edit (the wiki way!)
2. Still easy to edit, but making wikipedia browseable by category,
fine-tune Recent Changes, etc.
3. Strong improvement in search functions, meta-knowledge available
for data-mining.
Cons:
1. Not much of a help...
2. We'd need to agree on a category scheme, and maintenance might get
a *little* complicated.
3. Quite complex to edit (e.g., "<category type='person'
occupation='biologist' birth_month='5' birth_day='24'
birth_year='1874' birth_place='London' death_month=.....>")
For a wikipedia I'd have to write myself, I'd choose #3, but with
respect to the wiki way, #2 seems more likely to achieve consensus (if
there is such a thing;-)
I want to thank Magnus for summarizing this recent thread. It makes it
easier to see where to jump in.
A few months ago I suggested a system of boxes where a person could
provide a subject codification or categorization. I also suggested at
the same time using Library of Congress Classification as a starting
point which could be modified to suit our needs. The suggestion did not
fare well. Among the objections were that it would require a lot of
work to change every article to apply codification, and that people
would need to learn a lot of difficult-to-remember codes. When I went
so far as to suggest that an "XX" code be used by parents to prevent
their children from downloading certain articles, a few objected on the
grounds that this would be permitting censorship, even though the
criteria for using it this way would reside on an individual's own
computer. Bowdlerization would remain a personal option.
There are really two issues (boxes and codes) in my proposal, and they
can be considered separately and mostly independently.
The boxes are the more important of the two, and could function with
whole words as easily as with classification codes. "Person" could
function as easily as "CT", the LOC code for biography. The boxes could
easily go beside the summary box on the edit page. To facilitate
cross-referencing, it would need to be possible to enter more
than one category. This would allow, for example, a person looking for
mathematicians to search the articles which show boxes for both
mathematics and persons. I would leave it up to the techies to
determine whether this is done as a series of boxes, each with a single
category, or as a single box with a number of appropriately delimited
entries.
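As a rough illustration of the multi-category lookup described above, here
is a minimal Python sketch of the "single box with delimited entries"
variant. Everything in it is an assumption made for illustration: the ";"
delimiter, the sample articles, and the helper names parse_box and
find_articles are hypothetical, not part of any existing Wikipedia software.

```python
# Hypothetical data: article titles mapped to the raw text typed into a
# single category box. The ";" delimiter is an assumption.
RAW_BOXES = {
    "Leonhard Euler": "Mathematics; Person",
    "Calculus": "Mathematics",
    "Charles Darwin": "Biology; Person",
}


def parse_box(raw, delimiter=";"):
    """Split one delimited box entry into a set of normalized category names."""
    return {part.strip().lower() for part in raw.split(delimiter) if part.strip()}


def find_articles(boxes, *wanted):
    """Return the titles whose box contains every one of the wanted categories."""
    required = {name.strip().lower() for name in wanted}
    return sorted(
        title for title, raw in boxes.items() if required <= parse_box(raw)
    )


# Someone looking for mathematicians asks for both categories at once.
mathematicians = find_articles(RAW_BOXES, "Mathematics", "Person")
print(mathematicians)
```

A series of single-category boxes would work the same way once the entries
are collected into a set, so the choice between the two layouts is mostly a
matter of what is easier to type.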
Whatever is devised would involve a certain amount of anticipation or
pre-emption. Some of the categories that we originally employ may end
up totally useless as the scheme develops. In one sense, however, this
is nothing more than scaling up something that we already do when we
wikify articles. When we do this we have no idea about which of the
links will lead to an existing article, or one which will never be
created. One of the functions of our naming conventions is to optimize
the probability that what we write and what we link will converge. If I
add to a list of Oscar nominated movies, particularly if I'm adding a
movie with a one-word title, I have no certain idea whether that title
will have some other encyclopediable subject, or whether there have been
other movies with that title. Researching every such instance as much
as I would like is totally impractical. Much of our work here depends
on making serendipitous guesses. Then somebody else finds a use for the
term in a totally unfamiliar area.
There is absolutely no doubt that a lot of work would be required to
classify all the existing articles. I do note that someone commented in
the last couple of days that some articles had not been revised or
reviewed for a long time. That's fine; the boxes could be filled as
part of this review. There's no need to do everything overnight. I
would suggest, though, if this approach were adopted, that from the
beginning every article be botted with the code "AAA" to mean
unclassified, and any new article created without classification would
automatically be coded with "AAA". In due course this would be useful to
contributors looking for things to classify. Nobody should feel
burdened by an obligation to classify. Our
present summary box is often left unfilled, and as much as it may annoy
some veterans, it is relatively harmless. The same could be said of a
classification box, perhaps with the caution that inexperienced
Wikipedians might be better to leave it blank until they are familiar
with the categories.
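The "AAA" default could be sketched along the following lines. This is only
a minimal illustration under stated assumptions: the ";" delimiter, the
sample titles, and the helper names codes_for and needs_classifying are
hypothetical, and "CT" is used simply as an example of a filled-in code.

```python
UNCLASSIFIED = "AAA"  # the placeholder code suggested above


def codes_for(box_text, delimiter=";"):
    """Return the codes in a classification box, or ["AAA"] if it is blank."""
    codes = [c.strip() for c in (box_text or "").split(delimiter) if c.strip()]
    return codes or [UNCLASSIFIED]


def needs_classifying(articles):
    """List the titles that still carry only the placeholder code."""
    return sorted(
        title for title, box in articles.items()
        if codes_for(box) == [UNCLASSIFIED]
    )


# Hypothetical articles: a blank box, a filled box, a missing box, and one
# that a bot has already stamped with the placeholder.
ARTICLES = {"Casablanca": "", "Gandhi": "CT", "Titanic": None, "Rocky": "AAA"}
todo = needs_classifying(ARTICLES)
print(todo)
```

A blank box and a bot-stamped "AAA" end up in the same worklist, so nobody
is forced to fill the box in when saving an article.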
This is already getting long, and I have other obligations. I'll write
about the second issue later.
Eclecticology