On 25/01/07, Lars Aronsson <lars@aronsson.se> wrote:
In any written text (see [[en:Zipf's law]]), of all the words used (the vocabulary), about half of them will occur only once. If the same mathematical distribution is applicable to topics in an encyclopedia, about half of all articles in Wikipedia are at the very thinnest end of the tail. If we were to use visitor statistics to cut away the least notable topics, we could easily cut away half of our stock. And that's hardly what we want.
So is there any other math we could do here?
The metric I would love to see is some way of identifying when
[amount of value gained to our readers by this article] << [amount of hassle caused to our volunteers by having this article]
where "hassle" is deletions, cleanup, vandalism repair, mentoring, edit wars, and the like, whilst "value" is... well, value. People gaining useful information from it.
(Teenagers playing with the article to call their headmaster a child molester is not "value", even though it may seem a perfectly sensible use to them, nor is using the article to promote a business... "value" is pretty much a function of quality times readers.)
Unfortunately, it's almost entirely impossible to calculate except by gut feeling, and entirely impractical to implement. Ah, well.