David Monniaux wrote:
On the English Wikipedia (but this is coming on other ones) we have a large amount of articles about individual highschools, most of which have nothing special and are just like the next highschool.
These articles tend: * to lack perspective [...] However, when OTRS folks delete such articles as "non notable", they often face angry remarks, accusations of lack of democratic process,
I'm not interested in schools or whether they are worthy of articles, but I'm intrigued by the mathematical nature of this problem.
The people who wrote the articles lack perspective (on schools other than their own), and when the article is removed, they lack perspective on how routinely articles get removed. Aren't these necessary phenomena at the thin end of [[the long tail]]?
If we had complete visitor statistics from web logs (including Squid caches and reusers such as Answers.com), then we could point to numbers saying that this article has only been viewed so many times in the last year, and therefore it is not notable. But even if this were practically achievable (which today it is not), would that be a useful solution?
All classic reasoning about notability is focused on the fat end of the tail. Oscars are awarded to the best films, bookstores list the best selling books, the winners get the prizes. But how can we achieve fairness, balance, equal coverage at the thin end?
In any written text (see [[en:Zipf's law]]), of all the words used (the vocabulary), about half will occur only once. If the same mathematical distribution is applicable to topics in an encyclopedia, about half of all articles in Wikipedia are at the very thinnest end of the tail. If we were to use visitor statistics to cut away the least notable topics, we could easily cut away half of our stock. And that's hardly what we want.
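To make the "about half" figure concrete, here is a rough sketch in Python. It assumes an idealised Zipf distribution in which the item at rank r occurs floor(C / r) times, where C is the count of the top-ranked item. That model and the function names are my own illustration, not measured Wikipedia data. Under this assumption the items occurring exactly once are those ranked between C/2 and C, which is half of the vocabulary.

```python
from collections import Counter


def hapax_fraction(frequencies):
    """Return the fraction of distinct items that occur exactly once."""
    freqs = [f for f in frequencies if f > 0]
    if not freqs:
        return 0.0
    return sum(1 for f in freqs if f == 1) / len(freqs)


def ideal_zipf_frequencies(top_count):
    """Idealised Zipf model (an assumption, not real data):
    the item at rank r occurs floor(top_count / r) times."""
    return [top_count // r for r in range(1, top_count + 1)]


def word_frequencies(text):
    """Occurrence counts of each distinct whitespace-separated word."""
    return list(Counter(text.lower().split()).values())


if __name__ == "__main__":
    # Share of once-only items for a few sizes of the idealised model.
    for top in (100, 1000, 10000):
        print(top, hapax_fraction(ideal_zipf_frequencies(top)))
```

The same `hapax_fraction` could be fed per-article view counts instead of word counts, if such statistics were ever available, to see how much of the stock a "viewed only once" cut-off would remove.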
So is there any other math we could do here?