On the English Wikipedia (and this is spreading to other ones) we have a large number of articles about individual high schools, most of which have nothing special and are just like the next high school.
These articles tend:
* to lack perspective:
** they name staff members who are private persons, which is unencyclopedic (e.g. "there's a teacher called foobar");
** they devote inordinate space to individual, non-notable incidents (e.g. some incident involving drunk students at a party two years ago);
* to be a magnet for vandalism from disgruntled or bored students:
** this vandalism can reveal details about the personal lives of minors;
** it is often demeaning;
** it sometimes contains outright libel (accusing teachers or principals of being pedophiles, etc.);
* not to be patrolled much, since they interest few people;
* to lack sources:
** the only source tends to be the school's own website, whereas in theory we should have multiple sources, including independent ones.
In short, they have little encyclopedic interest, are a target for underage vandals, and create lots of work for the OTRS folks and the Foundation.
However, when OTRS folks delete such articles as "non-notable", they often face angry remarks, accusations of bypassing the democratic process, and so on, often from people who apparently feel strongly enough to keep the article, but not strongly enough to patrol it for abuse.
Other users, including admins, seem to entirely ignore [[Wikipedia:Schools]] as applicable policy.
In fact, I would also suggest amending the policy: the mere fact that two "celebrities" from a school have articles on WP should not be grounds for creating an article about the school.
Plenty of non-notable schools have had a celebrity pass through. That does not make them notable.
What would be relevant is whether many celebrities have gone through it. For instance, Eton in England is notable because many upper-class British men who went on to high positions have passed through it.
In any case, I think the Foundation should issue a clear statement that admins, especially from OTRS, can CSD:A7 school articles that do not demonstrate notability. Otherwise it's not manageable.
2007/1/25, David Monniaux David.Monniaux@free.fr wrote:
In any case, I think the Foundation should issue a clear statement that admins, especially from OTRS, can CSD:A7 school articles that do not demonstrate notability. Otherwise it's not manageable.
I don't think that's something for the Foundation to make a statement or decision about. Either get some agreement within the English Wikipedia itself, or accept that Wikipedia is something other than what you and I would like, and let the dirt overtake it if that's what they want.
David Monniaux wrote:
On the English Wikipedia (and this is spreading to other ones) we have a large number of articles about individual high schools, most of which have nothing special and are just like the next high school.
These articles tend: * to lack perspective [...] However, when OTRS folks delete such articles as "non notable", they often face angry remarks, accusations of lack of democratic process,
I'm not interested in schools or whether they are worthy of articles, but I'm intrigued by the mathematical nature of this problem.
The people who wrote the articles lack perspective (on schools other than their own), and when the article is removed, they lack perspective on having articles removed. Aren't these necessary phenomena at the thin end of [[the long tail]]?
If we had complete visitor statistics from web logs (including Squid caches and reusers such as Answers.com), then we could point to numbers saying that this article has only been viewed so many times in the last year, and therefore it is not notable. But even if this were practically achievable (which today it is not), would that be a useful solution?
All classic reasoning about notability is focused on the fat end of the tail. Oscars are awarded to the best films, bookstores list the best selling books, the winners get the prizes. But how can we achieve fairness, balance, equal coverage at the thin end?
In any written text (see [[en:Zipf's law]]), of all the words used (the vocabulary), about half will occur only once. If the same mathematical distribution applies to topics in an encyclopedia, about half of all articles in Wikipedia are at the very thinnest end of the tail. If we were to use visitor statistics to cut away the least notable topics, we could easily cut away half of our stock. And that's hardly what we want.
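The "half the vocabulary occurs once" claim is easy to check on any text. A minimal Python sketch, assuming a naive whitespace tokenizer and an idealized Zipf distribution with exponent 1 (both illustrative choices, not anything measured on Wikipedia): `hapax_fraction` counts the share of distinct items seen exactly once, and the hypothetical helper `zipf_sample` draws synthetic "topics" so the same count can be tried on a long-tailed sample.

```python
import random
from collections import Counter

def hapax_fraction(items):
    """Share of distinct items (the 'vocabulary') that occur exactly once."""
    counts = Counter(items)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

def zipf_sample(vocab_size, n_draws, seed=0):
    """Hypothetical helper: draw item ranks with probability proportional to 1/rank."""
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    weights = [1.0 / r for r in ranks]
    return rng.choices(ranks, weights=weights, k=n_draws)

text = "the cat sat on the mat and the dog sat on the rug"
print(hapax_fraction(text.split()))  # 0.625: 5 of the 8 distinct words occur once

# Synthetic long tail: what share of the drawn "topics" was seen only once?
print(round(hapax_fraction(zipf_sample(50_000, 100_000)), 2))
```

Whether article page views actually follow the same distribution as word frequencies is exactly the open question; the simulation only shows what the tail would look like if they did.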
So is there any other math we could do here?
On 25/01/07, Lars Aronsson lars@aronsson.se wrote:
In any written text (see [[en:Zipf's law]]), of all the words used (the vocabulary), about half will occur only once. If the same mathematical distribution applies to topics in an encyclopedia, about half of all articles in Wikipedia are at the very thinnest end of the tail. If we were to use visitor statistics to cut away the least notable topics, we could easily cut away half of our stock. And that's hardly what we want.
So is there any other math we could do here?
The metric I would love to see is some way of identifying when
[amount of value gained to our readers by this article] << [amount of hassle caused to our volunteers by having this article]
where "hassle" is deletions, cleanup, vandalism repair, mentoring edit wars, and the like, whilst "value" is... well, value. People gaining useful information from it.
(Teenagers playing with the article to call their headmaster a child molester is not "value", even though it may seem a perfectly sensible use to them, nor is using the article to promote a business... "value" is pretty much a function of quality times readers)
Unfortunately, it's almost entirely impossible to calculate except by gut feeling, and entirely impractical to implement. Ah, well.
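Taking the inequality literally, a toy version might look like this Python sketch. Every input is hypothetical: neither the quality score nor the exchange rate between one volunteer intervention and reader value (`cost_per_event`) exists anywhere, which is precisely the gut-feeling part. "Much less than" is approximated by an assumed `margin` factor.

```python
from dataclasses import dataclass

@dataclass
class ArticleStats:
    # Hypothetical inputs; none of these are measured today.
    title: str
    quality: float      # assumed editorial quality score in [0, 1]
    readers: int        # readers over some period
    hassle_events: int  # deletions, cleanups, vandalism reverts, edit-war interventions

def value(a: ArticleStats) -> float:
    # "value" as quality times readers
    return a.quality * a.readers

def hassle(a: ArticleStats, cost_per_event: float = 50.0) -> float:
    # Assumed exchange rate: one volunteer intervention weighs as much as
    # `cost_per_event` units of reader value.
    return a.hassle_events * cost_per_event

def not_worth_it(a: ArticleStats, margin: float = 10.0) -> bool:
    # "value << hassle", approximated as hassle >= margin * value;
    # an article that causes no hassle at all is never flagged.
    h = hassle(a)
    return h > 0 and h >= margin * value(a)

school = ArticleStats("Example High School", quality=0.2, readers=100, hassle_events=30)
town = ArticleStats("Example Town", quality=0.8, readers=5000, hassle_events=10)
print(not_worth_it(school))  # True: hassle 1500 vs. value 20
print(not_worth_it(town))    # False: hassle 500 vs. value 4000
```

The same function could be run over a sample of articles before and after a hypothetical policy change, which is the "test proposed policy" use suggested in the reply below.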
On 25/01/07, Andrew Gray shimgray@gmail.com wrote:
The metric I would love to see is some way of identifying when
[amount of value gained to our readers by this article] << [amount of hassle caused to our volunteers by having this article]
Come to think of it, it would be an even more effective tool to use to test proposed policy with :-)
Lars Aronsson wrote:
In any written text (see [[en:Zipf's law]]), of all the words used (the vocabulary), about half will occur only once. If the same mathematical distribution applies to topics in an encyclopedia, about half of all articles in Wikipedia are at the very thinnest end of the tail. If we were to use visitor statistics to cut away the least notable topics, we could easily cut away half of our stock. And that's hardly what we want.
So is there any other math we could do here?
Perhaps a notion of service:
A Wikipedia article is interesting if it offers a service supplemental to what is available, say, from the subject's official site. If the article is just a copy of the information in the official site, with unprovable anecdotes thrown in, then it does not offer a service.
Also, with respect to schools, the thing is that Wikipedia is not a directory. It does not aim to index every company, individual etc. in the world. So we have to resort to measurements of what makes somebody or some institution "special".
*Some* highschools are special. Some have inordinate numbers of alumni going into high positions. Some frequently appear in the press, in novels, etc. Some have exceptional characteristics. These should have articles.
But there's no reason we should have an article on my neighbouring high school, unless we also want articles on every company or organization...
David Monniaux wrote:
But there's no reason we should have an article on my neighbouring high school, unless we also want articles on every company or organization...
However, this "unless" is problematic. A printed encyclopedia in 20 volumes can only contain so many articles, and has to cut off the long tail. Wikipedia is far bigger and steadily growing. Small towns in Sweden with 25,000 inhabitants would never have an article in Encyclopaedia Britannica, but now have articles in the English Wikipedia, and everybody seems to agree that they *are* sufficiently notable. So where is the line drawn? Should the three schools in that town also have articles? Maybe the answer is: not now, when Wikipedia only has 1.6 million articles, because these schools are not among the 1.6 million most notable topics in the world. But in five years' time, when Wikipedia has 20 million articles, this might be different.
Maybe if the article is added now, and in five years' time it is still one of the least-used ones, ranking not 1.6M but 20M, then we know that now was not the right time to add it? In that case, notability is not a property of the topic itself, but a question of the order in which articles are added to Wikipedia. But it is difficult to assess today whether a topic has rank 20M when Wikipedia only has 1.6M articles.
Can we compute a rank of how much each article is used now, and relate this to how many articles existed at the time when each article was created? Then we would know how premature the addition of each article was.
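One way to read this proposal as a computation, sketched in Python under stated assumptions: rank articles by current views (rank 1 = most viewed), then divide each article's rank by the number of articles that existed when it was created. A ratio above 1 would mean the article now ranks below the size the encyclopedia had when it was added, i.e. it was "premature" in the sense described above. The `ArticleRecord` fields and all the numbers are hypothetical, and ties in view counts are broken arbitrarily by the sort.

```python
from typing import NamedTuple

class ArticleRecord(NamedTuple):
    # Hypothetical record; these fields are not an existing Wikipedia export format.
    title: str
    views_last_year: int
    articles_at_creation: int  # size of the encyclopedia when this article was created

def prematurity(records):
    """Return {title: current_rank / articles_at_creation}, most viewed first."""
    ranked = sorted(records, key=lambda r: r.views_last_year, reverse=True)
    return {
        r.title: rank / r.articles_at_creation
        for rank, r in enumerate(ranked, start=1)
    }

# Toy data only: four articles created one after another.
sample = [
    ArticleRecord("Big City", 900, 1),
    ArticleRecord("Old Topic", 5, 2),
    ArticleRecord("Town High School", 1, 3),
    ArticleRecord("Small Town", 40, 4),
]

for title, ratio in prematurity(sample).items():
    flag = "premature" if ratio > 1 else "ok"
    print(f"{title}: {ratio:.2f} ({flag})")
```

With real data the ranking would also drift as traffic changes, so a single snapshot of views can only ever give a rough answer.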
Again, my position is not that of judging what should be included now. I'm only trying to understand the math behind this.
2007/1/25, Lars Aronsson lars@aronsson.se wrote:
David Monniaux wrote:
On the English Wikipedia (and this is spreading to other ones) we have a large number of articles about individual high schools, most of which have nothing special and are just like the next high school.
These articles tend: * to lack perspective [...] However, when OTRS folks delete such articles as "non notable", they often face angry remarks, accusations of lack of democratic process,
[...] All classic reasoning about notability is focused on the fat end of the tail. Oscars are awarded to the best films, bookstores list the best selling books, the winners get the prizes. But how can we achieve fairness, balance, equal coverage at the thin end? [...]
Perhaps we can't, and that's why these articles should be deleted. Isn't that the whole idea behind the rules about verifiability, original research and notability?
GL
On 25/01/07, David Monniaux David.Monniaux@free.fr wrote:
In any case, I think the Foundation should issue a clear statement that admins, especially from OTRS, can CSD:A7 school articles that do not demonstrate notability. Otherwise it's not manageable.
The purpose of Wikipedia is not to make OTRS happy, any more than it is to make Articles For Deletion happy.
- d.
cc'd to wikien-l - it's not entirely clear why you left the wiki you're actually talking about out of the loop here.
On 1/25/07, David Monniaux David.Monniaux@free.fr wrote:
... Other users, including admins, seem to entirely ignore [[Wikipedia:Schools]] as applicable policy.
That's not policy. It was rejected.
wikimedia-l@lists.wikimedia.org