[Wikimedia-l] [Wikimedia Announcements] 2012-13 Annual Plan of the Wikimedia Foundation

Andreas Kolbe jayen466 at gmail.com
Sat Aug 4 15:36:32 UTC 2012

On Sun, Jul 29, 2012 at 10:17 PM, Tilman Bayer <tbayer at wikimedia.org> wrote:

> Of course, here the term "high quality" does not necessarily mean,
> say, featured content (e.g. on the English Wikipedia, featured
> articles currently make up less than 0.1% of the total articles), but
> instead refers to comparisons with average contributions.
> Someone from the Education Program will be able to give a more
> thorough overview of the efforts to evaluate its results, but for
> example I'm aware of
> https://blog.wikimedia.org/2012/04/19/wikipedia-education-program-stats-fall-2011/
> . The quantitative method used there has its limitations, but similar
> methods are employed in independent (i.e. non-WMF) research about
> Wikipedia in the academic literature.

It certainly does have limitations. Let's look at what it says:


In the Wikipedia Education Program, professors assign their students to
edit Wikipedia articles as a grade for class, assisted by volunteer
Wikipedia Ambassadors. In fall 2011, 55 courses participated in the program
in the United States, with students editing articles on the English
Wikipedia. On average, these students added 1855 bytes of content that
stayed on Wikipedia, compared to only 491 for a randomly chosen sample of
new users who joined English Wikipedia in September 2011. These numbers
establish that students who participate in the Wikipedia Education Program
contribute significantly more quality content that stays on Wikipedia than
other new users.


Apart from John's very salient question about how the random sample of
editors was selected, another very obvious issue is the traffic the edited
pages attract. A random sample of users might include contributors to very
popular and heavily edited pages, while students' edits are more likely to
be to specialised pages on scholarly niche topics that get very few views,
and attract few edits.

Content on little-watched pages always stays longer than content on highly
watched pages with a high edit turnover. This is quite irrespective of edit
quality. Just look at some Wikipedia pages on Indian villages ... their
content is crap, with outstanding long-term stability. :)

So until the analysis also factors in page viewing statistics and average
edits per month on each page, the variables are hopelessly confounded, and
the conclusions are nothing but wishful thinking (not to say lying with
statistics).

In other words, it's impossible to conclude that content staying on
Wikipedia is a reflection of edit quality, rather than a reflection of said
content being on a very obscure page that no one reads or edits.
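
To make the confounding concrete, here is a minimal sketch in Python. The
numbers are entirely made up for illustration (they are not the Foundation's
data), and the function names and the two traffic levels "low" and "high" are
my own assumptions. By construction, the bytes that survive depend only on
how busy the page is, never on who made the edit; the only difference
between the groups is where they tend to edit.

```python
from collections import defaultdict
from statistics import mean


def make_edits():
    """Build hypothetical records of (group, page_traffic, bytes_surviving).

    Assumption for illustration: survival is 2000 bytes on low-traffic pages
    and 400 bytes on high-traffic pages, regardless of the editor's group.
    """
    edits = []
    # Students mostly edit quiet niche pages.
    edits += [("student", "low", 2000)] * 9 + [("student", "high", 400)] * 1
    # Other new users mostly edit busy, heavily watched pages.
    edits += [("other", "low", 2000)] * 2 + [("other", "high", 400)] * 8
    return edits


def naive_means(edits):
    """Average surviving bytes per group, ignoring page traffic."""
    by_group = defaultdict(list)
    for group, _traffic, surviving in edits:
        by_group[group].append(surviving)
    return {group: mean(values) for group, values in by_group.items()}


def stratified_means(edits):
    """Average surviving bytes per (group, traffic) cell."""
    by_cell = defaultdict(list)
    for group, traffic, surviving in edits:
        by_cell[(group, traffic)].append(surviving)
    return {cell: mean(values) for cell, values in by_cell.items()}


if __name__ == "__main__":
    edits = make_edits()
    print("naive:", naive_means(edits))
    print("stratified:", stratified_means(edits))
```

The naive comparison shows students far ahead, yet within each traffic level
the two groups are identical. A real analysis would need actual page view and
edit frequency data per page, and probably a regression rather than two
crude strata, but the principle is the same.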

If the Foundation has an interest in producing meaningful statistical
analyses, I would suggest actually employing a statistician who can give
such posts a look-over and point out the obvious fallacies.

