I am doing a PhD on online civic participation projects
(e-participation). As part of my research, I carried out a user
survey in which I asked how many people had ever edited or created a
page on a wiki. I would now like to compare the results with the
overall rate of wiki editing/creation at the country level.
I've found some country-level statistics on Wikipedia Statistics (e.g.
3,000 editors of Wikipedia articles in Italy), but data for the UK and
France are not available, since Wikipedia provides statistics by
language, not by country. I'm therefore looking for statistics on the
UK and France (but am also interested in alternative ways of measuring
wiki editing/creation in Sweden and Italy).
I would be grateful for any tips!
Sunny regards, Alina
European University Institute
For those of you who are interested in reverts:
I just presented our paper on accurate revert detection at the ACM Hypertext and Social Media conference 2012. It shows a significant gain in accuracy (and coverage) over the widely used method of detecting reverts by finding identical revisions (via MD5 hash values): our method detects edit pairs that are significantly more likely to be actual reverts according to editors' perception of a revert and the Wikipedia definition. 35% of the reverts found by the MD5 method in our sample are not assessed to be reverts by more than 80% of our survey participants (accuracy 0%). The new method finds different reverts for these 35%, plus 12% more, which show a 70% accuracy.
Find the PDF slides, paper and results here:
I'll be happy to answer any questions.
In more detail:
The MD5 hash method employed by many researchers to identify reverts is acknowledged (like some other methods, such as using edit comments) to produce some inaccuracies as far as the Wikipedia definition of a revert ("reverses the actions of any editors", "undoing the actions"..) is concerned. The extent of these inaccuracies is usually judged to be small, since most reverting edits are carried out immediately after the edit to be reverted and are thus "identity reverts" (Wikipedia definition: "..normally results in the page being restored to a version that existed previously"). Still, there has been no user evaluation assessing how well the detected reverts conform to the Wikipedia definition and to what users actually perceive as a revert. We developed and evaluated an alternative to the MD5 identity revert method and show a significant increase in accuracy (and coverage).
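To make the baseline concrete, here is a minimal sketch of MD5-based identity revert detection as described above: a revision is flagged when its hash matches that of an earlier revision. The function name, the input format (a chronological list of revision texts) and the choice to ignore directly repeated revisions are assumptions made for this illustration; this is not the code from the paper.

```python
import hashlib


def md5_identity_reverts(revisions):
    """Return (reverting_index, restored_index) pairs.

    `revisions` is a list of revision texts in chronological order
    (a hypothetical input format chosen for this sketch). A revision is
    flagged as an identity revert if its MD5 hash equals the hash of an
    earlier revision and at least one revision lies in between.
    """
    last_seen = {}  # MD5 digest -> index of the latest revision with it
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # Skip directly repeated revisions (assumption: those undo nothing).
        if j is not None and i - j > 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


history = ["A", "A B", "A", "A C", "A C D", "A C"]
print(md5_identity_reverts(history))  # [(2, 0), (5, 3)]
```

Note that this only ever finds revisions that are byte-for-byte identical to an earlier one, which is the limitation the evaluation above is concerned with.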
34% of the reverts detected by the MD5 hash method in our sample fail to be acknowledged as full reverts by more than 80% of the users in our study. Our new method performs much better: it finds different reverts for these 34% wrongly detected reverts, plus 12% more reverts, and 70% of these newly found edit pairs are actual reverts according to the users. The difference in accuracy between the reverts detected only by the MD5 method and those detected only by our new method is highly significant, and reverts detected by both methods also perform significantly better than those detected only by the MD5 method.
Although this method is much slower than the MD5 method (as it uses DIFFs between revisions), it reflects much better what users (and the Wikipedia community as a whole) see as a revert. It is therefore a valid alternative if you are interested in the antagonistic relationships between users at a more detailed and accurate level. There is considerable potential to make it faster by combining the two methods and thus decreasing the number of DIFFs to be performed. Let's see if we can get around to doing that :)
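As a rough illustration of why a DIFF-based approach costs more but can see more, the sketch below measures how many of the word tokens added by one edit are removed again by a later edit. This is not the algorithm from the paper; the function names, the whitespace tokenization and the "undone share" measure are assumptions made only for this example.

```python
import difflib


def token_changes(old, new):
    """Return (added, removed) word-token lists between two revision texts."""
    a, b = old.split(), new.split()
    added, removed = [], []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(a[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])
    return added, removed


def undone_share(before, edited, later):
    """Share of the tokens added by `edited` that `later` removes again."""
    added, _ = token_changes(before, edited)
    _, removed = token_changes(edited, later)
    if not added:
        return 0.0
    remaining = list(removed)
    undone = 0
    for token in added:
        # Match each added token against one removed token at most.
        if token in remaining:
            remaining.remove(token)
            undone += 1
    return undone / len(added)


# `later` is not identical to any earlier revision, so a hash comparison
# would not flag it, yet it removes two of the three added tokens.
print(undone_share("the cat sat", "the cat sat on the mat", "the cat sat on a rug"))
```

Each comparison here requires a full DIFF over the token sequences, whereas the hash method needs only one digest per revision, which is where the runtime difference mentioned above comes from.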
The scripts and results listed in the paper can be found at http://people.aifb.kit.edu/ffl/reverts/
Karlsruhe Institute of Technology (KIT)
Institute of Applied Informatics and Formal Description Methods
Dipl.-Medwiss. Fabian Flöck
Building 11.40, Room 222
Phone: +49 721 608 4 6584
KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association
Guillermo Garrido (NLP Group, UNED, Spain) and Enrique Alfonseca (Google
Research Zurich, one of our partners in the RENDER project) have extracted
a data set that contains all attribute-value pairs of infoboxes from
English Wikipedia articles since 2003.
This 5.5 GB data set, called Wikipedia Historical Attributes Data (WHAD),
is freely available on the download page of the RENDER toolkit.
More detailed information about the data set can be found on Enrique
Alfonseca's website.
Enrique will attend the Wikipedia Academy 2012 and will present
his work during Paper Session III: Analyzing Wikipedia Article Data.
A short preview of this paper was published in the current
Best regards from Berlin,
Wikimedia Deutschland e.V. | Obentrautstraße 72 | 10963 Berlin
Tel. (030) 219 158 260
Imagine a world in which every person has free access to the sum of
all human knowledge. Help us make it happen!
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 B. Recognized as a non-profit
organization by the Berlin Tax Office for Corporations I, tax number
27/681/51985.
Dear Wikipedia researchers!
Our manuscript on is now released by PLoS ONE and available at:
I would be delighted to receive your comments and remarks.
Dr. Taha Yasseri.
Department of Theoretical Physics
Institute of Physics
Budapest University of Technology and Economics
Budafoki út 8.
H-1111 Budapest, Hungary
tel: +36 1 463 4110
fax: +36 1 463 3567
On 06/19/2012 03:41 AM, Sumana Harihareswara wrote:
> This is a reminder that you're invited to the pre-Wikimania hackathon,
> 10-11 July in Washington, DC, USA:
> In order to come, you have to register for the Wikimania conference:
> (Unfortunately, the period for requesting scholarships is now over.)
> At the hackathon, we'll have trainings and projects for novices, and we
> welcome creators of all Wikimedia technologies -- MediaWiki, gadgets,
> bots, mobile apps, you name it -- to hack on stuff together and teach
> each other.
> Hope to see you!
Actually, you don't have to register for Wikimania to come to the
hackathon. The registration fee is only required for the main
conference days; everyone is welcome to come to the hackathon days and
unconference for free. So tell your DC friends to sign up at
https://wikimania2012.wikimedia.org/wiki/Hackathon and come!
Engineering Community Manager
This weekend, TechWeek Chicago starts: http://techweek.com/
The Foundation's Peter Gehres is copresenting the analytics presentation
"How Wikipedia Doubled its Online Fundraising" this Saturday. If you're
at TechWeek, he and other Wikimedians want to meet with you and talk shop!
Saturday June 23, 2012 4:00pm - 4:45pm @ 1 - Main Stage (222 Merchandise
Mart Plaza, Chicago, IL)
"In 2010, online donations to Wikipedia more than doubled, from $7.5
million to $16 million and, in 2011, increased another 33%. Much of this
increase was driven by user research conducted in Chicago. Design
researcher Billy Belchev from Webitects will get into the nitty-gritty
of form design and testing, user interviews. Do one-step forms work
better than multi-step? Does PayPal help or hurt your numbers? What are
the effect of “Jimmy” banners? The answers are based on data from the
fifth most trafficked website in the world."
Engineering Community Manager