Thanks Tilman.
Especially for your effort to resolve the misunderstandings, most of which, I suppose, are due to a shallow reading: "I had a bit of free time last night waiting for trains and I skimmed through the study and its findings."
We had two strategies to filter out vandalism, as you mentioned: considering only mutual reverts, and weighting editors by their maturity. I suppose a vandal could not have a large maturity score by definition.
As for the data, this study was carried out in 2011, and we worked on the latest dump available at the time. Anyone experienced in academic research, especially at this scale, knows well that it really takes time to get the analysis done, write the reports, get them reviewed, etc., especially as we published 7-8 other papers during the same period. I see no problem in this as long as the metadata and the information about the methods and the data under study are mentioned in the manuscript, which is clearly the case here. I have seen many Wikipedia studies without any mention of the dump they used!
Back to your concern about the general impression the news media give of Wikipedia being a battlefield: I'd like to mention that in every single media response I gave, I emphasised the small number of controversial articles compared to the total number of articles. Again, as you mentioned, we gave the percentages explicitly in our previous work. But of course, for obvious reasons, journalists are not keen to highlight this. They like to report on controversies and wars! It is not our fault if what they report is misleading, as long as we have tried our best to avoid it. An interview of mine with BBC Radio Scotland, in which I clearly say at 04:00 that there are millions and thousands of articles in Wikipedia which are not controversial, is available here: https://www.dropbox.com/s/8whovkmipbqdzlv/bbc_radio_Scotland.mp3 . I have done the same in all the others.
Finally, I hope that the media coverage of our research, which is clearly far from perfect, can also give members of the public a better understanding of how Wikipedia works and how fascinating it is!
Thanks again,
Taha
On 22 Jul 2013 05:58, "Tilman Bayer" tbayer@wikimedia.org wrote:
On Sun, Jul 21, 2013 at 2:32 PM, MZMcBride z@mzmcbride.com wrote:
Anders Wennersten wrote:
A most interesting study looking at findings from 10 different language versions.
Jesus and Middle East are the most controversial articles seen across the world, but on en:wp it is George Bush and on es:wp Chile.
FWIW, here is the review by Giovanni Luca Ciampaglia in last month's Wikimedia Research Newsletter:
https://blog.wikimedia.org/2013/06/28/wikimedia-research-newsletter-june-201... (also published in the Signpost, the weekly newsletter on the English Wikipedia)
Thanks for sharing this.
I had a bit of free time last night waiting for trains and I skimmed through the study and its findings. Two points stuck out at me: a seemingly fatally flawed methodology and the age of data used.
The methodology used in this study seems to be pretty inherently flawed. According to the paper, controversiality was measured by full page reverts, which are fairly trivial to identify and study in a database dump (using cryptographic hashes, as the study did), but I don't think full reverts give an accurate impression _at all_ of which articles are the most controversial.
Pages with many full reverts are indicative of pages that are heavily vandalized. For example, the "George W. Bush" article is/was heavily vandalized for years on the English Wikipedia. Does blanking the article or replacing its contents with the word "penis" mean that it's a very controversial article? Of course not. Measuring only full reverts (as the study seems to have done, though it's certainly possible I've overlooked something) seems to be really misleading and inaccurate.
They didn't. You may have overlooked the description of the methodology on p.5: It's based on "mutual reverts" where user A has reverted user B and user B has reverted user A, and gives higher weight to disputes between more experienced editors. This should exclude most vandalism reverts of the sort you describe. As noted in Giovanni's review, this method was proposed in an earlier paper, Sumi et al. ( https://meta.wikimedia.org/wiki/Research:Newsletter/2011/July#Edit_wars_and_... ). That paper explains at length how this metric serves to distinguish vandalism reverts from edit wars. Of course there are ample possibilities to refine it, e.g. taking into account page protection logs.
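To make the "mutual reverts" idea concrete, here is a minimal Python sketch. It is an illustration only, not the paper's actual code: the function name, the input format, and the choice of weighting each dispute by the smaller of the two editors' edit counts are all assumptions made for this example.

```python
def mutual_revert_score(reverts, edit_counts):
    """Score an article from its mutual reverts.

    reverts: iterable of (reverter, reverted) editor pairs for one article.
    edit_counts: dict mapping editor -> number of edits, used here as a
        hypothetical proxy for editor experience.
    """
    pairs = set(reverts)
    counted = set()
    score = 0
    for a, b in pairs:
        if a == b:
            continue  # ignore self-reverts
        dispute = frozenset((a, b))
        # A dispute only counts if each editor has reverted the other.
        if (b, a) in pairs and dispute not in counted:
            counted.add(dispute)
            # Assumed weighting: the less experienced editor sets the weight,
            # so a dispute involving a newcomer contributes very little.
            score += min(edit_counts.get(a, 0), edit_counts.get(b, 0))
    return score


reverts = [
    ("alice", "bob"), ("bob", "alice"),        # mutual: counts
    ("carol", "vandal"), ("alice", "vandal"),  # one-way cleanup: ignored
]
edit_counts = {"alice": 500, "bob": 300, "carol": 1200, "vandal": 1}
score = mutual_revert_score(reverts, edit_counts)
```

In this toy input, the one-way reverts of the vandal contribute nothing, and even if the vandal reverted back, that dispute would add only the vandal's own small edit count.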
Personally, I'm more concerned that the new paper totally fails to put its subject into perspective by stating how frequent such controversial articles are overall on Wikipedia. Thus it's no wonder that the ample international media coverage that it generated mostly transports the notion (or reinforces the preconception) of Wikipedia as a huge battleground.
The 2011 Sumi et al. paper did a better job in that respect: "less than 25k articles, i.e. less than 1% of the 3m articles available in the November 2009 English WP dump, can be called controversial, and of these, less than half are truly edit wars."
In order to measure how controversial an article is, there are a number of metrics that could be used, though of course no metric is perfect and many metrics can be very difficult to accurately and rigorously measure:
- amount of talk page discussion generated for each article;
- number of page watchers;
- number of page views (possibly);
- number of arbitration cases or other dispute resolution procedures related to the article (perhaps a key metric in determining which articles are truly most controversial); and
- edit frequency and time between certain edits and partial or full reverts of those edits.
There are likely a number of other metrics that could be used as well to measure controversiality; these were simply off the top of my head.
Perhaps you are interested in this 2012 paper comparing such metrics, which the authors of the present paper cite to justify their choice of metric: Sepehri Rad, H., Barbosa, D.: Identifying controversial articles in Wikipedia: A comparative study. http://www.wikisym.org/ws2012/p18wikisym2012.pdf
Regarding detection of (partial or full) reverts, see also https://meta.wikimedia.org/wiki/Research:Revert_detection
The second point that stuck out at me was that the study relied on a database dump from March 2010. While this may be unavoidable, being over three years later, this introduces obvious bias into the data and its findings. Put another way, for the English Wikipedia started in 2001, this omits a quarter of the project's history(!). Again, given the length of time needed to draft and prepare a study, this gap may very well be unavoidable, but it certainly made me raise an eyebrow.
One final comment I had from briefly reading the study was that in the past few years we've made good strides in making research like this easier. Not that computing cryptographic hashes is particularly intensive, but these days we now store such hashes directly in the database (though we store SHA-1 hashes, not MD5 hashes as the study used). Storing these hashes in the database saves researchers the need to compute the hashes themselves and allows MediaWiki and other software the ability to easily and quickly detect full reverts.
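As a rough illustration of hash-based full-revert detection, here is a minimal Python sketch. It hashes revision text locally with SHA-1; the function names and the (editor, text) input format are assumptions for this example, and a researcher working from stored hashes could skip the hashing step entirely.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 digest of a revision's text as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_full_reverts(revisions):
    """Find revisions that restore an earlier, identical version of a page.

    revisions: list of (editor, text) tuples, oldest first.
    Returns a list of (reverting_index, restored_index) pairs.
    """
    last_seen = {}  # digest -> index of the most recent revision with it
    reverts = []
    for i, (_editor, text) in enumerate(revisions):
        digest = sha1_hex(text)
        j = last_seen.get(digest)
        # Identical content with at least one different revision in between
        # is treated as a full revert; back-to-back duplicates are not.
        if j is not None and i - j > 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


history = [
    ("alice", "Article text, version 1"),
    ("vandal", "penis"),
    ("bob", "Article text, version 1"),  # restores revision 0
]
full_reverts = find_full_reverts(history)
```

Note that this catches only exact restorations; partial reverts need a different approach.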
MZMcBride
P.S. Noting that this study is still a draft, I happened to notice a small typo on page nine: "We tried to a as diverse as possible sample including West European [...]". Hopefully this can be corrected before formal publication.
--
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB