Re: [Wikimedia-l] Fwd: The most controversial topics in Wikipedia: A multilingual and geographical analysis

22 Jul 2013

I find the differences between the language versions the most interesting,
and I appreciate getting some insight into the Arabic version, which I have
not had before.

On a "small version" like sv:wp we are very used to "stealing with pride"
content from other versions, primarily en:wp but also de:wp and others, and
we do this especially for controversial subjects that are not specific
to a country/culture. But are en:wp and other big versions doing the
same? It is very refreshing for a deadlocked discussion to start with an
almost entirely new text version.

Also I wonder about articles like Homeopathy
http://en.wikipedia.org/wiki/Homeopathy which seems to be at the top of
the controversies. Would it be an idea to compile a universal article with
help from the different versions, i.e. do we really utilize the power of
having many versions and many experts?

Anders

Osmar Valdebenito wrote on 2013-07-22 16:13:
...
 I was interviewed a few days ago by a Chilean newspaper because of this
 paper. For those interested who can read Spanish, here is the full article:

http://www.latercera.com/noticia/tendencias/2013/07/659-533645-9-estudio-di…

 I read the paper in full and I have to admit it has very interesting
 approaches to remove the "vandalism" effect. Probably it won't be perfect,
 especially for a platform where it is impossible to have an exact,
 quantitative measure of quality or neutrality. Is there a measure of
 controversiality? I will consider controversial those articles where I
 usually edit and probably I will ignore several others that are more
 controversial and so on...

 But besides the particular issue of which is the most controversial
 article, I'm more interested in the trends that each Wikipedia has. They
 seem consistent and I think there are a lot of things that we can learn from
 them.

 *Osmar Valdebenito G.*
 Director Ejecutivo
 A. C. Wikimedia Argentina

 2013/7/22 Taha Yasseri <taha.yasseri(a)oii.ox.ac.uk>

  Thanks Tilman.

 Especially for your effort to resolve the misunderstandings, most of which
 I suppose are due to a shallow reading: "I had a bit of free time last
 night waiting for trains and I skimmed through the study and its
 findings."

 We had two strategies to get rid of vandalism, as you mentioned:
 considering only mutual reverts and weighting editors by their maturity. I
 suppose a vandal could not have a large maturity score by definition.

 As for the data, this study was carried out in 2011, and we worked on
 the latest available dump at the time. Anyone experienced in academic
 research, especially at this scale, knows well that it really takes time to
 get the analysis done, write the reports, get them reviewed, etc.,
 especially since we published 7-8 other papers during the same period.
 I see no problem in this as long as the metadata and such information about
 the methods and the data under study are mentioned in the manuscript, which
 is clearly the case here. I have seen many Wikipedia studies without any
 mention of the dump they have used!

 Back to your concern about the general impression that the news media give
 of Wikipedia being a battlefield, I'd like to mention that I have
 emphasised the small number of controversial articles compared to the total
 number of articles in every single media response I gave. Again, as you
 mentioned, we gave the percentages explicitly in our previous work.
 But of course, for obvious reasons, journalists are not happy to highlight
 this. They like to report on controversies and wars! It is not our fault
 if what they report is misleading, as long as we have tried our best
 to avoid it. An interview of mine with BBC Radio Scotland, in which I
 clearly say at 04:00 that there are millions and thousands of articles in
 Wikipedia which are not controversial, is available here:
 https://www.dropbox.com/s/8whovkmipbqdzlv/bbc_radio_Scotland.mp3 . I have
 done the same in all the others.

 Finally, I wish that the public media coverage of our research, which is
 clearly far from perfect, could also provide the members of the public a
 better understanding of how Wikipedia works and how fascinating it is!

 Thanks again,

 Taha

 On 22 Jul 2013 05:58, "Tilman Bayer" <tbayer(a)wikimedia.org> wrote:

 On Sun, Jul 21, 2013 at 2:32 PM, MZMcBride <z(a)mzmcbride.com> wrote:
> Anders Wennersten wrote:
> > A most interesting study looking at findings from 10 different language
> > versions.
> >
> > Jesus and Middle east are the most controversial articles seen over the
> > world, but George Bush on en:wp and Chile on es:wp
> >
> > http://arxiv.org/ftp/arxiv/papers/1305/1305.5566.pdf

 FWIW, here is the review by Giovanni Luca Ciampaglia in last month's
 Wikimedia Research Newsletter:

 https://blog.wikimedia.org/2013/06/28/wikimedia-research-newsletter-june-20…
 (also published in the Signpost, the weekly newsletter on the English
 Wikipedia)

> Thanks for sharing this.
>
> I had a bit of free time last night waiting for trains and I skimmed
> through the study and its findings. Two points stuck out at me: a
> seemingly fatally flawed methodology and the age of data used.
>
> The methodology used in this study seems to be pretty inherently flawed.
> According to the paper, controversiality was measured by full page
> reverts, which are fairly trivial to identify and study in a database dump
> (using cryptographic hashes, as the study did), but I don't think full
> reverts give an accurate impression _at all_ of which articles are the
> most controversial.
>
> Pages with many full reverts are indicative of pages that are heavily
> vandalized. For example, the "George W. Bush" article is/was heavily
> vandalized for years on the English Wikipedia. Does blanking the article
> or replacing its contents with the word "penis" mean that it's a very
> controversial article? Of course not. Measuring only full reverts (as the
> study seems to have done, though it's certainly possible I've overlooked
> something) seems to be really misleading and inaccurate.

 They didn't. You may have overlooked the description of the
 methodology on p.5: It's based on "mutual reverts" where user A has
 reverted user B and user B has reverted user A, and gives higher
 weight to disputes between more experienced editors. This should
 exclude most vandalism reverts of the sort you describe. As noted in
 Giovanni's review, this method was proposed in an earlier paper, Sumi
 et al. (
 https://meta.wikimedia.org/wiki/Research:Newsletter/2011/July#Edit_wars_and…
 ). That paper explains at length how this metric serves to distinguish
 vandalism reverts from edit wars. Of course there are ample
 possibilities to refine it, e.g. taking into account page protection
 logs.
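
 As a rough illustration of the "mutual reverts" idea described above, here
 is a minimal Python sketch. It is not the paper's actual formula: the
 function name, the input format, and the use of each editor's edit count as
 a stand-in for "maturity" are all assumptions made for this example. It
 counts only pairs of editors who have each reverted the other, and weights
 each pair by the less experienced editor's edit count, so a one-way revert
 of a vandal contributes nothing.

```python
def mutual_revert_score(reverts, edit_counts):
    """Score one article from its revert log (illustrative only).

    reverts: iterable of (reverter, reverted) editor-name pairs.
    edit_counts: dict mapping editor name -> number of edits, used here
                 as a hypothetical proxy for editor "maturity".
    """
    directed = set(reverts)
    # Keep only unordered pairs where A reverted B and B reverted A.
    mutual = {
        frozenset((a, b))
        for (a, b) in directed
        if a != b and (b, a) in directed
    }
    score = 0
    for pair in mutual:
        a, b = tuple(pair)
        # Weight the dispute by the less experienced of the two editors.
        score += min(edit_counts.get(a, 0), edit_counts.get(b, 0))
    return score


reverts = [
    ("Alice", "Bob"), ("Bob", "Alice"),  # mutual: a possible edit war
    ("Alice", "Vandal1"),                # one-way: vandalism cleanup
    ("Carol", "Vandal2"),                # one-way: vandalism cleanup
]
edit_counts = {"Alice": 500, "Bob": 300, "Carol": 1000,
               "Vandal1": 1, "Vandal2": 2}

print(mutual_revert_score(reverts, edit_counts))  # 300
```

 Even if a vandal reverts back, the pair is weighted by the vandal's tiny
 edit count, so it adds very little to the score.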

 Personally, I'm more concerned that the new paper totally fails to put
 its subject into perspective by stating how frequent such
 controversial articles are overall on Wikipedia. Thus it's no wonder
 that the ample international media coverage that it generated mostly
 transports the notion (or reinforces the preconception) of Wikipedia
 as a huge battleground.

 The 2011 Sumi et al. paper did a better job in that respect: "less
 than 25k articles, i.e. less than 1% of the 3m articles available in
 the November 2009 English WP dump, can be called controversial, and of
 these, less than half are truly edit wars."

> In order to measure how controversial an article is, there are a number of
> metrics that could be used, though of course no metric is perfect and many
> metrics can be very difficult to accurately and rigorously measure:
>
> * amount of talk page discussion generated for each article;
> * number of page watchers;
> * number of page views (possibly);
> * number of arbitration cases or other dispute resolution procedures
> related to the article (perhaps a key metric in determining which articles
> are truly most controversial); and
> * edit frequency and time between certain edits and partial or full
> reverts of those edits.
>
> There are likely a number of other metrics that could be used as well to
> measure controversiality; these were simply off the top of my head.

 Perhaps you are interested in this 2012 paper comparing such metrics,
 which the authors of the present paper cite to justify their choice of
 metric:
 Sepehri Rad, H., Barbosa, D.: Identifying controversial articles in
 Wikipedia: A comparative study.
 http://www.wikisym.org/ws2012/p18wikisym2012.pdf

 Regarding detection of (partial or full) reverts, see also
 https://meta.wikimedia.org/wiki/Research:Revert_detection
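
 The hash-based detection of full reverts mentioned in this thread can be
 sketched roughly as follows. This is only an illustration under stated
 assumptions: the function name and the list-of-strings input are
 hypothetical, and SHA-1 is just one possible choice of hash. A revision
 counts as a full revert when its text hash matches an earlier revision's
 hash and differs from the immediately preceding one.

```python
import hashlib


def find_full_reverts(revision_texts):
    """Return (reverting_index, restored_index) pairs (illustrative only).

    revision_texts: list of revision texts in chronological order.
    """
    first_seen = {}      # hash digest -> index of first revision with it
    reverts = []
    prev_digest = None
    for i, text in enumerate(revision_texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        # Identical to an older revision, but not a no-op repeat of the
        # immediately preceding one: treat it as a full revert.
        if digest in first_seen and digest != prev_digest:
            reverts.append((i, first_seen[digest]))
        first_seen.setdefault(digest, i)
        prev_digest = digest
    return reverts


history = ["A", "A B", "", "A B", "A B C"]  # revision 2 blanks the page
print(find_full_reverts(history))  # [(3, 1)]
```

 Partial reverts are not caught by this approach, since any difference in
 the text changes the hash.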

> The second point that stuck out at me was that the study relied on a
> database dump from March 2010. While this may be unavoidable, being over
> three years later, this introduces obvious bias into the data and its
> findings. Put another way, for the English Wikipedia started in 2001, this
> omits a quarter of the project's history(!). Again, given the length of
> time needed to draft and prepare a study, this gap may very well be
> unavoidable, but it certainly made me raise an eyebrow.
>
> One final comment I had from briefly reading the study was that in the
> past few years we've made good strides in making research like this
> easier. Not that computing cryptographic hashes is particularly intensive,
> but these days we now store such hashes directly in the database (though
> we store SHA-1 hashes, not MD5 hashes as the study used). Storing these
> hashes in the database saves researchers the need to compute the hashes
> themselves and allows MediaWiki and other software the ability to easily
> and quickly detect full reverts.
>
> MZMcBride
>
> P.S. Noting that this study is still a draft, I happened to notice a small
> typo on page nine: "We tried to a as diverse as possible sample including
> West European [...]". Hopefully this can be corrected before formal
> publication.

 --
 Tilman Bayer
 Senior Operations Analyst (Movement Communications)
 Wikimedia Foundation
 IRC (Freenode): HaeB

 --
 Dr Taha Yasseri
 http://www.oii.ox.ac.uk/people/yasseri/
 Oxford Internet Institute
 University of Oxford
 1 St.Giles
 Oxford OX1 3JS
 Tel.01865-287229
 -------------------------------------------
 Latest Article: Phys. Rev. Lett. Opinions, Conflicts, and Consensus:
 Modeling Social Dynamics in a Collaborative
 Environment<http://prl.aps.org/abstract/PRL/v110/i8/e088701>

 Non-technical review: University of Oxford, Mathematical model 'describes'
 how online conflicts are
 resolved<http://www.ox.ac.uk/media/news_stories/2013/130220.html>
 _______________________________________________
 Wikimedia-l mailing list
 Wikimedia-l(a)lists.wikimedia.org
 Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>