[Wikimedia-l] Fwd: The most controversial topics in Wikipedia: A multilingual and geographical analysis

Balázs Viczián balazs.viczian at wikimedia.hu
Tue Jul 23 06:42:18 UTC 2013


When I started editing in 2006 it was already the norm; ever since, people
have been encouraging each other to post their questions about a given
article on the village pump or a project page rather than on the article's
talk page, reasoning that there is more traffic there... which generates
even more traffic on those pages, making article talk pages even more
sparse :)

I guess only socio-cultural research could answer the question of why it is
like that on huwiki. Maybe one day in the bright (and hopefully not so
distant) future Wikimedia Hungary will commission similar research, so you
can use it later in your own research ;)

Regards,
Balázs


2013/7/22 Taha Yasseri <taha.yasseri at oii.ox.ac.uk>

> That's very interesting to know. Thanks for telling me. We were quite
> surprised to see very sparse talk pages on the Hungarian Wikipedia.
> I'm sure you know better than I do that article talk pages serve different
> purposes than user talk pages and the village pump. Still, it's interesting
> that the Hungarian Wikipedia prefers to take discussions to places other
> than article talk pages.
>
> szervusz
> Taha.
>
> On Mon, Jul 22, 2013 at 9:32 PM, Balázs Viczián <balazs.viczian at wikimedia.hu> wrote:
>
> > As a Hungarian, it is really interesting to read something specific
> > about the Hungarian Wikipedia :)
> >
> > I read somewhere (correct me if I'm wrong) that you found little to no
> > discussion on article talk pages on the Hungarian Wikipedia,
> > indicating that users barely discuss the content (or anything at all
> > about the given article).
> >
> > Actually these discussions either move quickly to the village
> > pump after 1-2 comments or happen there entirely. Most commonly,
> > users discuss it on their user talk pages by directly
> > messaging each other about the changes they made or the content, creating
> > 2-4 parallel threads on each other's user talk pages. For this reason,
> > article talk pages are generally considered "deserted lands" on huwiki,
> > which almost nobody reads.
> >
> > Cheers,
> > Balázs
> >
> > 2013/7/22 Taha Yasseri <taha.yasseri at oii.ox.ac.uk>
> > >
> > > Anders,
> > > I really like your idea of "universal" articles, given that translation
> > > and communication across languages is not a very difficult task these
> > > days any more.
> > >
> > > By the way, in a blog post I have released some more data on languages
> > > such as Japanese, Chinese, and Portuguese, in case anyone's interested:
> > >
> > > http://tahayasseri.wordpress.com/2013/05/27/wikipedia-modern-platform-ancient-debates-on-land-and-gods/
> > >
> > > bests,
> > > Taha
> > >
> > >
> > > On Mon, Jul 22, 2013 at 4:17 PM, Anders Wennersten <mail at anderswennersten.se> wrote:
> > >
> > > > I find the differences between the language versions most interesting,
> > > > and it gave me some insight into the Arabic version that I have not had
> > > > before.
> > > >
> > > > On a "small version" like sv:wp we are very used to "stealing with pride"
> > > > content from other versions, primarily en:wp but also de:wp and others,
> > > > and we do this especially for controversial subjects that are not
> > > > specific to a country/culture. But are en:wp and other big versions doing
> > > > the same? It is very refreshing for a deadlocked discussion to start
> > > > with an almost entirely new version of the text.
> > > >
> > > > I also wonder about articles like Homeopathy
> > > > <http://en.wikipedia.org/wiki/Homeopathy>, which seems to be among the
> > > > top controversies. Would it be an idea to compile a universal
> > > > article with help from the different versions? That is, do we really
> > > > utilize the power of having many versions and many experts?
> > > >
> > > > Anders
> > > >
> > > >
> > > >
> > > > Osmar Valdebenito wrote 2013-07-22 16:13:
> > > >
> > > >> I was interviewed a few days ago by a Chilean newspaper because of this
> > > >> paper. For those interested who can read Spanish, here is the full
> > > >> article:
> > > >> http://www.latercera.com/noticia/tendencias/2013/07/659-533645-9-estudio-dice-que-chile-es-el-articulo-de-wikipedia-mas-editado-en-espanol.shtml
> > > >>
> > > >> I read the paper in full and I have to admit it takes very interesting
> > > >> approaches to removing the "vandalism" effect. It probably won't be
> > > >> perfect, especially for a platform where it is impossible to have an
> > > >> exact, quantitative measure of quality or neutrality. Is there a measure
> > > >> of controversiality? I would consider controversial those articles that I
> > > >> usually edit, and would probably ignore several others that are more
> > > >> controversial, and so on...
> > > >>
> > > >> But besides the particular issue of which is the most controversial
> > > >> article, I'm more interested in the trends that each Wikipedia shows.
> > > >> They seem consistent and I think there are a lot of things we can learn
> > > >> from them.
> > > >>
> > > >> *Osmar Valdebenito G.*
> > > >> Director Ejecutivo
> > > >> A. C. Wikimedia Argentina
> > > >>
> > > >>
> > > >> 2013/7/22 Taha Yasseri <taha.yasseri at oii.ox.ac.uk>
> > > >>
> > > >>> Thanks Tilman.
> > > >>>
> > > >>> Especially for your effort to resolve the misunderstandings, most of
> > > >>> which I suppose are due to a shallow reading: "I had a bit of free time
> > > >>> last night waiting for trains and I skimmed through the study and its
> > > >>> findings."
> > > >>>
> > > >>> As you mentioned, we had two strategies for getting rid of vandalism:
> > > >>> considering only mutual reverts, and weighting editors by their maturity.
> > > >>> I suppose a vandal could not, by definition, have a large maturity score.
> > > >>>
> > > >>> As for the data, this study was carried out in 2011, and we worked on
> > > >>> the latest dump available at the time. Anyone experienced in academic
> > > >>> research, especially at this scale, knows well that it really takes time
> > > >>> to get the analysis done, write the reports, get them reviewed, etc.,
> > > >>> especially given that we published 7-8 other papers during the same
> > > >>> period. I see no problem in this as long as the metadata and information
> > > >>> about the methods and the data under study are mentioned in the
> > > >>> manuscript, which is clearly the case here. I have seen many Wikipedia
> > > >>> studies without any mention of the dump they used!
> > > >>>
> > > >>> Back to your concern about the general impression the news media give
> > > >>> of Wikipedia being a battlefield: I'd like to mention that I have
> > > >>> emphasised the small number of controversial articles compared to the
> > > >>> total number of articles in every single media response I gave. Again, as
> > > >>> you mentioned, we gave the percentages explicitly in our previous work.
> > > >>> But of course, for obvious reasons, journalists are not happy to highlight
> > > >>> this. They like to report on controversies and wars! It is not our fault
> > > >>> that what they report could be misleading, as long as we tried our best
> > > >>> to avoid it. An interview of mine with BBC Radio Scotland, in which at
> > > >>> 04:00 I clearly say that there are millions and thousands of articles in
> > > >>> Wikipedia which are not controversial, is available here:
> > > >>> https://www.dropbox.com/s/8whovkmipbqdzlv/bbc_radio_Scotland.mp3. I have
> > > >>> done the same in all the others.
> > > >>>
> > > >>> Finally, I hope that the public media coverage of our research, which is
> > > >>> clearly far from perfect, can also give members of the public a
> > > >>> better understanding of how Wikipedia works and how fascinating it is!
> > > >>>
> > > >>> Thanks again,
> > > >>>
> > > >>> Taha
> > > >>>
> > > >>>
> > > >>> On 22 Jul 2013 05:58, "Tilman Bayer" <tbayer at wikimedia.org> wrote:
> > > >>>
> > > >>>> On Sun, Jul 21, 2013 at 2:32 PM, MZMcBride <z at mzmcbride.com> wrote:
> > > >>>>
> > > >>>>> Anders Wennersten wrote:
> > > >>>>>
> > > >>>>>> A most interesting study looking at findings from 10 different
> > > >>>>>> language versions.
> > > >>>>>>
> > > >>>>>> Jesus and Middle east are the most controversial articles seen over
> > > >>>>>> the world, but George Bush on en:wp and Chile on es:wp
> > > >>>>>>
> > > >>>>>> http://arxiv.org/ftp/arxiv/papers/1305/1305.5566.pdf
> > > >>>>>>
> > > >>>> FWIW, here is the review by Giovanni Luca Ciampaglia in last month's
> > > >>>> Wikimedia Research Newsletter:
> > > >>>>
> > > >>>> https://blog.wikimedia.org/2013/06/28/wikimedia-research-newsletter-june-2013/#.22The_most_controversial_topics_in_Wikipedia:_a_multilingual_and_geographical_analysis.22
> > > >>>>
> > > >>>> (also published in the Signpost, the weekly newsletter on the English
> > > >>>> Wikipedia)
> > > >>>>
> > > >>>>> Thanks for sharing this.
> > > >>>>>
> > > >>>>> I had a bit of free time last night waiting for trains and I skimmed
> > > >>>>> through the study and its findings. Two points stuck out at me: a
> > > >>>>> seemingly fatally flawed methodology and the age of data used.
> > > >>>>>
> > > >>>>> The methodology used in this study seems to be pretty inherently
> > > >>>>> flawed. According to the paper, controversiality was measured by full
> > > >>>>> page reverts, which are fairly trivial to identify and study in a
> > > >>>>> database dump (using cryptographic hashes, as the study did), but I
> > > >>>>> don't think full reverts give an accurate impression _at all_ of which
> > > >>>>> articles are the most controversial.
> > > >>>>>
> > > >>>>> Pages with many full reverts are indicative of pages that are heavily
> > > >>>>> vandalized. For example, the "George W. Bush" article is/was heavily
> > > >>>>> vandalized for years on the English Wikipedia. Does blanking the
> > > >>>>> article or replacing its contents with the word "penis" mean that it's
> > > >>>>> a very controversial article? Of course not. Measuring only full
> > > >>>>> reverts (as the study seems to have done, though it's certainly
> > > >>>>> possible I've overlooked something) seems to be really misleading and
> > > >>>>> inaccurate.
> > > >>>>>
> > > >>>> They didn't. You may have overlooked the description of the
> > > >>>> methodology on p.5: It's based on "mutual reverts" where user A has
> > > >>>> reverted user B and user B has reverted user A, and gives higher
> > > >>>> weight to disputes between more experienced editors. This should
> > > >>>> exclude most vandalism reverts of the sort you describe. As noted in
> > > >>>> Giovanni's review, this method was proposed in an earlier paper, Sumi
> > > >>>> et al. (
> > > >>>> https://meta.wikimedia.org/wiki/Research:Newsletter/2011/July#Edit_wars_and_conflict_metrics
> > > >>>> ). That paper explains at length how this metric serves to distinguish
> > > >>>> vandalism reverts from edit wars. Of course there are ample
> > > >>>> possibilities to refine it, e.g. taking into account page protection
> > > >>>> logs.
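
The mutual-revert idea described above can be sketched in a few lines of
Python. This is only an illustrative sketch, not the paper's actual metric:
the function name `mutual_revert_score`, the input shapes, and the choice to
weight each mutually reverting pair by the smaller of the two editors' edit
counts are all assumptions made here for illustration.

```python
def mutual_revert_score(revert_pairs, edit_counts):
    """Score an article from its reverts, counting only mutual reverts.

    revert_pairs: iterable of (reverter, reverted) user-name tuples.
    edit_counts:  dict mapping user name -> number of edits (a stand-in
                  for editor "maturity"; an assumption of this sketch).
    """
    directed = set(revert_pairs)
    # Keep a pair only if each user has reverted the other at least once.
    mutual = {
        frozenset((a, b))
        for (a, b) in directed
        if a != b and (b, a) in directed
    }
    # Weight each dispute by the less experienced of the two editors, so a
    # pair involving a throwaway account adds almost nothing.
    return sum(min(edit_counts[user] for user in pair) for pair in mutual)


reverts = [
    ("alice", "bob"), ("bob", "alice"),        # an edit war
    ("carol", "vandal"), ("dave", "vandal"),   # one-way vandalism cleanup
]
counts = {"alice": 120, "bob": 45, "carol": 300, "dave": 80, "vandal": 1}
print(mutual_revert_score(reverts, counts))  # 45
```

With this toy input the two one-way reverts of the vandal contribute
nothing; only the alice/bob dispute is counted.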
> > > >>>>
> > > >>>> Personally, I'm more concerned that the new paper totally fails to put
> > > >>>> its subject into perspective by stating how frequent such
> > > >>>> controversial articles are overall on Wikipedia. Thus it's no wonder
> > > >>>> that the ample international media coverage that it generated mostly
> > > >>>> conveys the notion (or reinforces the preconception) of Wikipedia
> > > >>>> as a huge battleground.
> > > >>>>
> > > >>>> The 2011 Sumi et al. paper did a better job in that respect: "less
> > > >>>> than 25k articles, i.e. less than 1% of the 3m articles available in
> > > >>>> the November 2009 English WP dump, can be called controversial, and of
> > > >>>> these, less than half are truly edit wars."
> > > >>>>
> > > >>>>
> > > >>>>> In order to measure how controversial an article is, there are a number
> > > >>>>> of metrics that could be used, though of course no metric is perfect
> > > >>>>> and many metrics can be very difficult to accurately and rigorously
> > > >>>>> measure:
> > > >>>>>
> > > >>>>> * amount of talk page discussion generated for each article;
> > > >>>>> * number of page watchers;
> > > >>>>> * number of page views (possibly);
> > > >>>>> * number of arbitration cases or other dispute resolution procedures
> > > >>>>>   related to the article (perhaps a key metric in determining which
> > > >>>>>   articles are truly most controversial); and
> > > >>>>> * edit frequency and time between certain edits and partial or full
> > > >>>>>   reverts of those edits.
> > > >>>>>
> > > >>>>> There are likely a number of other metrics that could be used as well
> > > >>>>> to measure controversiality; these were simply off the top of my head.
> > > >>>>>
> > > >>>> Perhaps you are interested in this 2012 paper comparing such metrics,
> > > >>>> which the authors of the present paper cite to justify their choice of
> > > >>>> metric:
> > > >>>> Sepehri Rad, H., Barbosa, D.: Identifying controversial articles in
> > > >>>> Wikipedia: A comparative study.
> > > >>>> http://www.wikisym.org/ws2012/p18wikisym2012.pdf
> > > >>>>
> > > >>>> Regarding detection of (partial or full) reverts, see also
> > > >>>> https://meta.wikimedia.org/wiki/Research:Revert_detection
> > > >>>>
> > > >>>>> The second point that stuck out at me was that the study relied on a
> > > >>>>> database dump from March 2010. While this may be unavoidable, being
> > > >>>>> over three years later, this introduces obvious bias into the data and
> > > >>>>> its findings. Put another way, for the English Wikipedia started in
> > > >>>>> 2001, this omits a quarter of the project's history(!). Again, given
> > > >>>>> the length of time needed to draft and prepare a study, this gap may
> > > >>>>> very well be unavoidable, but it certainly made me raise an eyebrow.
> > > >>>>>
> > > >>>>> One final comment I had from briefly reading the study was that in the
> > > >>>>> past few years we've made good strides in making research like this
> > > >>>>> easier. Not that computing cryptographic hashes is particularly
> > > >>>>> intensive, but these days we now store such hashes directly in the
> > > >>>>> database (though we store SHA-1 hashes, not MD5 hashes as the study
> > > >>>>> used). Storing these hashes in the database saves researchers the need
> > > >>>>> to compute the hashes themselves and allows MediaWiki and other
> > > >>>>> software the ability to easily and quickly detect full reverts.
> > > >>>>>
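
Detecting full reverts from content hashes, as described above, might look
roughly like this. It is a minimal sketch, assuming a chronological list of
(user, text) revisions held in memory; the function name `find_full_reverts`
and the input shape are hypothetical, and a real analysis would read
precomputed hashes from the dump or database rather than hashing the text
itself.

```python
import hashlib


def find_full_reverts(revisions):
    """Return (reverting_index, restored_index) pairs.

    revisions: list of (user, text) tuples in chronological order.
    A revision counts as a full revert when its content hash equals that
    of an earlier revision with at least one other revision in between.
    """
    last_seen = {}  # content hash -> index of latest revision with it
    reverts = []
    for i, (_user, text) in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # i - j == 1 would be a null edit (identical to its predecessor).
        if j is not None and i - j > 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


history = [
    ("alice", "A"), ("bob", "B"), ("alice", "A"),
    ("bob", "B"), ("carol", "C"),
]
print(find_full_reverts(history))  # [(2, 0), (3, 1)]
```

Here revision 2 restores the content of revision 0, and revision 3
restores revision 1; comparing hashes avoids comparing full page texts.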
> > > >>>>> MZMcBride
> > > >>>>>
> > > >>>>> P.S. Noting that this study is still a draft, I happened to notice a
> > > >>>>> small typo on page nine: "We tried to a as diverse as possible sample
> > > >>>>> including West European [...]". Hopefully this can be corrected before
> > > >>>>> formal publication.
> > > >>>>>
> > > >>>>
> > > >>>> --
> > > >>>> Tilman Bayer
> > > >>>> Senior Operations Analyst (Movement Communications)
> > > >>>> Wikimedia Foundation
> > > >>>> IRC (Freenode): HaeB
> > > >>>>
> > > >>>>
> > > >>>
> > > >>> --
> > > >>> Dr Taha Yasseri
> > > >>> http://www.oii.ox.ac.uk/people/yasseri/
> > > >>> Oxford Internet Institute
> > > >>> University of Oxford
> > > >>> 1 St.Giles
> > > >>> Oxford OX1 3JS
> > > >>> Tel.01865-287229
> > > >>> -------------------------------------------
> > > >>> Latest Article: Phys. Rev. Lett. Opinions, Conflicts, and Consensus:
> > > >>> Modeling Social Dynamics in a Collaborative
> > > >>> Environment<http://prl.aps.org/abstract/PRL/v110/i8/e088701>
> > > >>>
> > > >>> Non-technical review: University of Oxford, Mathematical model
> > > >>> 'describes' how online conflicts are
> > > >>> resolved<http://www.ox.ac.uk/media/news_stories/2013/130220.html>
> > > >>> _______________________________________________
> > > >>> Wikimedia-l mailing list
> > > >>> Wikimedia-l at lists.wikimedia.org
> > > >>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > > >>> <mailto:wikimedia-l-request at lists.wikimedia.org?subject=unsubscribe>
> > > >>>

