[Wikimedia-l] The most controversial topics in Wikipedia: A multilingual and geographical analysis
Balázs Viczián
balazs.viczian at wikimedia.hu
Mon Jul 22 06:26:31 UTC 2013
You may contact them directly with your concerns, which I guess many did
after they published their study.
Here is their homepage: http://wwm.phy.bme.hu/
Cheers,
Balázs
2013/7/21 MZMcBride <z at mzmcbride.com>
> Anders Wennersten wrote:
> >A most interesting study looking at findings from 10 different language
> >versions.
> >
> >Jesus and the Middle East are the most controversial articles across the
> >world, but George Bush tops en:wp and Chile tops es:wp.
> >
> >http://arxiv.org/ftp/arxiv/papers/1305/1305.5566.pdf
>
> Thanks for sharing this.
>
> I had a bit of free time last night waiting for trains, and I skimmed
> through the study and its findings. Two points stood out to me: a
> seemingly fatally flawed methodology and the age of the data used.
>
> The methodology used in this study seems inherently flawed.
> According to the paper, controversiality was measured by full page
> reverts, which are fairly trivial to identify and study in a database dump
> (using cryptographic hashes, as the study did), but I don't think full
> reverts give an accurate impression _at all_ of which articles are the
> most controversial.
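For readers unfamiliar with the technique: a full (identity) revert is a revision whose text is identical to an earlier revision of the same page, so hashing each revision's text and looking for repeated hashes finds them. The sketch below illustrates that idea under stated assumptions (one page's revisions given oldest-first as plain strings; `find_identity_reverts` is a hypothetical helper name). It is not the study authors' code, and it uses MD5 only because that is the hash the study reportedly used.

```python
import hashlib
from typing import Dict, List, Tuple


def find_identity_reverts(texts: List[str]) -> List[Tuple[int, int]]:
    """Return (reverting_index, restored_index) pairs for one page's history.

    `texts` holds the full text of each revision, oldest first (assumed
    input format). Revision i is treated as a full (identity) revert when its
    hash matches that of an earlier revision j with j < i - 1, i.e. it
    restores an older version and undoes every revision in between.
    j == i - 1 would be a null edit, not a revert.
    """
    last_seen: Dict[str, int] = {}  # hash -> latest revision index with it
    reverts: List[Tuple[int, int]] = []
    for i, text in enumerate(texts):
        # MD5 mirrors the study's reported choice; SHA-1 works the same way.
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and j < i - 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


if __name__ == "__main__":
    history = ["A", "A vandal", "A", "B", "A"]
    print(find_identity_reverts(history))  # [(2, 0), (4, 2)]
```

Here revision 2 restores revision 0 and revision 4 restores revision 2. Note that nothing in the hash comparison distinguishes vandalism cleanup from a genuine edit war, which is the limitation discussed next.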
>
> Pages with many full reverts are indicative of pages that are heavily
> vandalized. For example, the "George W. Bush" article is/was heavily
> vandalized for years on the English Wikipedia. Does blanking the article
> or replacing its contents with the word "penis" mean that it's a very
> controversial article? Of course not. Measuring only full reverts (as the
> study seems to have done, though it's certainly possible I've overlooked
> something) seems to be really misleading and inaccurate.
>
> In order to measure how controversial an article is, there are a number of
> metrics that could be used, though of course no metric is perfect and many
> metrics can be very difficult to accurately and rigorously measure:
>
> * amount of talk page discussion generated for each article;
> * number of page watchers;
> * number of page views (possibly);
> * number of arbitration cases or other dispute resolution procedures
> related to the article (perhaps a key metric in determining which articles
> are truly most controversial); and
> * edit frequency and time between certain edits and partial or full
> reverts of those edits.
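The last metric in the list could be approximated from a dump roughly as follows. This is a hypothetical sketch, assuming a single page's history as (timestamp, text) pairs ordered oldest-first and considering only full reverts (partial reverts would need a diff-based approach); `revert_delays` is an illustrative name, not an existing tool.

```python
import hashlib
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Set, Tuple

# Hypothetical input format: (timestamp, full text) per revision, oldest first.
Revision = Tuple[datetime, str]


def revert_delays(revisions: List[Revision]) -> List[timedelta]:
    """Time between each edit and the full revert that undid it.

    Only identity reverts (exact text-hash matches) are detected; an edit
    undone by several nested reverts is counted once, at its first revert.
    """
    last_seen: Dict[str, int] = {}  # text hash -> latest revision index
    already_reverted: Set[int] = set()
    delays: List[timedelta] = []
    for i, (ts, text) in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and j < i - 1:
            # Revisions j+1 .. i-1 were all undone by revision i.
            for k in range(j + 1, i):
                if k not in already_reverted:
                    delays.append(ts - revisions[k][0])
                    already_reverted.add(k)
        last_seen[digest] = i
    return delays


if __name__ == "__main__":
    t0 = datetime(2010, 3, 1, 12, 0)
    page = [
        (t0, "A"),
        (t0 + timedelta(minutes=1), "A vandal"),
        (t0 + timedelta(minutes=3), "A"),
    ]
    delays = revert_delays(page)
    print(median(d.total_seconds() for d in delays))  # 120.0
```

One might then compare the distribution of delays across articles. A plausible but unverified expectation is that obvious vandalism is reverted within minutes while disputed content lingers longer; that would need checking against real data.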
>
> There are likely a number of other metrics that could be used as well to
> measure controversiality; these were simply off the top of my head.
>
> The second point that stood out to me was that the study relied on a
> database dump from March 2010. While this may be unavoidable, a dump more
> than three years old introduces obvious bias into the data and its
> findings. Put another way, for the English Wikipedia, which started in
> 2001, this omits roughly a quarter of the project's history(!). Again,
> given the length of time needed to draft and prepare a study, this gap may
> very well be unavoidable, but it certainly made me raise an eyebrow.
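The "quarter of the project's history" figure is easy to check. A quick calculation, assuming an English Wikipedia launch date of 15 January 2001, the date of the quoted message (21 July 2013) as "now", and 1 March 2010 as a placeholder, since only the month of the dump is given:

```python
from datetime import date


def omitted_fraction(launch: date, dump: date, now: date) -> float:
    """Share of a project's lifetime (launch..now) that postdates the dump."""
    return (now - dump).days / (now - launch).days


# Assumed dates: en.wp launch, a placeholder day in March 2010 for the dump,
# and the date of the quoted message.
share = omitted_fraction(date(2001, 1, 15), date(2010, 3, 1), date(2013, 7, 21))
print(round(share, 3))  # 0.271
```

That is 1238 of 4570 days, or about 27% of the project's lifetime at the time of writing, consistent with "roughly a quarter".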
>
> One final comment I had from briefly reading the study was that in the
> past few years we've made good strides in making research like this
> easier. Not that computing cryptographic hashes is particularly intensive,
> but these days we store such hashes directly in the database (though
> we store SHA-1 hashes, not the MD5 hashes the study used). Storing these
> hashes in the database saves researchers from computing them themselves
> and gives MediaWiki and other software an easy, fast way to detect full
> reverts.
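As far as we understand, MediaWiki's `rev_sha1` column holds the SHA-1 of the revision text written in base 36 and zero-padded to 31 characters; treat that encoding detail as an assumption and check it against the revision table documentation. The sketch below reproduces that encoding and runs a simplified toy revision table (only `rev_id`, `rev_page`, `rev_sha1`) through SQLite to find revisions whose hash repeats an earlier one on the same page. `base36_sha1` and `repeated_revisions` are hypothetical helper names.

```python
import hashlib
import sqlite3
from typing import List, Tuple

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(text: str) -> str:
    """SHA-1 of the UTF-8 text in base 36, zero-padded to 31 characters.

    Assumed to match MediaWiki's rev_sha1 convention; verify before use.
    """
    n = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out)).rjust(31, "0")


def repeated_revisions(rows: List[Tuple[int, int, str]]) -> List[int]:
    """rev_ids whose hash repeats an earlier revision on the same page.

    Uses a toy table with only rev_id, rev_page and rev_sha1; the real
    MediaWiki revision table has many more columns. Null edits (identical
    consecutive revisions) are also returned and would need filtering.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision (rev_id INTEGER, rev_page INTEGER, rev_sha1 TEXT)"
    )
    conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", rows)
    query = """
        SELECT r2.rev_id FROM revision AS r2
        WHERE EXISTS (
            SELECT 1 FROM revision AS r1
            WHERE r1.rev_page = r2.rev_page
              AND r1.rev_sha1 = r2.rev_sha1
              AND r1.rev_id < r2.rev_id)
        ORDER BY r2.rev_id
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    rows = [
        (1, 10, base36_sha1("A")),
        (2, 10, base36_sha1("A vandal")),
        (3, 10, base36_sha1("A")),  # restores revision 1
        (4, 11, base36_sha1("A")),  # same text, different page
    ]
    print(repeated_revisions(rows))  # [3]
```

Because the hash comparison happens inside a single query, no revision text needs to be read at all, which is the practical advantage of storing the hashes.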
>
> MZMcBride
>
> P.S. Noting that this study is still a draft, I happened to notice a small
> typo on page nine: "We tried to a as diverse as possible sample including
> West European [...]". Hopefully this can be corrected before formal
> publication.
>
>
>
> _______________________________________________
> Wikimedia-l mailing list
> Wikimedia-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-request at lists.wikimedia.org?subject=unsubscribe>
>