Re: [Wiki-research-l] More accurate revert detection in Wikipedia, alternative to MD5 identical revision method

27 Jun 2012

See also the review in last month's Wikimedia Research Newsletter:
https://meta.wikimedia.org/wiki/Research:Newsletter/2012-05-28#New_algorith…

On Wed, Jun 27, 2012 at 10:05 AM, Floeck, Fabian (AIFB)
<fabian.floeck(a)kit.edu> wrote:
...
 For those of you who are interested in reverts:
 I just presented our paper on accurate revert detection at the ACM Hypertext
 and Social Media conference 2012. It shows a significant gain in accuracy
 (and coverage) over the widely used method of detecting reverts by finding
 identical revisions (via MD5 hash values): our method detects edit pairs
 that are significantly more likely to be actual reverts according to
 editors' perception of a revert and the Wikipedia definition.
 35% of the reverts found by the MD5 method in our sample are not assessed to
 be reverts by more than 80% of our survey participants (accuracy 0%). The
 new method finds different reverts for these 35%, plus 12% more, which
 show a 70% accuracy.

 Find the PDF slides, paper and results here:
 http://people.aifb.kit.edu/ffl/reverts/

 I'll be happy to answer any questions.

 In more detail:
 The MD5 hash method employed by many researchers to identify reverts (like
 some other methods, such as using edit comments) is acknowledged to produce
 some inaccuracies as far as the Wikipedia definition of a revert ("reverses
 the actions of any editors", "undoing the actions"..) is concerned. The
 extent of these inaccuracies is usually judged to be not too large, as
 most reverting edits are naturally carried out immediately after the edit
 to be reverted, making them "identity reverts" (Wikipedia definition:
 "..normally results in the page being restored to a version that
 existed previously"). Still, there has not been a user evaluation assessing
 how well the detected reverts conform to the Wikipedia definition and to
 what users actually perceive as a revert. We developed and evaluated an
 alternative method to the MD5 identity revert and show a significant
 increase in accuracy (and coverage).
 34% of the reverts detected by the MD5 hash method in our sample actually
 fail to be acknowledged as full reverts by more than 80% of users in our
 study. Our new method performs much better: it finds different reverts
 for these 34% wrongly detected reverts, plus 12% more reverts, and these
 newly found edit pairs show an accuracy of 70% for actually being reverts
 according to the users. The difference in accuracy between the reverts
 detected only by the MD5 method and those detected only by our new method
 is highly significant, while reverts detected by both methods also perform
 significantly better than those detected only by the MD5 method.
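
 For readers unfamiliar with the baseline, here is a minimal sketch of the
 MD5 identity revert idea discussed above. It is only an illustration of the
 general technique, not the scripts from the paper; the function name, the
 input format (a chronological list of revision texts) and the output tuple
 layout are assumptions made for this example.

```python
import hashlib


def md5_identity_reverts(revisions):
    """Find identity reverts in a chronological list of revision texts.

    Hypothetical helper for illustration only. Returns a list of tuples
    (reverting_index, restored_index, reverted_indices): revision
    `reverting_index` has the same MD5 digest as the earlier revision
    `restored_index`, so every revision in between counts as reverted.
    """
    last_seen = {}  # digest -> index of the latest revision with that text
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in last_seen:
            j = last_seen[digest]
            # Identical consecutive revisions (i - j == 1) undo nothing,
            # so they are not counted as a revert here.
            if i - j > 1:
                reverts.append((i, j, list(range(j + 1, i))))
        last_seen[digest] = i
    return reverts


if __name__ == "__main__":
    history = ["intro", "intro spam", "intro"]
    print(md5_identity_reverts(history))
```

 Note that this sketch only fires when a whole page text reappears
 byte for byte. An edit that undoes someone's change while also altering
 anything else on the page produces a new digest and goes undetected, and
 every revision between two identical texts is flagged regardless of what
 it actually changed.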

 Trade-off:
 Although this method is much slower than the MD5 method (as it uses
 DIFFs between revisions), it reflects much better what users (and the
 Wikipedia community as a whole) see as a revert. It is therefore a valid
 alternative if you are interested in the antagonistic relationships between
 users on a more detailed and accurate level. There is quite some potential
 to make it faster by combining the two methods and thus decreasing the
 number of DIFFs to be performed. Let's see if we can get around to doing
 that :)

 The scripts and results listed in the paper can be found at
 http://people.aifb.kit.edu/ffl/reverts/

 Best,

 Fabian

 --
 Karlsruhe Institute of Technology (KIT)
 Institute of Applied Informatics and Formal Description Methods

 Dipl.-Medwiss. Fabian Flöck
 Research Associate

 Building 11.40, Room 222
 KIT-Campus South
 D-76128 Karlsruhe

 Phone: +49 721 608 4 6584
 Skype: f.floeck_work
 E-Mail: fabian.floeck(a)kit.edu
 WWW: http://www.aifb.kit.edu/web/Fabian_Flöck

 KIT – University of the State of Baden-Wuerttemberg and
 National Research Center of the Helmholtz Association

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

-- 
Tilman Bayer
Senior Operations Analyst (Movement Communications)
Wikimedia Foundation
IRC (Freenode): HaeB
