Re: [Wiki-research-l] More accurate revert detection in Wikipedia, alternative to MD5 identical revision method

27 Jun 2012

Fabian,

I'm confused by your explanation.

How is it possible that the 37% of revisions detected as reverts
via an MD5 hash are not considered reverts by (I presume) humans?  Can you
give a common example?  By definition, identity revert revisions represent
an exact replica of a previous revision in an article and, therefore,
should discard any intermediate changes.  What definition of "revert" are
you using that the md5 hash method does not satisfy?
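
For readers unfamiliar with the identity revert method discussed here, the following is a minimal sketch of the idea, not the code used in the paper or in any Wikipedia tool. It assumes the revision texts are available as a chronologically ordered list; the function name `find_identity_reverts` and the toy history are made up for illustration.

```python
import hashlib

def find_identity_reverts(revisions):
    """Return (reverting_index, restored_index) pairs.

    `revisions` is a list of revision texts in chronological order.
    A revision counts as an identity revert if its MD5 hash equals that
    of an earlier revision and at least one revision lies in between.
    """
    last_seen = {}  # digest -> index of the most recent revision with that text
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        if j is not None and i - j > 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts

# Toy history (hypothetical): revision 2 restores revision 0,
# and revision 5 restores revision 3.
history = ["A", "A B", "A", "A C", "A C D", "A C"]
print(find_identity_reverts(history))  # [(2, 0), (5, 3)]
```

Under this definition everything between the restored revision and the reverting one is treated as reverted, which is the behaviour the question above takes for granted.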

-Aaron

On Wed, Jun 27, 2012 at 12:12 PM, Floeck, Fabian (AIFB) <
fabian.floeck(a)kit.edu> wrote:

...
 @Tilman: Thanks, I was not aware that it was already in the
 newsletter; I hadn't read it.
 Apologies, everyone, for the double posting.

 @Federico: Sorry for not putting it more clearly and confusing you. So:
 1. Of the reverts detected by MD5 hash, 37% (actually 37%, I just looked
 it up) were not detected by the new method; 63% were detected by the new
 method as well.  When we asked people whether these 37% were full reverts
 (requiring 80%+ of people to agree for one to be labeled a "true
 revert"), the crowd agreed for none of them (i.e. 0% accuracy; it only
 goes up if you lower the agreement threshold notably, which means you can
 no longer be sure that it is indeed a revert).
 2. When we looked at the results produced by our method only (again
 with the 80% agreement threshold), about 70% of them were deemed
 reverts, in comparison.
 3. I only put these numbers in the mail (and the presentation) to
 illustrate the gain in accuracy. They are not in the paper in this form;
 there, we showed the gain in accuracy through the statistical significance
 of the differences in the agreement scores, which I later realized might
 not be as "tangible" as accuracy numbers. It turns out the way I put it
 was more confusing, sorry for that.
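
 The accuracy figures in points 1 and 2 can be read as "share of candidate reverts that at least 80% of raters confirmed". A minimal sketch of that computation follows; the function name `share_confirmed` and the votes are invented for illustration and are not data from the study.

```python
def share_confirmed(votes_per_candidate, threshold=0.8):
    """Fraction of candidates whose 'yes' share reaches the threshold.

    `votes_per_candidate` holds one list of booleans per candidate revert,
    where True means a rater judged it to be a full revert.
    """
    if not votes_per_candidate:
        raise ValueError("need at least one candidate")
    confirmed = 0
    for votes in votes_per_candidate:
        if votes and sum(votes) / len(votes) >= threshold:
            confirmed += 1
    return confirmed / len(votes_per_candidate)

# Made-up votes for four candidate reverts, five raters each.
votes = [
    [True, True, True, True, True],       # 100% agreement -> confirmed
    [True, True, True, True, False],      # 80% agreement  -> confirmed
    [True, False, False, True, False],    # 40% agreement  -> not confirmed
    [False, False, False, False, False],  # 0% agreement   -> not confirmed
]
print(share_confirmed(votes))                 # 0.5
print(share_confirmed(votes, threshold=0.4))  # 0.75
```

 Lowering the threshold raises the share, as the second call shows, but each "confirmed" label then rests on weaker agreement.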

 @WereSpielChequers: That could indeed be an interesting direction to
 look into. However, given the problems of the identity revert method
 we discussed in the paper, I cannot yet see how these could be alleviated
 by looking at reverts section by section. You are certainly right
 to point out that in this specific situation, although there would not
 necessarily be an identical hash for the *whole* article leading to a revert
 detection, there could be an identical hash for the subsection,
 leading to an accurate revert detection in that section. Inside that
 section, though, the same issues described in our paper would surface. I will
 look at that period of "Sarah Palin", however, to get a better picture of
 that. Thanks a lot for the input.
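
 To make the section-wise idea concrete, here is a rough sketch under simplifying assumptions: sections are split on level-2 `== Heading ==` lines only, and a section counts as reverted when its hash changes back to a state seen earlier. The names `split_sections` and `find_section_reverts` and the toy history are hypothetical, not part of any existing tool.

```python
import hashlib
import re

# Assumption: only level-2 wikitext headings delimit sections.
HEADING = re.compile(r"^==[ \t]*(.+?)[ \t]*==[ \t]*$", re.MULTILINE)

def split_sections(wikitext):
    """Map section title -> body; text before the first heading is keyed ''."""
    parts = HEADING.split(wikitext)
    sections = {"": parts[0]}
    for title, body in zip(parts[1::2], parts[2::2]):
        sections[title] = body
    return sections

def find_section_reverts(revisions):
    """Return (revision_index, section_title, restored_index) triples."""
    seen = {}      # title -> {digest: index where that state was last seen}
    previous = {}  # title -> digest in the preceding revision
    reverts = []
    for i, text in enumerate(revisions):
        current = {}
        for title, body in split_sections(text).items():
            digest = hashlib.md5(body.encode("utf-8")).hexdigest()
            states = seen.setdefault(title, {})
            # The section changed, and the new state was seen before.
            if digest != previous.get(title) and digest in states:
                reverts.append((i, title, states[digest]))
            states[digest] = i
            current[title] = digest
        previous = current
    return reverts

# Toy history: "Career" is restored in revision 3 while "Early life"
# keeps an unrelated edit, so no whole-article hash ever repeats.
history = [
    "Lead\n== Early life ==\nText one.\n== Career ==\nText two.\n",
    "Lead\n== Early life ==\nText one.\n== Career ==\nText two. JUNK\n",
    "Lead\n== Early life ==\nText one, expanded.\n== Career ==\nText two. JUNK\n",
    "Lead\n== Early life ==\nText one, expanded.\n== Career ==\nText two.\n",
]
print(find_section_reverts(history))  # [(3, 'Career', 0)]
```

 Within each section this is still an identity comparison, so it inherits the limitations of the whole-article hash method at a finer granularity.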

 Best,

 Fabian

 On Jun 27, 2012, at 8:14 PM, Federico Leva (Nemo) wrote:

 I don't understand: if 35 % of the sample reverts identified by the hash
 method are not considered such by human check and the new system has a 70 %
 accuracy, the difference in false positives is 5 %? I don't understand from
 the paper either.
 The main point seems to be that more reverts are found (as expected),
 right?

 Nemo

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

 --
 Karlsruhe Institute of Technology (KIT)
 Institute of Applied Informatics and Formal Description Methods

 Dipl.-Medwiss. Fabian Flöck
 Research Associate

 Building 11.40, Room 222
 KIT-Campus South
 D-76128 Karlsruhe

 Phone: +49 721 608 4 6584
 Skype: f.floeck_work
 E-Mail: fabian.floeck(a)kit.edu
 WWW: http://www.aifb.kit.edu/web/Fabian_Flöck

 KIT – University of the State of Baden-Wuerttemberg and
 National Research Center of the Helmholtz Association
