I suspected as much. It seems "false positive" isn't a very good way to
think about the accuracy problem; rather, there appear to be three
states of interest. Assuming a pair of revision_ids representing the
information contained in a "revert": (reverted_id, reverting_id)
1. (reverted_id, reverting_id): The desired case. The reverted revision
appears to have had contributions discarded by the reverting revision.
2. (reverted_id, X): Suboptimal, but useful case. The reverted revision
was indeed reverted, but the associated reverting revision was not the one
that discarded the contributions.
3. (X, X): False positive. The reverted revision was not actually
reverted.
I had thought you were referring to case #3, while you were generally
referring to case #2. Is that right?
-Aaron
On Mon, Jul 2, 2012 at 1:23 PM, Floeck, Fabian (AIFB) <fabian.floeck(a)kit.edu>
wrote:
> First of all, thanks a lot for your questions and remarks. (btw Mako: nice
> panel talk yesterday at WPAC12)
>
> tl;dr: scroll all the way down for examples
>
> Questions by Mako:
> 1. Are you limiting this to edits that are separated from revisions with
> identical hashes by only one edit? --> I'm not quite sure what you mean,
> but we do not limit this to specific edits; the only exception is that
> both methods were tested with a limit of going back a maximum of 20
> revisions to look for reverts.
> 2. And are you sure your human coders aren't just relying on edit
> summaries? --> They could not see the edit summaries, due to our
> experimental setup.
> 3. HASH-A => HASH-B => HASH-A no revert? --> (assuming you mean HASH-B is
> only one revision/edit): this is ALWAYS a revert by A targeting B in both
> methods, and it was always evaluated as such by the users.
>
>
> Before I give examples, let me remind you that this is only a sample, so
> it is of course not statistically inferred that the 37% I mentioned
> necessarily appears in general in exactly the same way. Secondly, this
> number assumes that 80% of all participants agreed on an edit pair being
> a *full* revert; there were cases in the sample where people disagreed,
> and some cases where even a majority voted for a pair detected by MD5 to
> be a full revert, just not over 79%. I chose this threshold to make the
> differences clear; I could also have selected some other arbitrary value.
> That is exactly why we did not put it in the paper: the analysis in the
> paper is a much better ground for making statistical inferences about the
> data that is the "basic population" for this analysis.
>
> Now, let me give you some examples for false positives generated by the
> MD5 hash method:
>
> 1. One self-generated example (inspired by observations) is given in the
> paper (almost identically):
>
> RevID # RevContent (after edit) # Edit    # Hash
> 1     # Peanut                  # +Peanut # Hash1
> 2     # Peanut Apple            # +Apple  # Hash2
> 3     # Peanut Apple Banana     # +Banana # Hash3
> 4     # Peanut Banana           # -Apple  # Hash4
> 5     # Peanut                  # -Banana # Hash1
>
> MD5 assigns 5 as the reverting edit of 2, 3 and 4.
> DIFF assigns 5 as the reverting edit of 3, and 4 as the reverting edit of 2.
>
> False positive in this case (according to the Wikipedia definition) for
> MD5: 5 reverting 4 and 2 (4 is unrelated to what 5 does, and 2's
> contribution was already removed, so it cannot be undone again by 5).
>
>
> 2. "Real-life" examples rated as false positives in the user evaluation:
>
> When you asked me for the examples, I started digging them up from the
> data sample that was used and realized that many false positives of the
> MD5 method are related to self-reverts. As this is no issue for our data
> extraction aims (we want to have self-reverts in the results as well) and
> it was not considered when randomly drawing edit pairs from the two
> methods' results, we didn't discuss it in the paper. If you do not
> consider self-reverts to be reverts in the Wikipedia-definition sense,
> they could be filtered out by collapsing subsequent edits of one editor
> before running the revert analysis with the MD5 method. I assume that
> would notably reduce the number of false positives; I will certainly
> look into that.
> If you don't collapse these edits, however (which is *not* regularly done
> before reporting/using revert detection results), the number of false
> positives will be quite high, as the edits-to-be-collapsed (and prone to
> being misinterpreted) appear quite often, and their span can at times be
> considerably large. And of course there are cases not related to
> self-reverts.
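The collapsing step described above could be sketched as follows, under the assumption that revisions arrive as (rev_id, editor, content) tuples in page order (the data layout and names here are hypothetical):

```python
def collapse_consecutive(revisions):
    """Collapse runs of consecutive revisions by the same editor, keeping
    only the last revision of each run (its content is what the next
    editor actually saw)."""
    collapsed = []
    for rev in revisions:
        if collapsed and collapsed[-1][1] == rev[1]:
            collapsed[-1] = rev  # same editor as previous: keep latest state
        else:
            collapsed.append(rev)
    return collapsed

history = [
    (1, "A", "Peanut"),
    (2, "B", "Peanut poop"),
    (3, "B", "Peanut"),        # B self-reverts before anyone else edits
    (4, "C", "Peanut Apple"),
]
print(collapse_consecutive(history))
# [(1, 'A', 'Peanut'), (3, 'B', 'Peanut'), (4, 'C', 'Peanut Apple')]
```

After collapsing, B's self-revert disappears from the history, so an identity-based detector no longer pairs it with a later edit.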
>
> I tried to select examples representative of the sample which received
> very few or no votes as full reverts (though detected as such by MD5):
>
> Example A
> detected as reverted: http://en.wikipedia.org/w/index.php?&diff=25866415
> detected as reverting: http://en.wikipedia.org/w/index.php?diff=25866579
>
> The detected-as-reverting edit removes only "insomnia" from the
> detected-as-reverted edit, i.e. it is no full revert, as some insertions
> from previous edits had already been deleted by the reverted editor
> himself. It would be a correct full revert if you collapsed the reverted
> editor's edits into one.
>
>
> Example B
>
> detected as reverted: http://en.wikipedia.org/w/index.php?diff=196507540
> detected as reverting: http://en.wikipedia.org/w/index.php?diff=196507775
>
> Self-revert of the introduced vandalism ("kirsty u tit") before the
> second editor reverts --> the detected-as-reverted edit thus cannot be
> reverted again. This would also be remedied by collapsing the first
> editor's edits.
>
>
> Example C
> Not related to self-reverts; this is an example of incomplete vandalism
> repair, which is subsequently completed:
>
> detected as reverted: http://en.wikipedia.org/w/index.php?diff=162097520
> detected as reverting: http://en.wikipedia.org/w/index.php?diff=162113945
>
>
> Example D
> not related to self-revert
>
> A revert is carried out by TheJazzDalek targeting the edits by
> 74.131.204.39
> <http://en.wikipedia.org/wiki/Special:Contributions/74.131.…39>, but in
> the same edit something is deleted by TheJazzDalek, leading to a new,
> unique revision content. As 74.131.204.39 in the next edit reverts this
> deletion by TheJazzDalek, but not the initial revert of his own edits, it
> is erroneously concluded that 74.131.204.39 reverts himself, which is not
> the case.
>
> detected as reverted: http://en.wikipedia.org/w/index.php?diff=292533562
> detected as reverting: http://en.wikipedia.org/w/index.php?diff=292760323
>
>
>
> Example E
>
> detected as reverted: http://en.wikipedia.org/w/index.php?diff=231824943
> detected as reverting: http://en.wikipedia.org/w/index.php?diff=231960286
>
> First, the reverting editor (Laser brain) undoes (not rolling back to /
> not creating a duplicate revision) some edits by another editor before
> deleting the result of the edits by 67.162.68.255
> <http://en.wikipedia.org/wiki/Special:Contributions/67.162.68.255> (one
> of which was detected here as reverted). The "detected as reverted"
> revision is partly self-reverted by 67.162.68.255. The other part, a date
> change in an "accessdate=", is not "undone" as such, but the whole
> "accessdate=" part (stemming from a third editor) is deleted.
>
>
> Example F
>
> Here, between the "reverted" edit and the "reverting" one, there happens
> a mixture of self-reverts, reverts and different forms of vandalism:
>
> detected as reverted: http://en.wikipedia.org/w/index.php?diff=131372047
> detected as reverting: http://en.wikipedia.org/w/index.php?diff=131658207
>
>
> If I have failed to answer any of your questions, please excuse me and
> ask again.
>
>
> Best,
>
> Fabian
>
>
> On Jun 29, 2012, at 9:53 PM, Shilad Sen wrote:
>
> One example I've seen of MD5 failing is with common short vandalism
> phrases. The most obvious of these are removing all text, or replacing it
> with single vulgar words.
>
> Here's a scenario:
>
> - User A vandalizes an article by replacing it with "poop."
> - User B restores the article.
> - Some time passes....
> - User C vandalizes an article by replacing it with "poop."
>
> User C isn't really reverting B's edit. You may be able to guess that
> since A was reverted, C must not be reverting, but this logic can be tricky
> and wrong.
>
> In practice, I've been able to catch most of these instances by a)
> ignoring MD5 reverts outside a certain window of revisions and b)
> ignoring MD5 reverts that replace lots of text with very little text.
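Those two filters could be sketched roughly as below. The threshold values here are hypothetical placeholders, not the ones Shilad actually used:

```python
def plausible_md5_revert(reverted_idx, reverting_idx, texts,
                         max_window=20, min_size_ratio=0.2):
    """Filter an MD5-detected revert pair with two heuristics:
    (a) ignore reverts spanning more than `max_window` revisions, and
    (b) ignore reverts that replace lots of text with very little text.
    `texts` is the list of revision texts in page order."""
    # (a) window check: distant hash matches are likely coincidental
    if reverting_idx - reverted_idx > max_window:
        return False
    # (b) size check: compare the reverting revision with the revision it
    # replaces; blanking-style "reverts" shrink the page drastically
    before = len(texts[reverting_idx - 1])
    after = len(texts[reverting_idx])
    if before > 0 and after / before < min_size_ratio:
        return False
    return True

texts = ["A long article about peanuts.", "poop",
         "A long article about peanuts.", "poop"]
print(plausible_md5_revert(0, 2, texts))  # True: restores the article
print(plausible_md5_revert(1, 3, texts))  # False: replaces text with "poop"
```

The second call is exactly the "poop" scenario above: C's blanking matches A's earlier hash, but the size filter rejects it as a revert of B.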
>
> Also, this situation occurs regularly, but nowhere near 37%.
>
> -Shilad
>
> On Fri, Jun 29, 2012 at 11:15 AM, Benj. Mako Hill <mako(a)mit.edu>
> wrote:
>
>> <quote who="Aaron Halfaker" date="Wed, Jun 27, 2012 at 04:39:30PM -0700">
>> > I'm confused by your explanation.
>> >
>> > How is it possible that this 37% of revisions that are detected as
>> > reverts via a md5 hash are not considered reverts by (I presume)
>> > humans? Can you give a common example? By definition, identity revert
>> > revisions represent an exact replica of a previous revision in an
>> > article and, therefore, should discard any intermediate changes. What
>> > definition of "revert" are you using that the md5 hash method does not
>> > satisfy?
>>
>> Also, I can't tell from either the paper or the conversation here: Are
>> you limiting this to edits that are separated from revisions with
>> identical hashes by only one edit? When you do that, things become a
>> bit more complicated.
>>
>> And are you sure your human coders aren't just relying on edit
>> summaries? Like Aaron, I'm having a hard time imagining a situation
>> where revisions go HASH-A => HASH-B => HASH-A that shouldn't be
>> treated as a revert, and I tend to think this sounds more like fallible
>> coders than broken tools. If the user doesn't *know* or think they are
>> reverting an edit, it seems wrong *not* to call that a revert.
>>
>> Later,
>> Mako
>>
>>
>> --
>> Benjamin Mako Hill
>> mako(a)mit.edu
>> http://mako.cc/
>>
>> Creativity can be a social contribution, but only in so far
>> as society is free to use the results. --GNU Manifesto
>>
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>>
>
>
> --
> Shilad W. Sen
> Assistant Professor
> Mathematics, Statistics, and Computer Science Dept.
> Macalester College
> ssen(a)macalester.edu
> 651-696-6273
>
>
>
>
>
>
> --
>
> Karlsruhe Institute of Technology (KIT)
> Institute of Applied Informatics and Formal Description Methods
>
> Dipl.-Medwiss. Fabian Flöck
> Research Associate
>
> Building 11.40, Room 222
> KIT-Campus South
> D-76128 Karlsruhe
>
> Phone: +49 721 608 4 6584
> Skype: f.floeck_work
> E-Mail: fabian.floeck(a)kit.edu
> WWW: http://www.aifb.kit.edu/web/Fabian_Flöck
>
> KIT – University of the State of Baden-Wuerttemberg and
> National Research Center of the Helmholtz Association
>
>
>
>