First of all, thanks a lot for your questions and remarks. (btw Mako: nice panel talk yesterday at WPAC12)

tl;dr: scroll all the way down for examples

Questions by Mako:
1. Are you limiting this to edits that are separated from revisions with identical hashes by only one edit? --> I'm not quite sure what you mean, but we do not limit this to specific edits; the only exception is that both methods were tested with a limit of going back at most 20 revisions to look for reverts.
2. And are you sure your human coders aren't just relying on edit summaries? --> They couldn't see the edit summaries, due to our experimental setup.
3. HASH-A => HASH-B => HASH-A no revert? --> (assuming you mean HASH-B is only one revision/edit): this is ALWAYS a revert of the edit that produced HASH-B, in both methods, and was always evaluated as such by the users.


Before I give examples, let me just remind you that this is only a sample, so, first of all, it is of course not statistically inferred that the 37% I mentioned necessarily appear like this in general in the exact same way. Secondly, this number assumes that 80% of all participants agreed on an edit pair being a full revert; i.e., there were of course cases in the sample where people disagreed, and some cases where even a majority voted for a full revert detected by MD5, just not over 79%. I chose this threshold to make the differences clear; I could also have selected some other arbitrary value. That is exactly why we did not put it in the paper: the analysis in the paper is a much better ground for making statistical inferences about the data that forms the "basic population" for this analysis.

Now, let me give you some examples of false positives generated by the MD5 hash method:

1. One self-generated example (inspired by observations) is given in the paper (almost identically): 

RevID # RevContent (after edit) # Edit # Hash
1 # Peanut # +Peanut # Hash1
2 # Peanut Apple # +Apple # Hash2
3 # Peanut Apple Banana # +Banana # Hash3
4 # Peanut Banana # -Apple # Hash4
5 # Peanut # -Banana # Hash1

MD5 assigns 5 as the reverting edit of 2, 3 and 4.
DIFF assigns 5 as the reverting edit of 3, and 4 as the reverting edit of 2.

False positives in this case (according to the Wikipedia definition) for MD5: 5 reverting 4 and 2
(as 4 is unrelated to what 5 is doing, and 2's contribution has already been removed, so it cannot be undone anymore by 5).
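To make the mechanics concrete, here is a minimal sketch (my own illustration, not the code we actually used) of how hash-based identity revert detection processes the toy history above; it flags everything between two revisions with identical hashes as reverted:

```python
import hashlib

# (rev_id, content) pairs from the toy example above
revisions = [
    (1, "Peanut"),
    (2, "Peanut Apple"),
    (3, "Peanut Apple Banana"),
    (4, "Peanut Banana"),
    (5, "Peanut"),
]

def md5_reverts(revs):
    """Flag revision j as reverting every edit between it and the
    most recent earlier revision with an identical content hash."""
    hashes = [(rid, hashlib.md5(text.encode()).hexdigest())
              for rid, text in revs]
    reverts = {}  # reverting rev id -> list of reverted rev ids
    for j in range(len(hashes)):
        for i in range(j - 1, -1, -1):  # scan backwards for a hash match
            if hashes[i][1] == hashes[j][1]:
                reverts[hashes[j][0]] = [rid for rid, _ in hashes[i + 1:j]]
                break
    return reverts

print(md5_reverts(revisions))  # {5: [2, 3, 4]}
```

Note that revision 5 is blamed for reverting 2, 3 and 4, even though, as argued above, only 3 is actually undone by it.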


2. "Real-life" examples rated as false positives in the user evaluation: 

When you asked me for the examples, I started digging them up from the data sample that was used and realized that many false positives of the MD5 method are in fact related to self-reverts. As this is no issue for our data extraction aims (we want to have self-reverts in the results as well) and was not considered when randomly drawing edit pairs from the two methods' results, we didn't discuss it in the paper. If you do not consider self-reverts to be reverts in the sense of the Wikipedia definition, they could of course be filtered out by collapsing subsequent edits of one editor before running the revert analysis with the MD5 method. I assume that would reduce the number of false positives notably; I will certainly look into it.
If you don't collapse these edits, however (which is not regularly done before reporting/using revert detection results), the number of false positives will be quite high, as the edits-to-be-collapsed (and thus prone to being misinterpreted) appear quite often and can at times span a considerable number of revisions. And of course there are also cases not related to self-reverts.
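By "collapsing" I mean something like the following sketch (names and the toy history are hypothetical, just for illustration): each uninterrupted run of edits by one editor is reduced to its final state, so a self-reverted vandalism revision inside the run never enters the revert analysis.

```python
def collapse_consecutive(revs):
    """Keep only the last revision of each uninterrupted run of edits
    by the same editor, so self-reverts inside a run disappear."""
    collapsed = []
    for editor, content in revs:
        if collapsed and collapsed[-1][0] == editor:
            collapsed[-1] = (editor, content)  # replace run's last state
        else:
            collapsed.append((editor, content))
    return collapsed

history = [
    ("A", "text"),
    ("B", "text kirsty u tit"),  # B vandalizes
    ("B", "text"),               # B self-reverts
    ("C", "text"),               # C would look like reverting B's vandalism
]
print(collapse_consecutive(history))
# [('A', 'text'), ('B', 'text'), ('C', 'text')]
```

After collapsing, B's vandalized intermediate revision is gone, so a subsequent hash match can no longer be misread as a revert of it.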

I tried to select examples representative of the sample, ones that received very few or no votes as full reverts (while being detected as such by MD5):

Example A
detected as reverted: http://en.wikipedia.org/w/index.php?&diff=25866415
detected as reverting: http://en.wikipedia.org/w/index.php?diff=25866579

The detected-as-reverting edit removes only "insomnia" from the detected-as-reverted edit, i.e. it is no full revert, as some insertions from previous edits had already been deleted by the reverted editor himself.
It would be a correct full revert if you collapsed the reverted editor's edits into one.


Example B

detected as reverted: http://en.wikipedia.org/w/index.php?diff=196507540
detected as reverting: http://en.wikipedia.org/w/index.php?diff=196507775

The first editor self-reverts the vandalism he introduced ("kirsty u tit") before the second editor reverts --> hence the detected-as-reverted edit cannot be fully reverted by the detected-as-reverting edit. This would also be remedied by collapsing the first editor's edits.


Example C
Not related to self-reverts; this is an example of an incomplete vandalism repair, which is then subsequently completed:

detected as reverted: http://en.wikipedia.org/w/index.php?diff=162097520
detected as reverting: http://en.wikipedia.org/w/index.php?diff=162113945


Example D
Not related to self-reverts.

A revert is carried out by TheJazzDalek targeting the edits by 74.131.204.39, but in the same edit TheJazzDalek also deletes something else, leading to a new, unique revision content. As 74.131.204.39 in the next edit reverts this deletion by TheJazzDalek, but not the initial revert of his (74.131.204.39's) own edits, it is erroneously concluded that 74.131.204.39 reverts himself, which is not the case.

detected as reverted: http://en.wikipedia.org/w/index.php?diff=292533562
detected as reverting: http://en.wikipedia.org/w/index.php?diff=292760323



Example E

detected as reverted: http://en.wikipedia.org/w/index.php?diff=231824943
detected as reverting: http://en.wikipedia.org/w/index.php?diff=231960286

First, the reverting editor (Laser brain) undoes (not rolling back to / not creating a duplicate revision) some edits by another editor, before deleting the result of 67.162.68.255's edits (one of which was detected here as reverted). The "detected as reverted" revision is partly self-reverted by 67.162.68.255. The other part, a date change in an "accessdate=" field, is not "undone" as such; rather, the whole "accessdate=" part (stemming from a third editor) is deleted.


Example F

Here, between the "reverted" edit and the "reverting" one, a mixture of self-reverts, reverts and different forms of vandalism occurs:

detected as reverted: http://en.wikipedia.org/w/index.php?diff=131372047
detected as reverting: http://en.wikipedia.org/w/index.php?diff=131658207


If I have failed to answer any of your questions, please excuse me and ask again.


Best, 

Fabian


On Jun 29, 2012, at 9:53 PM, Shilad Sen wrote:

One example I've seen of md5's failing is in common short vandalism phrases. The most obvious of these is removing all text, or single vulgar words.

Here's a scenario:
  • User A vandalizes an article by replacing it with "poop."
  • User B restores the article.
  • Some time passes....
  • User C vandalizes an article by replacing it with "poop."
User C isn't really reverting B's edit. You may be able to guess that since A was reverted, C must not be reverting, but this logic can be tricky and wrong.

In practice, I've been able to catch most of these instances by a) ignoring md5 reverts outside a certain window of revisions and b) ignoring md5 reverts that replace lots of text with very little text.
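A rough sketch of these two filters (the window size and size-ratio thresholds here are arbitrary illustration values, not necessarily the ones anyone uses in practice):

```python
def plausible_md5_revert(revs, i, j, max_window=20, min_size_ratio=0.5):
    """Sanity-check a candidate identity revert, where revision j
    restores the exact content of earlier revision i.

    revs is a list of revision texts in chronological order.
    a) Reject if too many revisions lie between the matching pair.
    b) Reject if the restored text is much shorter than the longest
       intermediate revision (e.g. an article replaced with "poop"
       happening to match an earlier vandalized revision).
    """
    if j - i > max_window:
        return False
    restored_len = len(revs[j])
    replaced_len = max((len(revs[k]) for k in range(i + 1, j)), default=0)
    if restored_len < min_size_ratio * replaced_len:
        return False
    return True

# The "poop" scenario: rev 2 matches rev 0's hash but is vandalism,
# replacing a long restored article with four characters.
revs = ["poop", "Long restored article text ...", "poop"]
print(plausible_md5_revert(revs, 0, 2))  # False
```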

Also, this situation occurs regularly, but nowhere near 37%.

-Shilad

On Fri, Jun 29, 2012 at 11:15 AM, Benj. Mako Hill <mako@mit.edu> wrote:
<quote who="Aaron Halfaker" date="Wed, Jun 27, 2012 at 04:39:30PM -0700">
> I'm confused by your explanation.
>
> How is it possible that this 37% of revisions that are detected as reverts
> via a md5 hash are not considered reverts by (I presume) humans?  Can you
> give a common example?  By definition, identity revert revisions represent
> an exact replica of a previous revision in an article and, therefore,
> should discard any intermediate changes.  What definition of "revert" are
> you using that the md5 hash method does not satisfy?

Also, I can't tell from either the paper or the conversation here: Are
you limiting this to edits that are separated from revisions with
identical hashes by only one edit? When you do that, things become a
bit more complicated.

And are you sure your human coders aren't just relying on edit
summaries? Like Aaron, I'm having a hard time imagining a situation
where revisions go HASH-A => HASH-B => HASH-A that shouldn't be
treated like a revert, and I tend to think this sounds more like
fallible coders than broken tools. If the user doesn't *know* or think
they are reverting an edit, it seems wrong *not* to call that a revert.

Later,
Mako


--
Benjamin Mako Hill
mako@mit.edu
http://mako.cc/

Creativity can be a social contribution, but only in so far
as society is free to use the results. --GNU Manifesto

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l




--
Shilad W. Sen
Assistant Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
ssen@macalester.edu
651-696-6273





-- 
Karlsruhe Institute of Technology (KIT)
Institute of Applied Informatics and Formal Description Methods

Dipl.-Medwiss. Fabian Flöck
Research Associate

Building 11.40, Room 222
KIT-Campus South
D-76128 Karlsruhe

Phone: +49 721 608 4 6584
Skype: f.floeck_work
E-Mail: fabian.floeck@kit.edu
WWW: http://www.aifb.kit.edu/web/Fabian_Flöck

KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association