<quote who="Aaron Halfaker" date="Wed, Jun 27, 2012 at 04:39:30PM -0700">
I'm confused by your explanation.
How is it possible that this 37% of revisions that are detected as reverts via a md5 hash are not considered reverts by (I presume) humans? Can you give a common example? By definition, identity revert revisions represent an exact replica of a previous revision in an article and, therefore, should discard any intermediate changes. What definition of "revert" are you using that the md5 hash method does not satisfy?
Also, I can't tell from either the paper or the conversation here: Are you limiting this to revisions with identical hashes that are separated by only one edit? When you do that, things become a bit more complicated.
And are you sure your human coders aren't just relying on edit summaries? Like Aaron, I'm having a hard time imagining a situation where revisions go HASH-A => HASH-B => HASH-A that shouldn't be treated like a revert, and I tend to think this sounds more like fallible humans than broken tools. If the user doesn't *know* or think they are reverting an edit, it seems wrong *not* to call that a revert.
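To make sure we are talking about the same thing, here is a rough sketch of what I mean by hash-based identity revert detection. It is only an illustration, not the code from the paper or from any existing tool: the function name `find_identity_reverts` and the `radius` parameter (the maximum number of intervening edits allowed) are my own assumptions. Setting `radius=1` gives the "separated by only one edit" case I asked about.

```python
import hashlib


def md5_of(text):
    """Return the MD5 hex digest of a revision's text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_identity_reverts(texts, radius=None):
    """Find identity reverts in a chronological list of revision texts.

    Hypothetical helper for illustration only. A revision i counts as a
    revert if its hash equals that of the most recent earlier revision j
    with the same content and at least one edit lies in between.
    `radius` (an assumed parameter) caps how many intervening edits are
    allowed; None means no cap.

    Returns a list of (reverting_index, reverted_to_index, reverted_indexes).
    """
    reverts = []
    last_seen = {}  # hash -> index of the most recent revision with that hash
    for i, text in enumerate(texts):
        digest = md5_of(text)
        j = last_seen.get(digest)
        if j is not None:
            intervening = i - j - 1
            # Skip consecutive identical revisions: nothing was discarded.
            if intervening >= 1 and (radius is None or intervening <= radius):
                reverts.append((i, j, list(range(j + 1, i))))
        last_seen[digest] = i
    return reverts


if __name__ == "__main__":
    # HASH-A => HASH-B => HASH-A: revision 2 restores revision 0, discarding 1.
    print(find_identity_reverts(["A", "B", "A"]))
```

Under this definition the edit summary plays no role at all, which is exactly why I would expect it to disagree with coders who read summaries.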
Later, Mako