<quote who="Aaron Halfaker" date="Wed, Jun 27, 2012 at 04:39:30PM
-0700">
I'm confused by your explanation.
How is it possible that this 37% of revisions that are detected as reverts
via an MD5 hash are not considered reverts by (I presume) humans? Can you
give a common example? By definition, an identity revert is an exact
replica of a previous revision of an article and, therefore, should
discard any intermediate changes. What definition of "revert" are
you using that the MD5 hash method does not satisfy?
Also, I can't tell from either the paper or the conversation here: are
you limiting this to cases where the revisions with identical hashes are
separated by only one edit? When you do that, things become a
bit more complicated.
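For concreteness, here is a minimal sketch of MD5-based identity-revert detection. It assumes a page history given as a list of revision texts in chronological order. The function name and the `max_gap` parameter are hypothetical and do not come from the paper. `max_gap` caps the number of intervening edits, so `max_gap=1` corresponds to the one-edit restriction asked about above.

```python
import hashlib


def md5_hex(text):
    """MD5 digest of a revision's full text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def identity_reverts(texts, max_gap=None):
    """Return (reverting_index, restored_index) pairs.

    Revision i is flagged when its hash matches the most recent earlier
    revision j with at least one edit in between (j < i - 1), so a null
    edit identical to its predecessor is not counted. max_gap (hypothetical)
    limits the number of intervening edits, i - j - 1; None means unlimited.
    """
    last_seen = {}  # hash -> index of its most recent occurrence
    reverts = []
    for i, h in enumerate(md5_hex(t) for t in texts):
        j = last_seen.get(h)
        if j is not None and j < i - 1:
            gap = i - j - 1
            if max_gap is None or gap <= max_gap:
                reverts.append((i, j))
        last_seen[h] = i
    return reverts


print(identity_reverts(["A", "B", "A"]))                  # [(2, 0)]
print(identity_reverts(["A", "B", "C", "A"], max_gap=1))  # []
```

The second call shows how restricting to a single intervening edit silently drops a revert that undoes two edits at once.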
And are you sure your human coders aren't just relying on edit
summaries? Like Aaron, I'm having a hard time imagining a situation
where revisions go HASH-A => HASH-B => HASH-A that shouldn't be
treated like a revert, and I tend to think this sounds more like
fallible human coders than broken tools. Even if the user doesn't *know*
or think they are reverting an edit, it seems wrong *not* to call that a revert.
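To make the edit-summary question concrete, here is a hypothetical sketch that labels each revision twice: once by hash and once by a keyword match on the edit summary. It assumes revisions arrive as (text, summary) pairs in chronological order. The keyword pattern is an illustrative assumption, not what any coder actually used. A HASH-A => HASH-B => HASH-A sequence with an uninformative summary is exactly where the two labels disagree.

```python
import hashlib
import re

# Illustrative (assumed) keywords a summary-reading coder might key on.
SUMMARY_PATTERN = re.compile(r"\b(rv|rvv|revert(ed)?|undo|undid)\b", re.IGNORECASE)


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def label_revisions(revisions):
    """revisions: list of (text, summary) pairs, oldest first.

    Returns (hash_says_revert, summary_says_revert) for each revision.
    The hash label is True when the text matches any earlier revision
    other than the immediately preceding one (which would be a null edit).
    """
    seen = set()
    prev = None
    labels = []
    for text, summary in revisions:
        h = md5_hex(text)
        hash_revert = h in seen and h != prev
        summary_revert = bool(SUMMARY_PATTERN.search(summary))
        labels.append((hash_revert, summary_revert))
        seen.add(h)
        prev = h
    return labels


history = [("A", "new article"), ("B", "add trivia"), ("A", "copyedit")]
print(label_revisions(history))  # [(False, False), (False, False), (True, False)]
```

If coders lean on summaries, the third revision is the kind of case they would mark as "not a revert" even though the hash shows the text was restored exactly.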
Later,
Mako
--
Benjamin Mako Hill
mako(a)mit.edu
http://mako.cc/
Creativity can be a social contribution, but only in so far
as society is free to use the results. --GNU Manifesto