There have been a few publications on the subject:
1. "Us vs. them: Understanding social dynamics in Wikipedia with revert graph visualizations", B Suh, EH Chi, BA Pendleton.
2. "He says, she says: Conflict and coordination in Wikipedia", A Kittur, B Suh, BA Pendleton.


In my experience, analyzing MD5s is not enough to identify all reverts,
and even the MD5 approach has its pitfalls. To identify true reverts you generally
also need to look at user reputations, article content, and edit comments.
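
As a rough illustration of the plain MD5 approach (a minimal sketch, not the pymwdat code; the function names and the (rev_id, user, text) tuple layout are assumptions made up for this example): a revision whose digest matches an earlier revision is treated as a revert, and everything in between as reverted.

```python
import hashlib


def md5_of(text):
    """Return the hex MD5 digest of a revision's wikitext."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def find_identity_reverts(revisions):
    """Find revisions that restore the exact text of an earlier revision.

    revisions: list of (rev_id, user, text) tuples in chronological order
               (a hypothetical layout chosen for this sketch).
    Returns a list of (reverting_rev_id, restored_rev_id, reverted_rev_ids).
    """
    last_seen = {}  # digest -> index of the most recent revision with it
    reverts = []
    for i, (rev_id, _user, text) in enumerate(revisions):
        digest = md5_of(text)
        if digest in last_seen:
            j = last_seen[digest]
            # Only count it as a revert if something in between was undone;
            # two identical consecutive revisions are just a null edit.
            if i - j > 1:
                reverted = [revisions[k][0] for k in range(j + 1, i)]
                reverts.append((rev_id, revisions[j][0], reverted))
        last_seen[digest] = i
    return reverts


example = [
    (1, "Alice", "Original text."),
    (2, "Bob", "Original text. Plus an addition."),
    (3, "Carol", "Original text."),
]
identity_reverts = find_identity_reverts(example)
```

This only catches exact restorations, which is why it misses the cases described below.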


Reverts can be loosely grouped as:
 * regular reverts;
 * self-reverts;
 * revert wars.

Each of these groups needs its own handling when you identify reverts.
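
One possible way to tell the three groups apart is sketched below. The input layout, the function name and especially the revert-war rule (a pair of users reverting each other at least `war_threshold` times) are assumptions for illustration, not an established definition.

```python
from collections import Counter


def classify_reverts(reverts, war_threshold=3):
    """Label each revert as 'self-revert', 'revert war' or 'regular revert'.

    reverts: list of (reverting_user, reverted_users) in chronological
             order, where reverted_users is a non-empty list of the users
             whose edits were undone (hypothetical layout).
    war_threshold: assumed cutoff; a pair of users with at least this many
                   reverts between them is treated as a revert war.
    """
    # Count reverts per unordered pair of distinct users.
    pair_counts = Counter()
    for reverter, reverted_users in reverts:
        for other in set(reverted_users):
            if other != reverter:
                pair_counts[frozenset((reverter, other))] += 1

    labels = []
    for reverter, reverted_users in reverts:
        others = set(reverted_users) - {reverter}
        if not others:
            # The user only undid their own edits.
            labels.append("self-revert")
        elif any(pair_counts[frozenset((reverter, o))] >= war_threshold
                 for o in others):
            labels.append("revert war")
        else:
            labels.append("regular revert")
    return labels


labels = classify_reverts([
    ("Alice", ["Alice"]),
    ("Bob", ["Carol"]),
    ("Dave", ["Erin"]),
    ("Erin", ["Dave"]),
    ("Dave", ["Erin"]),
])
```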


Some cases can be tricky, for example:
 # Marking: between duplicates, by other users (reverted, questionable)
 # Revision 54 (regular edit)       User0    Regular edit
 # Revision 55 (regular edit)       User1    Regular edit
 # Revision 56 (revert to 54)       User2    Vandalism
 # Revision 57 (vandalism)          User2    Vandalism
 # Revision 58 (revert to 56/54)    User3    Correcting vandalism, but not quite
 # Revision 59 (revert to 55)       User4    Revert to Revision 55

Note that User2 tried to hide his 'revert vandalism' behind regular vandalism.
This misled User3, whose revert restored the vandalized state, and it was only corrected by User4.
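
The example above can be replayed in a short sketch. The letters "A", "B" and "C" stand in for the MD5 digests of the three distinct article states, and the idea of flagging a revert as overturned when a later revert undoes it is my reading of "questionable", not something taken from pymwdat.

```python
def find_duplicate_reverts(revisions):
    """revisions: list of (rev_id, user, digest) in chronological order.

    Returns (reverting_rev_id, restored_rev_id, reverted_rev_ids) for every
    revision whose digest matches an earlier, non-adjacent revision.
    """
    last_seen = {}  # digest -> index of the most recent revision with it
    reverts = []
    for i, (rev_id, _user, digest) in enumerate(revisions):
        j = last_seen.get(digest)
        if j is not None and i - j > 1:
            reverted = [r[0] for r in revisions[j + 1:i]]
            reverts.append((rev_id, revisions[j][0], reverted))
        last_seen[digest] = i
    return reverts


def overturned_reverts(reverts):
    """Return ids of reverting revisions that a later revert undid again."""
    return sorted(
        rev for rev, _target, _reverted in reverts
        if any(rev in later_reverted for _r, _t, later_reverted in reverts)
    )


# Placeholder digests: A = state of revision 54, B = state of revision 55,
# C = the extra vandalism in revision 57.
history = [
    (54, "User0", "A"),
    (55, "User1", "B"),
    (56, "User2", "A"),
    (57, "User2", "C"),
    (58, "User3", "A"),
    (59, "User4", "B"),
]
reverts = find_duplicate_reverts(history)
overturned = overturned_reverts(reverts)
```

Here `reverts` lists revisions 56, 58 and 59 as reverts, and `overturned` contains 56 and 58, because revision 59 undid both. Digest matching alone cannot say that 56 was vandalism and 58 a well-meant mistake; that still takes the user, content and comment information mentioned earlier.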

Blanking also creates duplicate MD5 signatures, so you need to handle blanked revisions separately.
And of course users also revert manually, in some cases without restoring the earlier text exactly.
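
Both points can be handled roughly as follows. This is a minimal sketch: the helper names and the 0.95 similarity threshold are arbitrary assumptions, and `difflib` is simply one standard-library way to compare texts approximately.

```python
import difflib


def is_blank(text):
    """True for empty or whitespace-only revisions (page blanking)."""
    return not text.strip()


def is_near_revert(text, earlier_text, threshold=0.95):
    """True if text restores earlier_text exactly or almost exactly.

    Blank revisions never count, since all blanked pages look identical.
    threshold is an assumed cutoff on difflib's similarity ratio.
    """
    if is_blank(text) or is_blank(earlier_text):
        return False
    if text == earlier_text:
        return True
    matcher = difflib.SequenceMatcher(None, text, earlier_text, autojunk=False)
    return matcher.ratio() >= threshold


original = "The quick brown fox jumps over the lazy dog."
manual_restore = "The quick brown fox jumps over the lazy dog"  # period lost
exact = is_near_revert(original, original)
near = is_near_revert(manual_restore, original)
blanked = is_near_revert("", "   \n")
unrelated = is_near_revert("abc", "xyz")
```

Comparing every revision against every earlier one this way is slow on long histories, so in practice you would probably limit it to a window of recent revisions.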

If you are familiar with Python, you may want to take a look at the following code.
Look for "def analyze_reverts(revisions)" at line 444 in:
 http://code.google.com/p/pymwdat/source/browse/trunk/toolkit.py



-- Best, Dmitry



On Thu, Aug 18, 2011 at 2:40 AM, Flöck, Fabian <fabian.floeck@kit.edu> wrote:
Hi,

I'm trying to detect reverts in Wikipedia for my research, right now with a self-built script using MD5 hashes and diffs between revisions. I always read about people taking reverts into account in their data, but it's seldom described HOW exactly a revert is determined or what tool they use to do that. Can you point me to any research or tools, or tell me what you used in your own research to identify which edits were reverted and/or who reverted them?

Best,

Fabian




--
Karlsruhe Institute of Technology (KIT)
Institute of Applied Informatics and Formal Description Methods

Dipl.-Medwiss. Fabian Flöck
Research Associate

Building 11.40, Room 222
KIT-Campus South
D-76128 Karlsruhe

Phone: +49 721 608 4 6584
Skype: f.floeck_work
E-Mail: fabian.floeck@kit.edu
WWW: http://www.aifb.kit.edu/web/Fabian_Flöck

KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l