There have been a few publications on the subject:
1. "Us vs. them: Understanding social dynamics in Wikipedia with revert graph visualizations", B Suh, EH Chi, BA Pendleton.
2. "He says, she says: Conflict and coordination in Wikipedia.", A Kittur, B Suh, BA Pendleton.
From my experience, analyzing MD5 hashes alone is not enough to identify all reverts,
and even MD5 matching has its pitfalls. In general, you also need information about user reputations,
article content, and comment content to identify true reverts.
There are several groups of reverts which can be loosely identified as:
* regular reverts;
* self-reverts;
* revert wars.
You need to take care of these cases when identifying reverts.
Some cases can be tricky, for example:
# Marking : between duplicates, by other users (reverted, questionable)
# Revision 54 (regular edit) User0 Regular edit
# Revision 55 (regular edit) User1 Regular edit
# Revision 56 (revert to 54) User2 Vandalism
# Revision 57 (vandalism) User2 Vandalism
# Revision 58 (revert to 56/54) User3 Correcting vandalism, but not quite
# Revision 59 (revert to 55) User4 Revert to Revision 55
Note that User2 tried to hide his 'revert vandalism' (Revision 56) behind regular vandalism (Revision 57).
This misled User3, and the damage was only fully corrected by User4.
Blanking also creates duplicate MD5 signatures, so you need to handle those cases as well.
And of course, users also perform reverts manually (and in some cases not exactly), so the restored text is not always identical to an earlier revision.
If you are familiar with Python, you may want to take a look at this code:
look up line 444, def analyze_reverts(revisions), in:
http://code.google.com/p/pymwdat/source/browse/trunk/toolkit.py
-- Best, Dmitry
Hi,
I'm trying to detect reverts in Wikipedia for my research, right now with a self-built script using MD5 hashes and diffs between revisions. I always read about people taking reverts into account in their data, but it's seldom described HOW exactly a revert is determined or what tool they use to do that. Can you point me to any research or tools, or maybe tell me what you used in your own research to identify which edits were reverted and/or who reverted them?
Best,
Fabian
--
Karlsruhe Institute of Technology (KIT)
Institute of Applied Informatics and Formal Description Methods
Dipl.-Medwiss. Fabian Flöck
Research Associate
Building 11.40, Room 222
KIT-Campus South
D-76128 Karlsruhe
Phone: +49 721 608 4 6584
Skype: f.floeck_work
E-Mail: fabian.floeck@kit.edu
WWW: http://www.aifb.kit.edu/web/Fabian_Flöck
KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association