There is another way to detect 100% reverts. It won't catch manual reverts that are not 100% accurate, but most vandal patrollers use undo and the like.
For every revision, calculate an MD5 checksum of the content. Then you can easily look back, say, 100 revisions to see whether this checksum occurred earlier. It is efficient and unambiguous.
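A minimal sketch of the checksum idea in Python, assuming the revision texts of a single page are already available as a list of strings, oldest first. The function name `detect_identity_reverts`, the `window` parameter and the choice to ignore a match with the immediately preceding revision (that would be a null edit, not a revert) are illustrative assumptions, not part of the method as described.

```python
import hashlib

def detect_identity_reverts(texts, window=100):
    """Return (i, j) pairs where revision i restores the exact content of
    an earlier revision j, at most `window` revisions back.

    Hypothetical helper: `texts` holds one page's revision texts, oldest first.
    """
    last_seen = {}  # MD5 hex digest -> index of the latest revision with that content
    reverts = []
    for i, text in enumerate(texts):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # j == i - 1 would mean an unchanged (null) edit, not a revert.
        if j is not None and j < i - 1 and i - j <= window:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts

history = ["Stub.", "Stub. SPAM", "Stub.", "Stub. Fixed.", "Stub. junk", "Stub. Fixed."]
print(detect_identity_reverts(history))  # [(2, 0), (5, 3)]
```

Keeping only the latest index per checksum makes each look-back a single dictionary lookup, so the cost stays linear in the number of revisions.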
This will work for any Wikipedia for which a full archive dump is available.
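To run this over a full archive dump without loading it into memory, one option is to stream the XML with Python's `xml.etree.ElementTree.iterparse`. This sketch assumes the usual MediaWiki export layout (`<page>` elements containing `<title>` and `<revision>` elements with `<text>`), with revisions listed oldest first; the `page_revision_digests` name and the tiny `SAMPLE` document are made up for illustration.

```python
import hashlib
import io
import xml.etree.ElementTree as ET

def local_name(tag):
    # Drop the "{namespace}" prefix that export files put on every tag.
    return tag.rsplit("}", 1)[-1]

def page_revision_digests(source):
    """Yield (title, [MD5 hex digest of each revision text]) per <page>."""
    title, digests = None, []
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            # Revisions with hidden text may have an empty <text/> element.
            body = elem.text or ""
            digests.append(hashlib.md5(body.encode("utf-8")).hexdigest())
        elif name == "page":
            yield title, digests
            title, digests = None, []
            elem.clear()  # release the finished page's revision texts

SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Foo</title>
    <revision><text>A</text></revision>
    <revision><text>B</text></revision>
  </page>
  <page><title>Bar</title>
    <revision><text>A</text></revision>
  </page>
</mediawiki>"""

pages = list(page_revision_digests(io.BytesIO(SAMPLE)))
print([(title, len(d)) for title, d in pages])  # [('Foo', 2), ('Bar', 1)]
```

When the generator is consumed lazily rather than collected into a list, only one page's digest list is held at a time, which is all a 100-revision look-back needs.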
Erik Zachte
2009/8/20 Erik Zachte erikzachte@infodisiac.com:
A slightly less effective method would be to use the page size in bytes. This won't give precise one-to-one matching, but since I believe the size is already calculated in the data, it might well be quicker.
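The byte-size variant can be sketched in Python, assuming a list of revision sizes for one page, oldest first. Equal sizes only make a revision a candidate revert, so matches still need checking; `size_revert_candidates` and its `window` parameter are hypothetical names.

```python
def size_revert_candidates(sizes, window=100):
    """Return (i, j) pairs where revision i has the same byte size as an
    earlier revision j, at most `window` revisions back (newest match wins).

    Hypothetical helper: `sizes` holds one page's revision sizes, oldest first.
    A match is only a candidate revert: unrelated edits can share a size.
    """
    candidates = []
    for i, size in enumerate(sizes):
        # Scan revisions i-2 down to i-window. i-1 is skipped: matching the
        # previous size means a null edit or a same-length change, not a revert.
        for j in range(i - 2, max(-1, i - window - 1), -1):
            if sizes[j] == size:
                candidates.append((i, j))
                break
    return candidates

print(size_revert_candidates([1200, 1450, 1200, 1210, 1210]))  # [(2, 0)]
```

An alternating history such as `[10, 20, 10, 20]` yields `[(2, 0), (3, 1)]`, so back-and-forth undo wars show up as a run of candidates too.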
One other source of false positives: edit warring where one or both sides are using undo/rollback. You'll get the impression of a lot of vandalism without there necessarily being any.
On Thu, Aug 20, 2009 at 11:23 AM, Erik Zachte erikzachte@infodisiac.com wrote:
Luca's WikiTrust could easily reveal this info.
wikimedia-l@lists.wikimedia.org