[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles
Andrew Gray
andrew.gray at dunelm.org.uk
Thu Aug 20 17:35:51 UTC 2009
2009/8/20 Erik Zachte <erikzachte at infodisiac.com>:
> There is another way to detect 100% reverts. It won't catch manual reverts
> that are not 100% accurate, but most vandal patrollers will use undo and the
> like.
>
> For every revision calculate md5 checksum of content. Then you can easily
> look back say 100 revisions to see whether this checksum occurred earlier.
> It is efficient and unambiguous.
A slightly less precise method would be to compare the page size in
bytes; this won't give exact one-to-one matching, since two different
revisions can have the same length, but as I believe the size is
already calculated in the data it might well be quicker.
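To make the comparison concrete, here is a rough sketch of both approaches in Python. It assumes the revisions of one page are available as a plain list of text strings in chronological order; the function names and sample history are made up for illustration and are not taken from any existing tool. A revision counts as a revert if its fingerprint matches an earlier revision within the lookback window, ignoring the case where it is identical to its immediate predecessor.

```python
import hashlib


def md5_key(text):
    """Fingerprint a revision by the MD5 checksum of its content."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def size_key(text):
    """Fingerprint a revision by its size in bytes (cheaper, less precise)."""
    return len(text.encode("utf-8"))


def find_reverts(revisions, key=md5_key, lookback=100):
    """Return (revert_index, restored_index) pairs for one page history.

    `revisions` is a hypothetical input format: a chronological list of
    revision texts. Only the previous `lookback` revisions are searched.
    """
    keys = [key(text) for text in revisions]
    reverts = []
    for i, k in enumerate(keys):
        # An edit identical to its predecessor changes nothing; skip it.
        if i > 0 and keys[i - 1] == k:
            continue
        start = max(0, i - lookback)
        # Search newest-first for an earlier revision with the same key.
        for j in range(i - 2, start - 1, -1):
            if keys[j] == k:
                reverts.append((i, j))
                break
    return reverts


history = [
    "The cat sat.",
    "The cat sat on the mat.",
    "LOL LOL LOL",                         # vandalism
    "The cat sat on the mat.",             # undo: restores revision 1
    "The cat sat on the mat. It purred.",
    "The dog sat on the mat.",             # new content, same byte size as 1 and 3
]

print(find_reverts(history, key=md5_key))   # [(3, 1)]
print(find_reverts(history, key=size_key))  # [(3, 1), (5, 3)]
```

The last revision shows the cost of the size shortcut: it has the same length as an earlier revision but different content, so the size-based pass reports a revert that the checksum pass does not.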
One other source of false positives here: edit warring where one or
both sides are using undo/rollback. You'll get the impression of a lot
of vandalism without there necessarily being any.
--
- Andrew Gray
andrew.gray at dunelm.org.uk