[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

Andrew Gray andrew.gray at dunelm.org.uk
Thu Aug 20 17:35:51 UTC 2009


2009/8/20 Erik Zachte <erikzachte at infodisiac.com>:
> There is another way to detect 100% reverts. It won't catch manual reverts
> that are not 100% accurate, but most vandal patrollers will use undo and the
> like.
>
> For every revision, calculate the MD5 checksum of the content. Then you can
> easily look back, say, 100 revisions to see whether this checksum occurred
> earlier. It is efficient and unambiguous.
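
For illustration, a minimal Python sketch of the checksum lookback described
above. It assumes the revision texts of one page are already in memory as a
list, oldest first; the function name and the sample data are hypothetical,
not anything taken from the dumps or from existing tooling.

```python
import hashlib


def find_identity_reverts(revisions, window=100):
    """Return (index, restored_index) pairs for exact reverts.

    revisions: list of revision texts for one page, oldest first
               (an assumed in-memory stand-in for the real dump).
    window:    how many earlier revisions to look back through.
    """
    digests = []
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        # Identical to the previous revision: a null edit, not a revert.
        if i > 0 and digests[-1] == digest:
            digests.append(digest)
            continue
        start = max(0, i - window)
        # Search backwards so the most recent matching revision wins.
        for j in range(i - 2, start - 1, -1):
            if digests[j] == digest:
                reverts.append((i, j))
                break
        digests.append(digest)
    return reverts


history = ["good", "VANDAL", "good", "good plus", "good plus"]
print(find_identity_reverts(history))
```

Here revision 2 is reported as restoring revision 0; the unchanged final
revision is skipped rather than counted as a revert.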

A slightly less effective method would be to compare the page size in
bytes. This won't give precise one-to-one matching, but since I believe
the size is already calculated in the data, it might well be quicker.
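
A rough sketch of the size-based variant, in Python. It assumes the
per-revision byte sizes for one page are available as a list of integers,
oldest first; the function name and the numbers are made up for the example.
Matches are only candidates, since unrelated revisions can share a size.

```python
def candidate_reverts_by_size(sizes, window=100):
    """Return (index, earlier_index) pairs where the byte size recurs.

    sizes:  list of page sizes in bytes, oldest revision first
            (hypothetical input; no text or checksum is needed).
    window: how many earlier revisions to look back through.
    """
    candidates = []
    for i, size in enumerate(sizes):
        # Same size as the previous revision tells us nothing useful.
        if i > 0 and sizes[i - 1] == size:
            continue
        start = max(0, i - window)
        for j in range(i - 2, start - 1, -1):
            if sizes[j] == size:
                candidates.append((i, j))
                break
    return candidates


print(candidate_reverts_by_size([1200, 15, 1200, 1250, 1200]))
```

Both revision 2 and revision 4 come back as candidates here. Whether
revision 4 is really a revert, or just a different edit that happens to be
1200 bytes, cannot be told from the size alone.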

One other false positive here: edit warring where one or both sides are
using undo/rollback. You'll get the impression of a lot of vandalism
without there necessarily being any.
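
One possible way to spot this case, sketched in Python. The heuristic is my
own assumption, not something established: under vandalism only the "good"
version tends to get restored, whereas in an edit war both competing
versions keep coming back. The function name and threshold are hypothetical.

```python
from collections import Counter


def looks_like_edit_war(digests, min_restores=2):
    """Guess whether a page history shows an edit war.

    digests:      per-revision checksums for one page, oldest first.
    min_restores: how often a version must be restored to count as
                  contested (an arbitrary illustrative threshold).
    """
    restored = Counter()
    seen = set()
    previous = None
    for digest in digests:
        # A version that reappears after something else intervened
        # has been restored by undo, rollback, or a manual revert.
        if digest != previous and digest in seen:
            restored[digest] += 1
        seen.add(digest)
        previous = digest
    contested = [d for d, n in restored.items() if n >= min_restores]
    # Two or more versions that each keep being restored.
    return len(contested) >= 2


print(looks_like_edit_war(["A", "B", "A", "B", "A", "B"]))
print(looks_like_edit_war(["A", "V1", "A", "V2", "A", "V3", "A"]))
```

The first history alternates between two versions and is flagged; the
second has one version restored after three different bad edits and is not.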

-- 
- Andrew Gray
  andrew.gray at dunelm.org.uk



