Potthast, Stein, Gerling. (2008). Automatic Vandalism Detection in Wikipedia. http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_20...
Abstract. We present results of a new approach to detect destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature by now. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression we achieve 83% precision at 77% recall with our model. *Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-Measure performance by 49% while being faster at the same time.*
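For anyone who wants a single number out of the precision and recall quoted above, here is a minimal sketch of the F-measure calculation. It assumes the balanced form (beta = 1), which the abstract does not state explicitly, and the function name `f_measure` is just mine for illustration.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall as quoted in the abstract; beta=1 is an assumption.
f1 = f_measure(0.83, 0.77)
print(f"F1 = {f1:.3f}")
```

Under that assumption the quoted 83% precision and 77% recall work out to an F1 of roughly 0.80.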
Open the PDF, scan to page 667. This bot outperforms MartinBot, T-850 Robotic Assistant, WerdnaAntiVandalBot, Xenophon, ClueBot, CounterVandalismBot, PkgBot, MiszaBot, and AntiVandalBot. It outperforms the best of those (AntiVandalBot) by a very wide margin.
So why are you wasting the ISPs' time and the police's time when the best of the passive technology routes have not been explored? Using machine learning *you pit the vandals against themselves.* Every time they perform a particular kind of vandalism, it can never be performed again because the bot will recognize it.
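To make the "pit the vandals against themselves" idea concrete, here is a rough sketch of a logistic regression classifier trained on labelled edits. It is not the paper's implementation: the two features (upper-case ratio and bad-word ratio), the word list, the training loop, and the four toy examples are all made up by me for illustration.

```python
import math

BAD_WORDS = {"stupid", "sucks", "poop"}  # hypothetical tiny word list

def features(inserted):
    """Map inserted text to [bias, upper-case ratio, bad-word ratio]."""
    letters = [c for c in inserted if c.isalpha()]
    upper = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    words = inserted.lower().split()
    bad = sum(w.strip(".,!?") in BAD_WORDS for w in words) / len(words) if words else 0.0
    return [1.0, upper, bad]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, text):
    """Estimated probability that the inserted text is vandalism."""
    return sigmoid(sum(w * x for w, x in zip(weights, features(text))))

def train(samples, epochs=2000, lr=0.5):
    """Fit logistic regression weights by plain stochastic gradient ascent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for text, label in samples:
            x = features(text)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for i in range(len(weights)):
                weights[i] += lr * (label - p) * x[i]
    return weights

# Made-up toy data: (inserted text, 1 = vandalism, 0 = regular edit).
TOY = [
    ("The city was founded in 1820 by settlers.", 0),
    ("Population figures were revised after the census.", 0),
    ("THIS ARTICLE SUCKS!!!", 1),
    ("bob is stupid and poop", 1),
]

weights = train(TOY)
print(score(weights, "JOHN IS STUPID"))
print(score(weights, "The bridge opened in 1932."))
```

Each time a human reverts a new kind of vandalism, that edit can be appended to the training list as a positive example and `train` run again, which is the sense in which the vandals end up supplying the training data.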
Cheers,