[WikiEN-l] JarlaxleArtemis/Grawp

Brian Brian.Mingus at colorado.edu
Mon Dec 29 23:29:42 UTC 2008


Potthast, Stein, Gerling (2008). Automatic Vandalism Detection in Wikipedia.
http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2008c.pdf

Abstract. We present results of a new approach to detect destructive
article revisions, so-called vandalism, in Wikipedia. Vandalism detection
is a one-class classification problem, where vandalism edits are the target
to be identified among all revisions. Interestingly, vandalism detection
has not been addressed in the Information Retrieval literature by now. In
this paper we discuss the characteristics of vandalism as humans recognize
it and develop features to render vandalism detection as a machine learning
task. We compiled a large number of vandalism edits in a corpus, which
allows for the comparison of existing and new detection approaches. Using
logistic regression we achieve 83% precision at 77% recall with our model.
Compared to the rule-based methods that are currently applied in Wikipedia,
our approach increases the F-Measure performance by 49% while being faster
at the same time.
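(For reference, the F-Measure here is the harmonic mean of precision and
recall, so from the figures quoted above it works out to roughly
2 * 0.83 * 0.77 / (0.83 + 0.77) ≈ 0.80.)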



Open the PDF, scan to page 667. This bot outperforms MartinBot, T-850
Robotic Assistant, WerdnaAntiVandalBot, Xenophon, ClueBot,
CounterVandalismBot, PkgBot, MiszaBot, and AntiVandalBot. It outperforms the
best of those (AntiVandalBot) by a very wide margin.

So why are you wasting the ISP's time and the police's time when the best
of the passive technological routes have not yet been explored? Using
machine learning, you pit the vandals against themselves: every time they
perform a particular kind of vandalism, the bot learns to recognize it, and
that kind of edit cannot slip through again.
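
As a rough sketch of what this looks like in practice (not the paper's
actual feature set or code; the features, word list, and training edits
below are invented purely for illustration), a logistic-regression edit
classifier can be built from a handful of simple diff features:

# Minimal sketch of a machine-learning vandalism detector in the spirit of
# Potthast/Stein/Gerling (2008): extract simple features from an edit and
# train a logistic regression classifier. The features, word list, and
# training examples below are hypothetical, not the paper's.

import re
from sklearn.linear_model import LogisticRegression

BAD_WORDS = {"stupid", "poop", "hate"}  # toy list, purely illustrative

def edit_features(inserted_text, is_anonymous, size_delta):
    """Turn one edit into a numeric feature vector."""
    text = inserted_text or ""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    longest_run = max((len(m.group(0)) for m in re.finditer(r"(.)\1*", text)), default=0)
    bad_word_count = sum(w in BAD_WORDS for w in re.findall(r"[a-z']+", text.lower()))
    return [upper_ratio, longest_run, bad_word_count, int(is_anonymous), size_delta]

# Toy labelled edits: (inserted text, anonymous?, size change, is_vandalism)
training_edits = [
    ("JOHN IS SOOOOO STUPID!!!!!", True, 26, 1),
    ("poop poop poop", True, 14, 1),
    ("Added a citation to the 2007 census report.", False, 44, 0),
    ("Fixed a typo in the second paragraph.", False, 1, 0),
    ("I hate this page aaaaaaaa", True, 25, 1),
    ("Expanded the history section with sourced material.", False, 512, 0),
]

X = [edit_features(t, anon, delta) for t, anon, delta, _ in training_edits]
y = [label for *_, label in training_edits]

clf = LogisticRegression().fit(X, y)

# Score a new, unseen edit: probability that it is vandalism.
new_edit = edit_features("WIKIPEDIA SUCKSSSS", True, 18)
print("P(vandalism) =", clf.predict_proba([new_edit])[0][1])

In practice the classifier would be trained on a large labelled corpus like
the one the paper compiled, and the output probability thresholded to trade
precision against recall.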

Cheers,

