Potthast, M., Stein, B., Gerling, R. (2008). Automatic Vandalism Detection in Wikipedia. http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_20...
Abstract. We present results of a new approach to detect destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature by now. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression we achieve 83% precision at 77% recall with our model. Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-Measure performance by 49% while being faster at the same time.
Open the PDF, scan to page 667. This bot outperforms MartinBot, T-850 Robotic Assistant, WerdnaAntiVandalBot, Xenophon, ClueBot, CounterVandalismBot, PkgBot, MiszaBot, and AntiVandalBot. It outperforms the best of those (AntiVandalBot) by a very wide margin.
So why are you wasting the ISP's time and the police's time when the best of the passive technology routes has not been explored? Using machine learning, *you pit the vandals against themselves.* Every time they perform a particular kind of vandalism, the bot learns to recognize it, so that kind of edit cannot simply be repeated.
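Concretely, the paper's approach boils down to something like the sketch below. This is not the authors' code: the features (uppercase ratio, a tiny vulgarism list, size of the change) are hypothetical stand-ins for the much richer feature set they describe, and scikit-learn is an assumed choice of library.

# Sketch only: score each revision with a few hand-crafted features and a
# logistic regression, in the spirit of the Potthast/Stein/Gerling paper.
# The features and word list below are illustrative placeholders.
import re
from sklearn.linear_model import LogisticRegression

VULGARISMS = {"stupid", "sucks", "idiot"}  # placeholder word list

def edit_features(old_text, new_text):
    """Turn one revision (old_text -> new_text) into a feature vector.
    A real system would diff the two revisions; this crude version just
    looks at the new text and the size of the change."""
    words = re.findall(r"[A-Za-z']+", new_text)
    upper_ratio = sum(c.isupper() for c in new_text) / max(len(new_text), 1)
    vulgar_hits = sum(w.lower() in VULGARISMS for w in words)
    size_delta = len(new_text) - len(old_text)
    return [upper_ratio, vulgar_hits, size_delta]

def train(revisions, labels):
    """revisions: list of (old_text, new_text); labels: 1 = vandalism, 0 = not."""
    X = [edit_features(o, n) for o, n in revisions]
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf

def looks_like_vandalism(clf, old_text, new_text, threshold=0.5):
    """Probability-based flag; the threshold trades precision for recall."""
    p = clf.predict_proba([edit_features(old_text, new_text)])[0][1]
    return p >= threshold

Retraining something like this on each newly reverted edit is the "pit the vandals against themselves" loop: every confirmed piece of vandalism becomes another training example.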
Cheers,
On Mon, Dec 29, 2008 at 4:15 PM, Brian Brian.Mingus@colorado.edu wrote:
By the way, I ask those questions having read the bot's user page. It is apparently quite effective, indicating to me that this user causes minimal disruption.
On Mon, Dec 29, 2008 at 4:11 PM, Brian Brian.Mingus@colorado.edu wrote:
What percentage of his page moves were not picked up automatically by a bot?
What percentage of this user's vandalism is not picked up by a bot?
Why is the ISP responsible for what he dumps into Wikipedia, rather than Wikipedia, which allows itself to be a dumping ground? The Viacom/YouTube lawsuit demonstrates that this is a legal grey area, so I see little ground on which to punish the ISP's entire IP range.
Why are machine-learning bots, trained on previous vandalism to detect new vandalism, not being used? They have already been developed. Why is the Foundation not funding their further development?
I believe the direction of this thread has been all wrong.
Peace,
On Mon, Dec 29, 2008 at 4:07 PM, Soxred93 soxred93@gmail.com wrote:
The problem with that is that many articles we have would not be found in any dictionary.
X!
On Dec 29, 2008, at 6:02 PM, Ian Woollard wrote:
On 29/12/2008, Joe Szilagyi szilagyi@gmail.com wrote:
Allow blocking on a more granular level: if we know his ISP, lock out moves and redirects for the whole damn ISP, and specifically point the finger back in the block message: "Blocked because of JarlaxleArtemis/Grawp", with a nice shiny link to his long-term abuse page.
It probably wouldn't work, because of proxies and people who would emulate or help him.
Still, ideas like that, which affect fewer people rather than more, are IMO almost certainly the way to go. For example, before permitting non-admins or users with a small number of edits to complete a move, it might be desirable to restrict the range of characters and check that the move title consists of words found in a dictionary.
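Concretely, such a gate might look something like the sketch below. This is not an existing MediaWiki feature; the character whitelist, word-list path, and edit-count threshold are assumed values.

# Rough sketch of the dictionary gate described above -- not an existing
# MediaWiki hook. Whitelist, word-list path, and threshold are made up.
import re

ALLOWED_TITLE = re.compile(r"^[A-Za-z0-9 ,.'()\-]+$")  # assumed character whitelist

def load_wordlist(path="/usr/share/dict/words"):
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def move_allowed(title, user_edit_count, is_admin, wordlist, min_edits=500):
    """Gate page moves for untrusted users: sane characters + dictionary words."""
    if is_admin or user_edit_count >= min_edits:
        return True  # trusted users bypass the check entirely
    if not ALLOWED_TITLE.match(title):
        return False
    words = re.findall(r"[A-Za-z']+", title)
    return all(w.lower() in wordlist for w in words)

The objection raised above still applies: plenty of legitimate titles (proper nouns, technical terms) are not dictionary words, so a check like this would probably have to whitelist existing titles or only flag moves for review rather than block them outright.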
-- -Ian Woollard
We live in an imperfectly imperfect world. Life in a perfectly imperfect world would be much better.
WikiEN-l mailing list WikiEN-l@lists.wikimedia.org To unsubscribe from this mailing list, visit: https://lists.wikimedia.org/mailman/listinfo/wikien-l
-- You have successfully failed!