[WikiEN-l] JarlaxleArtemis/Grawp
Brian
Brian.Mingus at colorado.edu
Mon Dec 29 23:28:14 UTC 2008
Potthast, Stein, Gerling. (2008). Automatic Vandalism Detection in
Wikipedia.
http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2008c.pdf
Abstract. We present results of a new approach to detect destructive article
revisions, so-called vandalism, in Wikipedia. Vandalism detection is a
one-class classification problem, where vandalism edits are the target to be
identified among all revisions. Interestingly, vandalism detection has not
been addressed in the Information Retrieval literature by now. In this paper
we discuss the characteristics of vandalism as humans recognize it and
develop features to render vandalism detection as a machine learning task. We
compiled a large number of vandalism edits in a corpus, which allows for the
comparison of existing and new detection approaches. Using logistic
regression we achieve 83% precision at 77% recall with our model. *Compared
to the rule-based methods that are currently applied in Wikipedia, our
approach increases the F-Measure performance by 49% while being faster at the
same time.*
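To put the precision and recall figures quoted in the abstract on a single scale, here is a rough sketch of the F-measure arithmetic. It assumes the balanced F1 form (beta = 1); the abstract does not say which weighting the paper uses, and the function name is mine, not the paper's.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1.0 gives the balanced F1 score; that choice is an assumption,
    not something stated in the abstract.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as quoted in the abstract.
f1 = f_measure(0.83, 0.77)
print(round(f1, 3))  # prints 0.799
```

So under that assumption the reported operating point works out to an F1 of roughly 0.80.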
Open the PDF and scroll to page 667. The classifier described there
outperforms MartinBot, T-850 Robotic Assistant, WerdnaAntiVandalBot, Xenophon,
ClueBot, CounterVandalismBot, PkgBot, MiszaBot, and AntiVandalBot. It
outperforms the best of those (AntiVandalBot) by a very wide margin.
So why are you wasting the ISP's time and the police's time when the best of
the passive technology routes have not been explored? Using machine learning
*you pit the vandals against themselves.* Every time they perform a
particular kind of vandalism, it can never be performed again because the
bot will recognize it.
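For anyone who wants to see what "trained on previous vandalism" means in practice, here is a minimal sketch of a logistic regression classifier in plain Python. Everything specific in it is an assumption for illustration: the three features, the toy training rows, and the function names are made up and are not the feature set or corpus from the paper.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def vandalism_probability(w, b, x):
    # Model output: estimated probability that an edit is vandalism.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples, labels, epochs=2000, lr=0.5):
    # Plain stochastic gradient descent on the logistic loss.
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = vandalism_probability(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Hypothetical features per edit (NOT the paper's):
# [upper-case ratio, vulgar-word ratio, fraction of text removed]
# The rows below are invented toy data, not real Wikipedia edits.
TRAIN_X = [
    [0.90, 0.30, 0.00],  # shouting plus vulgarity
    [0.10, 0.20, 0.95],  # page blanking
    [0.80, 0.00, 0.50],  # shouting plus large removal
    [0.05, 0.00, 0.02],  # ordinary edit
    [0.08, 0.00, 0.10],  # ordinary edit
    [0.03, 0.00, 0.00],  # ordinary edit
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]  # 1 = vandalism, 0 = regular edit

w, b = train(TRAIN_X, TRAIN_Y)

# Score two unseen edits that resemble the labelled examples.
p_vandal = vandalism_probability(w, b, [0.85, 0.15, 0.25])
p_clean = vandalism_probability(w, b, [0.05, 0.00, 0.03])
print(p_vandal > 0.5, p_clean < 0.5)
```

The point of the sketch is only that each newly labelled vandalism edit becomes another training row, so the model can pick up edits with similar feature values later. How well that works on real edits depends entirely on the features and the corpus.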
Cheers,
On Mon, Dec 29, 2008 at 4:15 PM, Brian <Brian.Mingus at colorado.edu> wrote:
> By the way, I ask those questions having read the bot's user page. It is
> apparently quite effective, indicating to me that this user causes minimal
> disruption.
>
>
> On Mon, Dec 29, 2008 at 4:11 PM, Brian <Brian.Mingus at colorado.edu> wrote:
>
>> What percentage of his page moves were not picked up automatically by a
>> bot?
>>
>> What percentage of this user's vandalism is not picked up by a bot?
>>
>> Why is the ISP responsible for what he dumps into Wikipedia, rather than
>> Wikipedia, as it allows itself to be a dumping ground? The Viacom/YouTube
>> lawsuit demonstrates that this is a legal grey area; thus, I see little
>> ground on which to punish the entire IP range of the ISP.
>>
>> Why are machine learning bots that are trained on previous vandalism in
>> order to detect new vandalism not being used? They have been developed. Why
>> is the Foundation not funding their further development?
>>
>> I believe the direction of this thread has been all wrong.
>>
>> Peace,
>>
>>
>>
>> On Mon, Dec 29, 2008 at 4:07 PM, Soxred93 <soxred93 at gmail.com> wrote:
>>
>>> The problem with that is that many articles we have would not be
>>> found in any dictionary.
>>>
>>> X!
>>>
>>> On Dec 29, 2008, at 6:02 PM, Ian Woollard wrote:
>>>
>>> > On 29/12/2008, Joe Szilagyi <szilagyi at gmail.com> wrote:
>>> >> Allow blocking on a more granular level, if we know his ISP, and lock
>>> >> out moves and redirects for the whole damn ISPs, and specifically
>>> >> point the finger back in the block message: Blocked because of
>>> >> JarlaxleArtemis/Grawp with a nice shiny link to his long-term abuse
>>> >> page.
>>> >
>>> > It probably wouldn't work because of proxies and people that would
>>> > emulate/help him.
>>> >
>>> > Still, ideas like that, which would affect fewer people rather than
>>> > more, are almost certainly IMO the way to go; for example, restricting
>>> > the range of characters and checking that the move title consists of
>>> > words in a dictionary before permitting non-admins or users with a small
>>> > number of edits to complete a move might be desirable.
>>> >
>>> >> - Joe
>>> >
>>> > --
>>> > -Ian Woollard
>>> >
>>> > We live in an imperfectly imperfect world. Life in a perfectly
>>> > imperfect world would be much better.
>>> >
>>> > _______________________________________________
>>> > WikiEN-l mailing list
>>> > WikiEN-l at lists.wikimedia.org
>>> > To unsubscribe from this mailing list, visit:
>>> > https://lists.wikimedia.org/mailman/listinfo/wikien-l
>>>
>>
>>
>>
>> --
>> You have successfully failed!
>>
>
>
>
> --
> You have successfully failed!
>
--
You have successfully failed!