Potthast, M., Stein, B., & Gerling, R. (2008). Automatic Vandalism Detection in Wikipedia. http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2008c.pdf
Abstract. We present results of a new approach to detecting destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature until now. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features to render vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression we achieve 83% precision at 77% recall with our model. Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-measure performance by 49% while being faster at the same time.
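The classifier described in the abstract is ordinary logistic regression over hand-crafted edit features. A minimal self-contained sketch of that technique follows; the features (size change, upper-case ratio, a vulgarity flag) and the toy training edits are my own illustrative assumptions, not the paper's actual feature set or corpus:

```python
# Sketch: logistic regression for edit classification, trained with plain
# stochastic gradient descent. Toy features and data, not the paper's.
import math

def features(old_text, new_text):
    """Toy edit features: size change, upper-case ratio, vulgarity flag."""
    added = new_text[len(old_text):] if new_text.startswith(old_text) else new_text
    size_delta = len(new_text) - len(old_text)
    upper_ratio = sum(c.isupper() for c in added) / max(len(added), 1)
    vulgar = 1.0 if any(w in added.lower() for w in ("sucks", "dumb")) else 0.0
    return [1.0, size_delta / 100.0, upper_ratio, vulgar]  # leading 1.0 = bias

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability that the edit described by feature vector x is vandalism."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train(samples, labels, epochs=300, lr=0.5):
    """Stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = predict(w, x)
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi
    return w

# Usage: one benign edit, one vandalism-like edit (invented examples).
X = [features("The cat sat.", "The cat sat on the mat."),
     features("The cat sat.", "The cat sat. THIS ARTICLE SUCKS")]
y = [0.0, 1.0]
w = train(X, y)
```

A real system would of course train on a labeled corpus of revisions, such as the one the authors compiled, and use a much richer feature set.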
By the way, I ask those questions having read the bot's user page. It is apparently quite effective, indicating to me that this user causes minimal disruption.
--
On Mon, Dec 29, 2008 at 4:11 PM, Brian <Brian.Mingus@colorado.edu> wrote:
What percentage of his page moves were not picked up automatically by a bot?
What percentage of this user's vandalism is not picked up by a bot?
Why is the ISP responsible for what he dumps into Wikipedia, rather than Wikipedia itself, which allows itself to be a dumping ground? The Viacom/YouTube lawsuit demonstrates that this is a legal grey area, so I see little ground on which to punish the ISP's entire IP range.
Why are machine learning bots, trained on previous vandalism in order to detect new vandalism, not being used? They have been developed. Why is the Foundation not funding their further development?
I believe the direction of this thread has been all wrong.
Peace,
--
On Mon, Dec 29, 2008 at 4:07 PM, Soxred93 <soxred93@gmail.com> wrote:
The problem with that is that many articles we have would not be
found in any dictionary.
X!
On Dec 29, 2008, at 6:02 PM, Ian Woollard wrote:
> On 29/12/2008, Joe Szilagyi <szilagyi@gmail.com> wrote:
>> Allow blocking on a more granular level: if we know his ISP, lock
>> out moves and redirects for the whole damn ISP, and specifically
>> point the finger back in the block message: Blocked because of
>> JarlaxleArtemis/Grawp, with a nice shiny link to his long-term abuse
>> page.
>
> It probably wouldn't work because of proxies and people that would
> emulate/help him.
>
> Still, ideas like that which affect fewer people rather than more are
> almost certainly, IMO, the way to go. For example, before permitting
> non-admins or users with a small number of edits to complete a move,
> it might be desirable to restrict the range of characters and check
> that the move title consists of dictionary words.
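The dictionary check Ian describes could be sketched roughly as below. The word list, the edit-count threshold, and the function itself are hypothetical illustrations for this thread, not an existing MediaWiki hook:

```python
# Sketch: gate page moves by untrusted users on a dictionary word check.
# The word list and thresholds are invented for illustration only.
import re

DICTIONARY = {"list", "of", "badgers", "history", "france"}  # toy word list

def move_allowed(title, user_edit_count, is_admin=False, min_edits=500):
    """Allow a move if the user is trusted, or the title is plain dictionary words."""
    if is_admin or user_edit_count >= min_edits:
        return True
    # Restrict the character range first, then check every word.
    if not re.fullmatch(r"[A-Za-z0-9 ():,'-]+", title):
        return False
    words = re.findall(r"[A-Za-z]+", title)
    return bool(words) and all(w.lower() in DICTIONARY for w in words)
```

Note that a legitimate title full of proper nouns (e.g. "Nietzsche") fails this test with any ordinary word list, which is exactly the objection Soxred93 raises in this thread.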
>
>> - Joe
>
> --
> -Ian Woollard
>
> We live in an imperfectly imperfect world. Life in a perfectly
> imperfect world would be much better.
>
> _______________________________________________
> WikiEN-l mailing list
> WikiEN-l@lists.wikimedia.org
> To unsubscribe from this mailing list, visit:
> https://lists.wikimedia.org/mailman/listinfo/wikien-l