Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

19 Mar 2009

Cobi (owner of ClueBot) and his roommate Crispy have already been
working hard to build this specific dataset, but they've been held back
by a lack of contributors. The page is here:
http://en.wikipedia.org/wiki/User:Crispy1989#New_Dataset_Contribution_Interface

X!

On Mar 19, 2009, at 8:15 AM, Tei wrote:

...
  On Thu, Mar 19, 2009 at 1:03 PM, Delirium <delirium(a)hackish.org> wrote:

  Brian wrote:
  This extension is very important for training machine-learning
  vandalism-detection bots. Recently published systems use only
  hundreds of examples of vandalism in training - not nearly enough to
  distinguish between the variety found in Wikipedia or generalize to
  new, unseen forms of vandalism. A large set of human-created rules
  could be run against all previous edits in order to create a massive
  vandalism dataset.

 Speaking as a machine-learning person, this seems like a somewhat
 problematic idea: generating training examples *from a rule set* and
 then learning on them is just a very roundabout way of reconstructing
 that rule set. What you really want is a large dataset of human-labeled
 examples of vandalism / non-vandalism that *can't* currently be
 distinguished reliably by rules, so you can throw a machine-learning
 algorithm at the problem of trying to come up with some.
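
 The point about reconstructing the rule set can be illustrated with a
 toy sketch. Everything here is an assumption made for illustration: the
 caps-ratio rule is a hypothetical stand-in for a hand-written filter
 (not an actual Abuse Filter rule), the edits are made up, and the
 learner is a simple one-feature threshold classifier.

```python
def caps_ratio(text):
    """Fraction of alphabetic characters that are upper case."""
    letters = [c for c in text if c.isalpha()]
    return sum(c.isupper() for c in letters) / len(letters) if letters else 0.0


def rule(text):
    # Hypothetical hand-written rule: mostly upper-case edits are vandalism.
    return caps_ratio(text) > 0.5


def train_stump(examples):
    """Fit a threshold on caps_ratio that minimises training error
    on (text, label) pairs."""
    scored = sorted((caps_ratio(t), y) for t, y in examples)
    values = [s for s, _ in scored]
    # Candidate thresholds: midpoints between neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(th):
        return sum((s > th) != y for s, y in scored)

    return min(candidates, key=errors)


# Made-up edits; the last one stands for subtle vandalism (a wrong date)
# that the rule cannot see.
subtle = "The war ended in 1954."
edits = [
    "Paris is the capital of France.",
    "THIS ARTICLE IS STUPID",
    "The river flows north for 40 km.",
    "JOHN WAS HERE LOL",
    "Added a citation to the 2008 report.",
    subtle,
]

# Labels come from the rule, not from humans.
rule_labelled = [(t, rule(t)) for t in edits]
threshold = train_stump(rule_labelled)


def learned(text):
    return caps_ratio(text) > threshold


# The learner agrees with the rule on every edit, including the one
# the rule misses: it has learned the rule, nothing more.
agreement = all(learned(t) == rule(t) for t in edits)
```

 With rule-generated labels the learner has no signal about the subtle
 edit, so it inherits the rule's blind spot; human labels are what would
 supply that signal.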

 Since there's already a database, this sounds like it could be done by
 flagging edits as "vandalism" and then reading the existing database
 information to extract the details, such as the IP, a diff of the
 change, etc. That way, humans define what counts as "vandalism", and
 the machine can learn the meaning.

 This may need a button or something, so that users can report an edit
 and the database flags it.
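
 A minimal sketch of what such a flagged record might look like, assuming
 a hypothetical in-memory REVISIONS table in place of the real database.
 The field names, the flag_edit helper, and the sample row are all
 illustrative, not MediaWiki's actual schema or API.

```python
import difflib
from dataclasses import dataclass


@dataclass
class LabelledEdit:
    rev_id: int
    ip: str
    diff: str
    is_vandalism: bool  # the human-supplied label


# Hypothetical stand-in for the existing database of revisions.
REVISIONS = {
    101: {
        "ip": "192.0.2.7",
        "before": "The war ended in 1945.\n",
        "after": "The war ended in 1954.\n",
    },
}


def flag_edit(rev_id, is_vandalism, revisions=REVISIONS):
    """Build a training example from a human flag plus stored revision data."""
    row = revisions[rev_id]
    diff = "".join(
        difflib.unified_diff(
            row["before"].splitlines(keepends=True),
            row["after"].splitlines(keepends=True),
            fromfile="before",
            tofile="after",
        )
    )
    return LabelledEdit(rev_id, row["ip"], diff, is_vandalism)


# A user presses the (hypothetical) "report vandalism" button on revision 101.
example = flag_edit(101, True)
```

 The human only supplies the label; the IP and diff are read back from
 data that is already stored.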

 -- 
 --
 ℱin del ℳensaje.
 _______________________________________________
 Wikitech-l mailing list
 Wikitech-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l 
