Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

19 Mar 2009


Cobi (owner of ClueBot) and his roommate Crispy have already been
working hard to build exactly this kind of dataset, but they've been
held back by a lack of contributors. The page is here:
http://en.wikipedia.org/wiki/User:Crispy1989#New_Dataset_Contribution_Interface
X!
On Mar 19, 2009, at 8:15 AM, Tei wrote:
...
On Thu, Mar 19, 2009 at 1:03 PM, Delirium <delirium@hackish.org> wrote:
...
Brian wrote:
...
This extension is very important for training machine-learning
vandalism-detection bots. Recently published systems use only
hundreds of examples of vandalism in training, not nearly enough to
distinguish among the varieties found in Wikipedia or to generalize
to new, unseen forms of vandalism. A large set of human-created rules
could be run against all previous edits in order to create a massive
vandalism dataset.
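A minimal sketch of Brian's proposal, assuming each past edit is available as a revision ID plus the text it added. The `Edit` class, the `RULES` dictionary, and the three regex rules are hypothetical stand-ins for illustration; actual AbuseFilter filters are written in the extension's own rule language, not as Python regular expressions.

```python
import re
from dataclasses import dataclass


@dataclass
class Edit:
    """A past edit. Hypothetical fields, not MediaWiki's schema."""
    rev_id: int
    added_text: str  # text the edit inserted into the page


# Hypothetical hand-written rules (stand-ins for real AbuseFilter filters).
RULES = {
    "repeated_chars": re.compile(r"(.)\1{9,}"),  # 10+ copies of one character
    "shouting": re.compile(r"\b[A-Z]{15,}\b"),   # a long all-caps word
    "insult_words": re.compile(r"\b(stupid|idiot)\b", re.IGNORECASE),
}


def label_edit(edit):
    """Return (is_vandalism, names of the rules that matched) for one edit."""
    hits = [name for name, pattern in RULES.items() if pattern.search(edit.added_text)]
    return bool(hits), hits


def build_dataset(edits):
    """Run every rule over every past edit to produce labeled examples."""
    return [(edit.rev_id, *label_edit(edit)) for edit in edits]


history = [
    Edit(101, "Paris is the capital of France."),
    Edit(102, "aaaaaaaaaaaaaaa this page is stupid"),
]
for row in build_dataset(history):
    print(row)
```

Every label produced this way is, by construction, something the rule set already catches.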
As a machine-learning person, I find this a somewhat problematic
idea: generating training examples *from a rule set* and then learning
on them is just a very roundabout way of reconstructing that rule set.
What you really want is a large dataset of human-labeled examples of
vandalism / non-vandalism that *can't* currently be distinguished
reliably by rules, so you can throw a machine-learning algorithm at
the problem of trying to come up with some.
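To make Delirium's objection concrete, here is a toy experiment on synthetic data. The four binary features and the `rule_label` rule are hypothetical, and a decision stump (the single best one-feature test) stands in for a real learner. Because the labels were produced by a rule, the best the learner can do is rediscover that rule.

```python
import random

# Hypothetical binary edit features, for illustration only.
FEATURES = ["anonymous", "has_profanity", "adds_external_link", "removes_most_text"]


def rule_label(edit):
    """The hand-written rule used to generate the 'training labels'."""
    return edit["has_profanity"]


def fit_stump(edits, labels):
    """Pick the single (feature, value) test that best predicts the labels."""
    best = None
    for feature in FEATURES:
        for value in (True, False):
            predictions = [edit[feature] == value for edit in edits]
            accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
            if best is None or accuracy > best[2]:
                best = (feature, value, accuracy)
    return best


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
edits = [{f: rng.random() < 0.5 for f in FEATURES} for _ in range(1000)]
labels = [rule_label(edit) for edit in edits]
print(fit_stump(edits, labels))
```

This prints `('has_profanity', True, 1.0)`: perfect training accuracy, and no knowledge of any vandalism the rule itself misses.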
Since there's already a database, it sounds like this could be done by
flagging edits as "vandalism" and then reading the existing database
information to extract details such as the IP, a diff of the change,
etc. That way, humans define what "vandalism" is, and the machine can
learn its meaning.
This may need a button or something similar, so that users can report
an edit and the database flags it.
--
End of message.
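A sketch of Tei's suggestion using Python's built-in `sqlite3` and `difflib` modules. The `revisions` and `vandalism_flags` tables, their columns, and the `flag_edit` function (standing in for the report button) are hypothetical and do not reflect MediaWiki's real database schema.

```python
import difflib
import sqlite3


def create_schema(conn):
    # Hypothetical tables, not MediaWiki's real schema.
    # user_ip is NULL for logged-in editors.
    conn.execute(
        "CREATE TABLE revisions (rev_id INTEGER PRIMARY KEY, user_ip TEXT,"
        " old_text TEXT NOT NULL, new_text TEXT NOT NULL)"
    )
    # One row per flagged revision; the primary key makes repeat reports no-ops.
    conn.execute("CREATE TABLE vandalism_flags (rev_id INTEGER PRIMARY KEY)")


def flag_edit(conn, rev_id):
    """What the hypothetical 'report vandalism' button would do."""
    conn.execute("INSERT OR IGNORE INTO vandalism_flags (rev_id) VALUES (?)", (rev_id,))


def build_labeled_dataset(conn):
    """Pull IP and diff for every revision, labeled by whether a human flagged it."""
    rows = conn.execute(
        "SELECT r.rev_id, r.user_ip, r.old_text, r.new_text,"
        " EXISTS (SELECT 1 FROM vandalism_flags f WHERE f.rev_id = r.rev_id)"
        " FROM revisions r ORDER BY r.rev_id"
    ).fetchall()
    dataset = []
    for rev_id, user_ip, old_text, new_text, flagged in rows:
        diff = "\n".join(
            difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm="")
        )
        dataset.append(
            {"rev_id": rev_id, "user_ip": user_ip, "diff": diff, "is_vandalism": bool(flagged)}
        )
    return dataset


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO revisions VALUES (?, ?, ?, ?)",
    [
        (1, "192.0.2.7", "Paris is in France.", "Paris is in France. LOL"),
        (2, None, "Paris is in France.", "Paris is the capital of France."),
    ],
)
flag_edit(conn, 1)
for example in build_labeled_dataset(conn):
    print(example["rev_id"], example["is_vandalism"], example["user_ip"])
```

Treating every unflagged edit as non-vandalism is a simplifying assumption: vandalism that nobody reported would end up in the dataset with the wrong label.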
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
