Re: [Wikitech-l] Abuse Filter extension activated on English Wikipedia

19 Mar 2009


Brian wrote:
...
This extension is very important for training machine learning
vandalism detection bots. Recently published systems use only hundreds
of examples of vandalism in training - not nearly enough to
distinguish between the variety found in Wikipedia or generalize to
new, unseen forms of vandalism. A large set of human-created rules
could be run against all previous edits in order to create a massive
vandalism dataset.

As a machine-learning person, I find this a somewhat problematic
idea: generating training examples *from a rule set* and then learning
on them is just a very roundabout way of reconstructing that rule set.
What you really want is a large dataset of human-labeled examples of
vandalism / non-vandalism that *can't* currently be distinguished
reliably by rules, so you can throw a machine-learning algorithm at the
problem of trying to come up with rules that can.
-Mark
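
To make the point about rule-generated labels concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two features (`caps_ratio`, `size_change`), the hand-written rule with its 0.6 cutoff, and the synthetic "edits" are hypothetical and are not taken from the Abuse Filter extension or from any real Wikipedia data. The learner is a plain one-split decision stump, chosen only because it is small enough to read in full.

```python
import random


def rule_label(edit):
    # Hypothetical hand-written filter: flag edits that are mostly upper case.
    return edit["caps_ratio"] > 0.6


def make_edits(n, rng):
    # Synthetic edits with two made-up numeric features.
    return [
        {"caps_ratio": rng.random(), "size_change": rng.uniform(-1.0, 1.0)}
        for _ in range(n)
    ]


def fit_stump(edits, labels):
    """Return the (feature, threshold) split with the fewest training errors."""
    best = None
    for feature in edits[0]:
        for threshold in sorted({e[feature] for e in edits}):
            predictions = [e[feature] > threshold for e in edits]
            errors = sum(p != y for p, y in zip(predictions, labels))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    return best[1], best[2]


rng = random.Random(0)

# Labels come from the rule, not from human judgment.
train = make_edits(500, rng)
labels = [rule_label(e) for e in train]

feature, threshold = fit_stump(train, labels)

# Compare the learned split with the rule on fresh synthetic edits.
test = make_edits(500, rng)
agreement = sum(
    (e[feature] > threshold) == rule_label(e) for e in test
) / len(test)

print(f"learned split: {feature} > {threshold:.3f}")
print(f"agreement with the rule on new edits: {agreement:.3f}")
```

Under these assumptions the stump should settle on `caps_ratio` with a threshold just below 0.6, which is the rule handed back again. Nothing in the training set tells the learner about vandalism that the rule misses, which is why human-labeled examples are the more useful input.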
