Re: [Wikitech-l] Project Idea for GSoC 2013 - Bayesian Spam Filter

12 Apr 2013


On 2013-04-12 7:33 PM, "Platonides" <Platonides@gmail.com> wrote:
...
On 09/04/13 18:20, Quim Gil wrote:
...
Hi Anubhav,
I have done a first reality check with Chris Steipp, who oversees the
area of security and also spam prevention. Your idea is interesting and
it seems to be feasible. This is a very good first step!
It would require adding a hook to MediaWiki core, but this could be a
small, acceptable change.
I agree. Adding a hook is no problem.
Well, a hook is obviously no problem; I'm not sure why a new one would be
needed. Surely if the abuse filter has all the hooks it needs, so would
this.
Qgill wrote:
...
It might have a performance penalty in a site like English Wikipedia with
plenty of concurrent edits, but for starters it could be potentially useful
to the 99% of MediaWiki instances that have a significantly smaller number
of daily edits and especially a very small number of editors and tools able
/ happy to deal with spam.
Hmm. I was playing with NLP-ish automated newpage patrol recently. One
thing that crossed my mind was that if it becomes too expensive, one could
run the classifier in the job queue (and hence on dedicated server(s)) and
tag changes shortly after the fact.
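
As a rough illustration of the "classify later, tag after the fact" idea,
here is a minimal sketch in Python. It uses the standard library queue as a
stand-in and is not MediaWiki's job queue API; classify, save_edit, worker
and the "possible-spam" tag are all hypothetical names, and the link-count
rule is only a placeholder for a real classifier.

```python
import queue
import threading


def classify(text):
    """Placeholder classifier (hypothetical): flag text with 3+ links."""
    return text.count("http://") >= 3


def worker(jobs, tags):
    """Drain the queue and tag revisions that the classifier flags."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            jobs.task_done()
            break
        rev_id, text = job
        if classify(text):
            tags[rev_id] = "possible-spam"
        jobs.task_done()


def save_edit(jobs, rev_id, text):
    """Save path: only enqueue the edit, so saving is not slowed down."""
    jobs.put((rev_id, text))


jobs = queue.Queue()
tags = {}
thread = threading.Thread(target=worker, args=(jobs, tags))
thread.start()

save_edit(jobs, 1, "Fixed a typo in the introduction.")
save_edit(jobs, 2, "buy http://a.example http://b.example http://c.example")

jobs.put(None)  # tell the worker to stop once the queue is drained
thread.join()
print(tags)  # {2: 'possible-spam'}
```

The save path only pays for the enqueue; the expensive part runs in the
worker, which could just as well live on another machine.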
Last of all I would suggest you also read up on other people who have done
machine learning approaches to vandalism detection. In particular
user:cluebot_NG - http://en.wikipedia.org/wiki/User:Cluebot_NG . There is
also a list of academic papers on the subject at
http://en.wikipedia.org/w/index.php?title=User:Emijrp/Anti-vandalism_bot_cen...
(That said, an extension like you are proposing does not have to be as good
as the rather complex state of the art in order to be useful. Any effective
system would probably be quite useful).
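
For a sense of how small a useful baseline could be, here is a minimal
sketch of a word-based naive Bayes classifier with add-one smoothing, in
Python. It is an illustration only, not the proposed extension or ClueBot
NG's method; the NaiveBayes class, the "spam"/"ham" labels and the training
sentences are all made up for the example.

```python
import math
from collections import Counter


class NaiveBayes:
    """Two-class naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        # Assumes each label has been trained on at least one document.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        # Log prior: fraction of training documents with this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = self.word_counts[label][word] + 1
            score += math.log(count / (total + len(vocab)))
        return score

    def is_spam(self, text):
        return self.log_score(text, "spam") > self.log_score(text, "ham")


nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches now", "spam")
nb.train("fixed typo in history section", "ham")
nb.train("added citation to history section", "ham")

print(nb.is_spam("buy cheap pills"))            # True
print(nb.is_spam("added citation to history"))  # False
```

A real filter would need far more training data and better tokenization,
but the scoring step itself stays this simple.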
-bawolff
