On Jan 9, 2008 12:49 PM, Noah Salzman <nds@salzman.net> wrote:
> This area is ripe for exploration. Has anyone looked into "Summer of Code" type projects for this sort of thing? The signatures for the great majority of vandalism are not difficult to understand.
But difficult to obtain without flooding. As the developer of two vandal-fighting tools (one still unreleased), I can tell you that the hardest part of building such a tool is not the AI but keeping its network usage efficient. You can't download five diffs for every edit you see on browne, especially not in a tool meant to be used by many users at once; the www servers would probably choke. (I know there is quite a caching server farm, but to my knowledge diff pages are not cached much, and I don't think anything is cached for logged-in users.)
Then there's the fact that diffs aren't even available in an easily parsable format: we have to download a page full of HTML and rip it apart. Show me a developer who *wants* to code to that spec.
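To make the pain concrete, here is a rough illustration of the screen-scraping this forces on tool authors: pulling added and removed text out of a MediaWiki diff page. The CSS class names (diff-addedline, diff-deletedline) match what MediaWiki emits, but nothing guarantees they stay stable, which is exactly the problem.

```python
from html.parser import HTMLParser

class DiffScraper(HTMLParser):
    """Rip the added/removed text out of a MediaWiki HTML diff table."""

    def __init__(self):
        super().__init__()
        self._current = None   # 'added', 'removed', or None
        self.added = []        # text from inserted lines
        self.removed = []      # text from deleted lines

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            cls = dict(attrs).get("class", "")
            if "diff-addedline" in cls:
                self._current = "added"
            elif "diff-deletedline" in cls:
                self._current = "removed"

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None

    def handle_data(self, data):
        if self._current == "added":
            self.added.append(data)
        elif self._current == "removed":
            self.removed.append(data)

# A trimmed-down fragment in the shape of a real diff page:
sample = (
    '<tr><td class="diff-deletedline"><div>The old text.</div></td>'
    '<td class="diff-addedline"><div>BUY CHEAP PILLS</div></td></tr>'
)
scraper = DiffScraper()
scraper.feed(sample)
print(scraper.added)    # text scraped from the added side
```

Dozens of lines of stateful parsing just to get at two strings of text; and the whole thing silently breaks the day the skin markup changes.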
What we need is a MediaWiki query API for obtaining the unformatted diff of a revision, with the ability to batch several requests into one. Even then we'd be talking about quite a bit of traffic (especially if the system runs on many users' machines), but far less of it, and in a format much better suited to analysis.
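A sketch of what such a batched diff query might look like on the client side. The parameter names here (prop=diffs, revids) are hypothetical; this is the API being wished for, not one that exists today.

```python
from urllib.parse import urlencode

def diff_query_url(revids, endpoint="https://en.wikipedia.org/w/api.php"):
    """Build one request that asks for several unformatted diffs at once."""
    params = {
        "action": "query",
        "prop": "diffs",                              # hypothetical module
        "revids": "|".join(str(r) for r in revids),   # batched, pipe-separated
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)

url = diff_query_url([186921004, 186921388, 186921519])
print(url)
```

One HTTP round trip for N diffs, and a JSON body instead of a rendered page: that alone would cut the load on the www servers by a large factor.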
Really, once we have some easy and efficient way to get diffs, it's just a matter of forking SpamAssassin and writing some quality rules. :)
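The SpamAssassin idea applied to diffs boils down to something like this toy: each rule is a regex with a score, and an edit whose total crosses a threshold gets flagged. The rules, scores, and threshold here are invented purely for illustration.

```python
import re

# (pattern, score, rule name) -- made-up rules for illustration only
RULES = [
    (re.compile(r"\b[A-Z]{6,}\b"), 1.5, "SHOUTING"),        # long all-caps runs
    (re.compile(r"(.)\1{9,}"), 2.0, "CHAR_FLOOD"),          # e.g. "aaaaaaaaaa"
    (re.compile(r"https?://", re.I), 0.5, "EXTERNAL_LINK"), # links in added text
]
THRESHOLD = 2.5

def score_added_text(text):
    """Return (total score, names of rules that fired) for a diff's added text."""
    total, hits = 0.0, []
    for pattern, score, name in RULES:
        if pattern.search(text):
            total += score
            hits.append(name)
    return total, hits

total, hits = score_added_text("OMGOMG look!!! aaaaaaaaaaaa http://spam.example")
print(total >= THRESHOLD, hits)
```

The appeal of the SpamAssassin model is that the rules live outside the code: non-programmers can tune scores and add patterns as vandals adapt, exactly as mail admins do with spam rules.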