[WikiEN-l] Blocking of School IPs

Chris Howie cdhowie at gmail.com
Thu Jan 10 04:46:02 UTC 2008


On Jan 9, 2008 12:49 PM, Noah Salzman <nds at salzman.net> wrote:

> This area is ripe for exploration. Has anyone looked into "Summer of
> Code" type projects for this sort of thing? The signatures for the
> great majority of vandalism are not difficult to understand.
>

But difficult to obtain without flooding.  As a developer of two
vandal-fighting tools (one still unreleased) I can tell you that the most
difficult part of developing such a tool is not the AI, but having it be
efficient with respect to its network usage.  You can't go and download five
diffs every time you see an edit on browne, especially not when you're
coding it into a tool meant to be used by many users.  The www servers would
probably choke.  (I know there is quite a caching server farm, but to my
knowledge diff pages are not so cached, and I don't think anything is cached
for logged-in users.)

Then there's the fact that diffs aren't even available in an easily-parsable
format.  We have to download a page full of HTML and rip it apart.  Show me
a developer that *wants* to code to that spec.
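
To give a feel for what "ripping apart" the HTML involves, here is a minimal Python sketch using only the standard library. It assumes the diff table marks changed cells with the CSS classes "diff-addedline" and "diff-deletedline"; that is an assumption about the markup, which may differ or change, and the names DiffCellParser and parse_diff_html are made up for this illustration.

```python
from html.parser import HTMLParser


class DiffCellParser(HTMLParser):
    """Collect the text of table cells marked as added or deleted lines."""

    # Assumed class names; the real diff markup may differ or change.
    CLASSES = {"diff-addedline": "added", "diff-deletedline": "deleted"}

    def __init__(self):
        super().__init__()
        self.added = []
        self.deleted = []
        self._kind = None   # which list the current cell belongs to
        self._buf = []      # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.CLASSES:
                self._kind = self.CLASSES[name]
                self._buf = []

    def handle_data(self, data):
        if self._kind is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._kind is not None:
            text = "".join(self._buf).strip()
            getattr(self, self._kind).append(text)
            self._kind = None


def parse_diff_html(html):
    """Return (added_lines, deleted_lines) scraped from a diff page."""
    parser = DiffCellParser()
    parser.feed(html)
    parser.close()
    return parser.added, parser.deleted
```

Even this small scraper breaks as soon as the page layout changes, which is the whole problem with coding against rendered HTML.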

What we need is a MediaWiki query API for obtaining the unformatted diff of
a revision, with the ability to specify multiple requests at once.  Even
then we are talking about quite a bit of traffic (especially if the system
is run by many users) but far less and in a format much better suited to be
analyzed.
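
A rough sketch of how a client might batch such requests, in Python. The "revids" parameter with "|"-separated IDs follows the convention of MediaWiki's existing query API; the "rawdiff" property is hypothetical and stands in for the capability asked for above, and the endpoint URL and batch size are placeholders.

```python
from urllib.parse import urlencode

API_URL = "https://example.org/w/api.php"  # placeholder endpoint


def batched(ids, size):
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_diff_requests(rev_ids, batch_size=50):
    """Build one request URL per batch of revision IDs.

    batch_size is an assumed per-request limit, not a documented one.
    """
    urls = []
    for batch in batched(list(rev_ids), batch_size):
        params = {
            "action": "query",
            "prop": "rawdiff",  # hypothetical property, does not exist
            "revids": "|".join(str(rev) for rev in batch),
            "format": "json",
        }
        urls.append(API_URL + "?" + urlencode(params))
    return urls
```

With a batch size of 50, a tool that has seen 120 new revisions would make three requests instead of 120 page loads.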

Really, once we have some easy and efficient way to get diffs, it's just a
matter of forking SpamAssassin and writing some quality rules.  :)
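
For illustration, rule-based scoring in that spirit could look something like the Python sketch below: each rule is a pattern with a weight, and the weights of all matching rules are summed and compared against a threshold. The rule names, patterns, scores, and threshold are invented for this example and are not tuned against real vandalism; this is also not SpamAssassin's actual rule format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    score: float


# Hypothetical rules; patterns and weights are illustrative only.
RULES = [
    Rule("SHOUTING", re.compile(r"\b[A-Z]{10,}\b"), 1.5),
    Rule("CHAR_FLOOD", re.compile(r"(.)\1{9,}"), 2.0),
    Rule("EXCLAIM_RUN", re.compile(r"!{3,}"), 1.0),
]

THRESHOLD = 2.5  # assumed cutoff for flagging an edit


def score_added_text(added_lines, rules=RULES):
    """Return (total_score, names_of_matching_rules) for the added text."""
    text = "\n".join(added_lines)
    hits = [rule for rule in rules if rule.pattern.search(text)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]
```

A tool would run this over the added lines of each diff and flag the edit for review when the total reaches THRESHOLD.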

-- 
Chris Howie
http://www.chrishowie.com
http://en.wikipedia.org/wiki/User:Crazycomputers