On Wed, Apr 30, 2008 at 12:36 AM, Philip Sandifer <snowspinner(a)gmail.com> wrote:
To my mind, one of the major problems with our
practices is that there is an excessive focus on a measure of personal
This is sadly true, and is honestly one of the reasons I started RC
patrolling (well, it was fun too). Since then I have learned through
the silence of others that nobody is interested in glorifying vandal
fighters, which, while at first disappointing, is actually right.
(Though I will say that the occasional thank you on someone's talk
page is quite gratifying and at least lets us know that people
appreciate what we do.)
Right now my motivation to patrol is to try to keep crap off
Wikipedia, and it can be really fun sometimes. Unfortunately, the
fact that RC patrol is fun to some people seems to be used
occasionally by actual content authors and editors as a tool to point
at us and say "look, they're having fun and aren't contributing any
real content, they must not really care about Wikipedia." Which is as
unhelpful as it is incorrect. But I digress.
Vandalism clean-up should be considered an organic
the wiki in which waste matter is expelled. Vandal fighting should not
be a role or a job - it should be a natural process intrinsic to the
system. Ideally a counter-vandalism unit or an explicit vandal-
fighting procedure should be unnecessary.
Should be, but in reality is more necessary than we'd like. There is
some vandalism that is common (name-calling, intentionally erroneous
altering of numbers, etc.) and some that is more sneaky. People who
make it their primary task to find and fix this stuff are going to
have a better knowledge of what is common, uncommon, how to deal with
it, and even how to write stuff that can detect it. Your average
editor is not going to know this.
I'm not saying this to demean authors and editors at all. In fact, I
have the highest respect for the people that can and do edit and I
often wish that I was capable in that regard. But I'm much better
with technical stuff than making editorial decisions and writing prose
and stuff like that. It doesn't mix with my brain. (You don't want
me massively editing content. Trust me.) So I use what I have to
benefit the project as best I can. To me that is RC patrol and coding
As an aside, I would speculate that most RC patrollers have a tendency
toward programming, however advanced. People like that (myself
included) like problem-solving and to some extent repetition. I'd be
curious to see how many people in the RC patrol category are coders as
compared to a few other categories.
Our next line of defense is automated tools. We have
capability vandalism reversion bot. We need a more vigorous one. This
will require writing subtle filters - we can't revert every edit that
adds the word "penis" even if 99% of those edits are vandalism. But on
the other hand, spam-filtering technology could probably be adapted
here successfully. This is a technical solution and requires coders to
take on the project. Where can we get volunteers? Is there existing
code (perhaps spam-filtering code) that could be adapted? These are
important questions, but they need to be taken to the bot community to
get useful answers.
I've often thought about forking SpamAssassin to work with diffs. The
main issue that I see is that while e-mail is one big file, a diff is
many types of content mixed together -- stuff added, removed, changed,
and unchanged. Getting a scoring mechanism to weight them
appropriately would be a pain. I'm not even sure if there is an
A better system might be to score the old and new versions and
subtract the old score from the new score. So things like OMFG PENIS
would substantially raise the score of the new version, and
subtracting them would cancel out what was found in the old version
too. New articles would just be scored by themselves and if the score
is high enough maybe reported somewhere.
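
To make the old-versus-new scoring idea concrete, here is a rough Python
sketch. Everything in it is an assumption for illustration: the RULES
table, the point values, and the function names are made up and are not
taken from SpamAssassin or from any existing bot. The delta is computed
as new minus old, so a positive number means the edit added suspicious
content, and a new article (no old version) is scored by itself.

```python
import re

# Hypothetical rule table: (pattern, points). Illustrative values only.
RULES = [
    (re.compile(r"\bpenis\b", re.IGNORECASE), 2.0),
    (re.compile(r"\bomfg\b", re.IGNORECASE), 1.5),
    (re.compile(r"(.)\1{5,}"), 1.0),  # one character repeated 6+ times
]


def score(text):
    """Add up the points for every match of every rule in the text."""
    return sum(points * len(pattern.findall(text)) for pattern, points in RULES)


def edit_score(old_text, new_text):
    """Return the score change caused by an edit.

    old_text is None for a brand-new article, which is scored by itself.
    """
    if old_text is None:
        return score(new_text)
    return score(new_text) - score(old_text)


if __name__ == "__main__":
    old = "The penis is an organ."
    new = "The penis is an organ. OMFG PENIS"
    # Matches already present in the old version cancel out of the delta.
    print(edit_score(old, new))
```

Because the old version's matches are subtracted, a legitimate word
that was already in the article does not count against the edit; only
what the edit added (or removed) moves the number.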
Then we have to worry about the article about the penis... some
articles may need to be disabled for automatic patrol or otherwise
dealt with specially, by human tweaking.
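
The per-article special handling could be as simple as a hand-maintained
override table that is checked before anything gets reported. Again a
sketch only: the threshold value, the OVERRIDES table, and the
should_report name are hypothetical, and the delta passed in is assumed
to be a new-minus-old score change like the one described above.

```python
REPORT_THRESHOLD = 3.0  # hypothetical default; would need tuning on real edits

# Hypothetical overrides maintained by humans. A value of None turns
# automatic patrol off for that article; a number replaces the default
# threshold for it.
OVERRIDES = {
    "Penis": None,
}


def should_report(title, delta, overrides=OVERRIDES, default=REPORT_THRESHOLD):
    """Decide whether an edit's score change is worth reporting."""
    threshold = overrides.get(title, default)
    if threshold is None:
        # Article is excluded from automatic patrol; leave it to humans.
        return False
    return delta >= threshold
```

Keeping the exceptions in data rather than in the rules means the
filters themselves stay general, and the tweaking can be done by people
who know the article without touching the scoring code.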
Anyway, I'm rambling again. (Told you I shouldn't edit content.)
This is just one idea I've had for a few months and have yet to
implement or experiment with sufficiently. If you want to look at my
~/projects you'll find about twenty or thirty more.