On Wed, Apr 30, 2008 at 12:36 AM, Philip Sandifer snowspinner@gmail.com wrote:
To my mind, one of the major problems with our current anti-vandalism practices is that there is an excessive focus on a measure of personal glory.
This is sadly true, and is honestly one of the reasons I started RC patrolling (well, it was fun too). Since then I have learned through the silence of others that nobody is interested in glorifying vandal fighters, which, while at first disappointing, is actually right. (Though I will say that the occasional thank you on someone's talk page is quite gratifying and at least lets us know that people appreciate what we do.)
Right now my motivation to patrol is to try to keep crap off Wikipedia, and it can be really fun sometimes. Unfortunately, the fact that RC patrol is fun to some people seems to be used occasionally by actual content authors and editors as a tool to point at us and say "look, they're having fun and aren't contributing any real content, they must not really care about Wikipedia." Which is as unhelpful as it is incorrect. But I digress.
Vandalism clean-up should be considered an organic process of the wiki in which waste matter is expelled. Vandal fighting should not be a role or a job - it should be a natural process intrinsic to the system. Ideally a counter-vandalism unit or an explicit vandal-fighting procedure should be unnecessary.
Should be, but in reality it is more necessary than we'd like. Some vandalism is common (name-calling, intentionally erroneous altering of numbers, etc.) and some is sneakier. People who make it their primary task to find and fix this stuff are going to have a better knowledge of what is common, what is uncommon, how to deal with it, and even how to write stuff that can detect it. Your average editor is not going to know this.
I'm not saying this to demean authors and editors at all. In fact, I have the highest respect for the people that can and do edit and I often wish that I was capable in that regard. But I'm much better with technical stuff than making editorial decisions and writing prose and stuff like that. It doesn't mix with my brain. (You don't want me massively editing content. Trust me.) So I use what I have to benefit the project as best I can. To me that is RC patrol and coding anti-vandalism tools.
As an aside, I would speculate that most RC patrollers have a tendency toward programming, at whatever level of skill. People like that (myself included) like problem-solving and to some extent repetition. I'd be curious to see how many people in the RC patrol category are coders as compared to a few other categories.
Our next line of defense is automated tools. We have a limited capability vandalism reversion bot. We need a more vigorous one. This will require writing subtle filters - we can't revert every edit that adds the word "penis" even if 99% of those edits are vandalism. But on the other hand, spam-filtering technology could probably be adapted here successfully. This is a technical solution and requires coders to take on the project. Where can we get volunteers? Is there existing code (perhaps spam-filtering code) that could be adapted? These are important questions, but they need to be taken to the bot community to get useful answers.
I've often thought about forking SpamAssassin to work with diffs. The main issue that I see is that while e-mail is one big file, a diff is many types of content mixed together -- stuff added, removed, changed, and unchanged. Getting a scoring mechanism to weight them appropriately would be a pain. I'm not even sure if there is an appropriate weighting.
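To make the "many types of content mixed together" problem concrete, here is a minimal sketch in Python that separates an edit into added, removed, and unchanged words using the standard library's difflib. The function name split_diff and the bucket names are made up for illustration, and treating a "changed" run as a removal plus an addition is an assumption, not something SpamAssassin does.

```python
import difflib


def split_diff(old_text, new_text):
    """Bucket the words of an edit by what the edit did to them."""
    old_tokens = old_text.split()
    new_tokens = new_text.split()
    buckets = {"added": [], "removed": [], "unchanged": []}

    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            buckets["unchanged"].extend(old_tokens[i1:i2])
        else:
            # "delete", "insert" and "replace" all land here; a replace
            # is counted as a removal plus an addition (an assumption).
            buckets["removed"].extend(old_tokens[i1:i2])
            buckets["added"].extend(new_tokens[j1:j2])
    return buckets


if __name__ == "__main__":
    print(split_diff("the cat sat", "the dog sat"))
```

Each bucket could then be scored with its own weight, which is exactly where the weighting question comes in: this sketch only separates the content and does not answer how the buckets should count.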
A better system might be to score the old and new versions separately and subtract the old score from the new score. Something like OMFG PENIS would substantially raise the score of the new version, and the subtraction would cancel out anything that was already present in the old version. New articles would just be scored by themselves, and if the score is high enough, maybe reported somewhere.
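A rough sketch of the score-both-versions idea in Python. Everything specific here is hypothetical: the rules, their weights, and the names score and edit_score are invented for illustration and are not taken from SpamAssassin. The sign convention (new minus old, so that added junk gives a positive number) is also a choice made for this sketch.

```python
import re

# Hypothetical rules: (pattern, weight). Real rules would need tuning.
RULES = [
    (re.compile(r"\bpenis\b", re.IGNORECASE), 3.0),
    (re.compile(r"\bomfg\b", re.IGNORECASE), 2.0),
    (re.compile(r"\b[A-Z]{4,}\b"), 0.5),  # words typed in all caps
]


def score(text):
    """Sum the weight of every rule match in a whole page version."""
    return sum(weight * len(pattern.findall(text)) for pattern, weight in RULES)


def edit_score(old_text, new_text):
    """Score an edit as the change in page score.

    Pass old_text=None for a new article, which is scored by itself.
    """
    old = score(old_text) if old_text is not None else 0.0
    return score(new_text) - old


if __name__ == "__main__":
    before = "The penis is an organ."
    after = "The penis is an organ. OMFG PENIS"
    print(edit_score(before, after))
    print(edit_score(None, "OMFG PENIS"))
```

Because the legitimate mention in the old version is subtracted out, only what the edit introduced contributes to the result. An edit that leaves the page untouched scores zero no matter how high the page scores on its own.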
Then we have to worry about the article about the penis... some articles may need to be disabled for automatic patrol or otherwise dealt with specially, by human tweaking.
Anyway, I'm rambling again. (Told you I shouldn't edit content.) This is just one idea I've had for a few months and have yet to implement or experiment with sufficiently. If you want to look at my ~/projects you'll find about twenty or thirty more.