On 11 February 2010 15:48, Carcharoth carcharothwp@googlemail.com wrote:
So is it as big a problem as it seems? What percentage of vandalism doesn't get caught for days or weeks?
Well, here's a ballpark guess...
I've made 35,000-ish edits. I reckon maybe 10 or 20% of those are routine reversions of other people's contributions: vandalism, etc. Call it 5,000.
I can think of perhaps a dozen cases where I've found very long-term or similarly complicated vandalism; assume that's only half of them, because I've forgotten the rest or rolled back without quite noticing the dates, and that makes about two dozen out of 5,000, which implies about 0.5% of vandalism edits are something Particularly Remarkable.
Entirely unscientific, of course, but there you go.
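For anyone who wants to fiddle with the guesses, here is the same back-of-envelope sum as a quick Python sketch. The figures are the ones from this message; doubling the dozen to allow for forgotten cases is my reading of "forgotten half of them", and every input is a guess rather than a measurement.

```python
# Back-of-envelope estimate; every input here is a guess, not a measurement.
total_edits = 35_000

# 10-20% of edits are routine reverts (integer arithmetic keeps this exact).
reverts_low = total_edits * 10 // 100    # 3,500
reverts_high = total_edits * 20 // 100   # 7,000
reverts = 5_000                          # "call it 5,000", inside that range

remembered_cases = 12                    # "perhaps a dozen"
estimated_cases = remembered_cases * 2   # assumption: as many again forgotten

remarkable_share = estimated_cases / reverts
print(f"{estimated_cases} of {reverts} reverts = {remarkable_share:.2%}")
```

Plug in the low and high ends of the revert range instead of 5,000 and the share moves between roughly 0.3% and 0.7%, so "about 0.5%" is as precise as this gets.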
The previous study relied on randomly sampling articles. Here's a couple of other possibilities:
a) Use vandalism *reports*. People tend to write in to the complaints address when they find vandalism irrespective of how long it's been there, or how complicated it is; if they were going to revert it, the odds are they wouldn't write. So, we could take a large sample of vandalism report emails (all archived in OTRS), identify their timestamp and the article being written about, find the relevant vandalism edit, and find its reversion. The gap between those two edits is how long the vandalism survived.
b) Use reversions. Sample a thousand uses of rollback from the recent changes list, and find the time between each rollback and the edit it was reverting.
The first of these overrepresents vandalism on high-traffic pages, and so would *probably* lead to overcounting "young" vandalism - if we make the reasonably safe assumption that high traffic = high editor interest = many watchlists, etc.
The second of these falls down by undercounting old vandalism. Older vandalism tends to require conventional editing, rather than the use of rollback or undo, because the odds are higher that the article has been edited since.
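Whichever way the sample is gathered, the measuring step is the same: pair each vandalism edit with its reversion and look at the gap. A minimal Python sketch of that step follows. The function names are hypothetical and the four timestamp pairs are invented purely for illustration; real pairs would have to come from the OTRS reports or the rollback sample.

```python
from datetime import datetime, timedelta

def vandalism_lifetimes(pairs):
    """Return how long each vandalism edit survived before being reverted.

    `pairs` is a list of (vandalised_at, reverted_at) datetime tuples.
    """
    return [reverted_at - vandalised_at for vandalised_at, reverted_at in pairs]

def share_older_than(pairs, threshold):
    """Fraction of sampled vandalism that survived longer than `threshold`."""
    lifetimes = vandalism_lifetimes(pairs)
    if not lifetimes:
        return 0.0
    return sum(1 for life in lifetimes if life > threshold) / len(lifetimes)

# Invented sample, for illustration only (not real data).
sample = [
    (datetime(2010, 2, 1, 12, 0), datetime(2010, 2, 1, 12, 3)),   # 3 minutes
    (datetime(2010, 2, 2, 9, 0), datetime(2010, 2, 2, 10, 30)),   # 1.5 hours
    (datetime(2010, 1, 5, 8, 0), datetime(2010, 2, 3, 8, 0)),     # 29 days
    (datetime(2010, 2, 4, 20, 0), datetime(2010, 2, 4, 20, 1)),   # 1 minute
]

share = share_older_than(sample, timedelta(days=1))
print(f"{share:.0%} of sampled vandalism lasted more than a day")
```

The biases described above sit entirely in how the pairs are collected, not in this arithmetic, so the same calculation could be run on both samples and the results compared.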
Any other ideas?