[WikiEN-l] Unreverted vandalism

Andrew Gray andrew.gray at dunelm.org.uk
Thu Feb 11 17:05:17 UTC 2010


On 11 February 2010 15:48, Carcharoth <carcharothwp at googlemail.com> wrote:

> So is it as big a problem as it seems? What percentage of vandalism
> doesn't get caught for days or weeks?

Well, here's a ballpark guess...

I've made 35,000-ish edits. I reckon maybe 10 or 20% of those are
routine reversions of other people's contributions: vandalism, etc.
Call it 5,000.

I can think of perhaps a dozen cases where I've found very long-term
or similarly complicated vandalism; assume I've forgotten half of
them, or rolled back without quite noticing the dates, and that makes
about two dozen out of 5,000, which implies about 0.5% of vandalism
edits are something Particularly Remarkable.

Entirely unscientific, of course, but there you go.
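
For anyone who wants to check the arithmetic, here is a minimal sketch
of the estimate above. The figures are the ones quoted in this
message; the variable names are just illustrative, and the doubling of
the "dozen" is the stated assumption that half the cases were
forgotten or went unnoticed.

```python
# Back-of-envelope figures taken from the estimate above.
total_edits = 35_000

# "maybe 10 or 20%" of edits are routine reversions.
reverts_low = total_edits * 0.10
reverts_high = total_edits * 0.20
reverts = 5_000  # round figure picked from within that range

remembered_cases = 12                  # "perhaps a dozen"
assumed_cases = remembered_cases * 2   # assume half were forgotten or unnoticed

long_term_share = assumed_cases / reverts
print(f"{long_term_share:.2%}")  # 0.48%, i.e. about 0.5%
```

Taking the low or high end of the 10-20% range instead of 5,000 moves
the answer a little either way, but it stays well under 1%.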

The previous study relied on randomly sampling articles. Here's a
couple of other possibilities:

a) Use vandalism *reports*. People tend to write in to the complaints
address when they find vandalism irrespective of how long it's been
there, or how complicated it is; if they were going to revert it, the
odds are they wouldn't write. So, we could take a large sample of
vandalism report emails (all archived in OTRS), identify their
timestamp and the article being written about, find the relevant
vandalism edit, find its reversion.

b) Use reversions. Sample a thousand uses of rollback from the recent
changes list, and find the time between each rollback and the edit it
was reverting.
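
Either approach ends up with pairs of timestamps (vandalism edit,
reversion), so the measurement step could be sketched roughly as
below. This assumes the pairs have already been collected by hand or
by some other tool; the function names, the one-day threshold, and the
three sample pairs are all made up for illustration and are not real
data.

```python
from datetime import datetime, timedelta
from statistics import median


def revert_lags_hours(pairs):
    """Lag in hours for each (vandalised_at, reverted_at) pair."""
    return [
        (reverted_at - vandalised_at).total_seconds() / 3600
        for vandalised_at, reverted_at in pairs
    ]


def summarise(lags_hours, threshold=timedelta(days=1)):
    """Return the median lag (hours) and the share of lags above threshold."""
    limit = threshold.total_seconds() / 3600
    slow = sum(1 for lag in lags_hours if lag > limit)
    return median(lags_hours), slow / len(lags_hours)


# Invented timestamps, purely to exercise the functions.
sample = [
    (datetime(2010, 2, 11, 12, 0), datetime(2010, 2, 11, 12, 2)),  # 2 minutes
    (datetime(2010, 2, 10, 9, 0), datetime(2010, 2, 10, 15, 0)),   # 6 hours
    (datetime(2010, 1, 1, 0, 0), datetime(2010, 1, 15, 0, 0)),     # 14 days
]

median_hours, slow_share = summarise(revert_lags_hours(sample))
```

The share above the threshold is the number that answers "what
percentage doesn't get caught for days or weeks"; the median mostly
describes the quickly reverted bulk.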

The first of these overestimates vandalism to high-traffic pages, and
so would *probably* lead to overcounting "young" vandalism - if we
make the reasonably safe assumption that high traffic = high editor
interest = many watchlists, etc.

The second of these falls down on undercounting old vandalism. Older
vandalism tends to require conventional editing, rather than the use
of rollback or undo, because the odds are higher that the article has
been edited since.

Any other ideas?

-- 
- Andrew Gray
  andrew.gray at dunelm.org.uk


