[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

Andrew Gray andrew.gray at dunelm.org.uk
Thu Aug 20 21:57:04 UTC 2009


2009/8/20 Gregory Maxwell <gmaxwell at gmail.com>:

> Going back to your simple study now:  The analysis of vandalism
> duration and its impact on readers makes an assumption about
> readership which we know to be invalid. You're assuming a uniform
> distribution of readership: That readers are just as likely to read
> any random article. But we know that the actual readership follows a
> power-law (long-tail) distribution. Because of the failure to consider
> traffic levels we can't draw conclusions on how much vandalism readers
> are actually exposed to.

We're also assuming a uniform distribution of vandalism, as it were.
There are a number of different types of vandalism: obscene defacement,
malicious alteration of factual content, meaningless test edits of a
character or two, schoolkids leaving messages for each other...

...and it all has a different impact on the reader.

This has two implications:

a) It seems safe to assume that replacing the entire article with
"john is gay" is going to get spotted and reverted faster, on average,
than an edit providing a plausible-sounding but entirely fictional
history for a small town in Kansas. So any change in the pattern of
the *content* of vandalism is going to lead to changes in its duration,
and thus in its overall prevalence, even if the number of vandal edits
is constant.

b) We can easily compare the effect of vandalism left on pages with
different traffic levels for different lengths of time - roughly
speaking, time * traffic = number of readers affected. If some
vandalism is worse than others, we could thus also calculate some kind
of intensity metric - one hundred people viewing enormous genital
piercing images on [[Kitten]] is probably worse than ten thousand
people viewing "asdfdfggfh" at the end of a paragraph in the same
article.
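
To make point (a) concrete, here is a rough Python sketch. It assumes
just two vandalism types with invented average times-to-revert; the
type names and every figure are hypothetical and do not come from any
real measurement.

```python
# Hypothetical mean time-to-revert per vandalism type, in minutes.
# These numbers are invented for illustration only.
MEAN_MINUTES_TO_REVERT = {"page_blanking": 2.0, "plausible_hoax": 3000.0}


def mean_duration(edit_counts):
    """Average lifetime of a vandal edit for a given mix of types."""
    total_edits = sum(edit_counts.values())
    total_minutes = sum(
        count * MEAN_MINUTES_TO_REVERT[kind]
        for kind, count in edit_counts.items()
    )
    return total_minutes / total_edits


# Same number of vandal edits (1000) in both cases, different content mix.
mostly_blanking = {"page_blanking": 950, "plausible_hoax": 50}
more_hoaxes = {"page_blanking": 900, "plausible_hoax": 100}

print(mean_duration(mostly_blanking))
print(mean_duration(more_hoaxes))
```

Under these made-up figures, shifting 5% of edits from blanking to
hoaxes roughly doubles the average duration, although the number of
vandal edits has not changed.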

I'm not sure how we'd go ahead with the second one, but it's an
interesting thing to think about.
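
A very rough sketch of what such an intensity metric might look like,
in Python: estimate readers affected as time * traffic, then scale by
a severity weight per type of vandalism. The weights, the function
names and the sample figures are all assumptions made up for
illustration; choosing defensible weights is the hard part.

```python
# Hypothetical severity weights: how bad a single exposure is,
# on an arbitrary scale. Invented for illustration only.
SEVERITY = {"test_edit": 1.0, "obscene_image": 500.0}


def readers_affected(hours_live, views_per_hour):
    """time * traffic = approximate number of readers who saw it."""
    return hours_live * views_per_hour


def impact(kind, hours_live, views_per_hour):
    """Readers affected, weighted by the assumed severity of the type."""
    return SEVERITY[kind] * readers_affected(hours_live, views_per_hour)


# 100 readers see an obscene image; 10,000 readers see "asdfdfggfh".
image = impact("obscene_image", hours_live=0.5, views_per_hour=200)
gibberish = impact("test_edit", hours_live=50, views_per_hour=200)

print(image, gibberish)
```

With these assumed weights the image scores higher despite reaching a
hundredth of the readers; a different weighting could reverse that.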

-- 
- Andrew Gray
  andrew.gray at dunelm.org.uk



