---------- Forwarded message ----------
From: Reid Priedhorsky <reid@umn.edu>
Date: Thu, Aug 20, 2009 at 9:58 AM
Subject: Re: [Wiki-research-l] [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles
To: wiki-research-l@lists.wikimedia.org
On 08/20/2009 11:34 AM, Gregory Maxwell wrote:
On Thu, Aug 20, 2009 at 6:06 AM, Robert Rohde <rarohde@gmail.com> wrote: [snip]
When one downloads a dump file, what percentage of the pages are actually in a vandalized state?
Although you don't actually answer that question, you answer a different question:
[snip]
approximations: I considered that "vandalism" is that thing which gets reverted, and that "reverts" are those edits tagged with "revert, rv, undo, undid, etc." in the edit summary line. Obviously, not all vandalism is cleanly reverted, and not all reverts are cleanly tagged.
Which is interesting too, but part of the problem with calling this a measure of vandalism is that it isn't really, and we don't really have a good handle on how solid an approximation it is beyond gut feelings and arm-waving.
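For readers following along, the edit-summary heuristic quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script actually used for the analysis: the keyword pattern and the function names `looks_like_revert` and `count_tagged_reverts` are hypothetical, and the pattern covers only the tags named in the message.

```python
import re

# Hypothetical pattern built from the tags named in the message
# ("revert, rv, undo, undid"); the original analysis may have used others.
REVERT_PATTERN = re.compile(
    r"\b(revert(ed|ing|s)?|rv|undo|undid)\b",
    re.IGNORECASE,
)


def looks_like_revert(edit_summary):
    """Return True if the edit summary carries one of the revert tags."""
    return bool(REVERT_PATTERN.search(edit_summary or ""))


def count_tagged_reverts(summaries):
    """Count how many edit summaries in an iterable look like reverts."""
    return sum(1 for summary in summaries if looks_like_revert(summary))
```

As the quoted message itself notes, a count like this is only a proxy: vandalism that is never reverted, and reverts with no tag in the summary, are both missed.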
We looked into this a couple of years ago and came up with a similar number (I won't quote it because I don't quite remember what it was). However, we estimated the probability that a viewer would encounter a damaged article rather than how many articles were currently damaged.
We used the term "damaged" instead of "vandalized" for essentially the reasons you mention (though I confess I didn't fully read your whole letter).
Priedhorsky et al., GROUP 2007.
Reid
-----
I'm forwarding Reid's message from Wiki-research-l to Foundation-l because, for those interested, it's worth noting that their group looked into this question and published a paper in 2007. Here's the link: http://www.grouplens.org/node/113
-- phoebe