[Foundation-l] Fwd: [Wiki-research-l] How much of Wikipedia is vandalized? 0.4% of Articles

phoebe ayers phoebe.wiki at gmail.com
Fri Aug 21 02:30:42 UTC 2009


---------- Forwarded message ----------
From: Reid Priedhorsky <reid at umn.edu>
Date: Thu, Aug 20, 2009 at 9:58 AM
Subject: Re: [Wiki-research-l] [Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles
To: wiki-research-l at lists.wikimedia.org


On 08/20/2009 11:34 AM, Gregory Maxwell wrote:
> On Thu, Aug 20, 2009 at 6:06 AM, Robert Rohde<rarohde at gmail.com> wrote:
> [snip]
>> When one downloads a dump file, what percentage of the pages are
>> actually in a vandalized state?
>
> Although you don't actually answer that question, you answer a
> different question:
>
> [snip]
>> approximations:  I considered that "vandalism" is that thing which
>> gets reverted, and that "reverts" are those edits tagged with "revert,
>> rv, undo, undid, etc." in the edit summary line.  Obviously, not all
>> vandalism is cleanly reverted, and not all reverts are cleanly tagged.
>
> Which is interesting too, but part of the problem with calling this a
> measure of vandalism is that it isn't really, and we don't really have
> a good handle on how solid an approximation it is beyond gut feelings
> and arm-waving.
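
For readers following the thread, the tagging heuristic quoted above can be sketched roughly as follows. This is only an illustration: the quoted list of tags ends in "etc.", so the pattern here is an assumed subset, and the names `REVERT_TAG` and `looks_like_revert` are hypothetical, not taken from Robert's analysis.

```python
import re

# Assumed tag set: only the four tags named in the quoted message.
# The original list ends in "etc.", so the real set is unknown.
REVERT_TAG = re.compile(r"\b(revert\w*|rv|undo|undid)\b", re.IGNORECASE)


def looks_like_revert(summary: str) -> bool:
    """Return True if an edit summary contains one of the assumed revert tags."""
    return REVERT_TAG.search(summary) is not None


# Made-up edit summaries, for illustration only.
summaries = [
    "rv vandalism",
    "Undid revision 12345 by Example",
    "added references",
    "Reverted edits by Example",
]
flags = [looks_like_revert(s) for s in summaries]
```

As the quoted caveat says, a filter like this misses vandalism that is never reverted and reverts that are not tagged, so it measures tagged reverts rather than vandalism itself.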

We looked into this a couple of years ago and came up with a similar
number (I won't quote it because I don't quite remember what it was).
However, we estimated the probability that a viewer would encounter a
damaged article, rather than how many articles were currently damaged.
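
To make the difference between the two quantities concrete, here is a minimal sketch. The numbers are invented and the function names are hypothetical; it is not the method from the paper, just an illustration of counting articles versus weighting by views.

```python
# Invented data: (page views, is the article currently damaged?)
articles = [
    (9000, False),
    (800, True),
    (150, False),
    (50, False),
]


def fraction_damaged(articles):
    """Share of articles that are currently in a damaged state."""
    damaged = sum(1 for _, is_damaged in articles if is_damaged)
    return damaged / len(articles)


def prob_view_damaged(articles):
    """Share of page views that land on a damaged article."""
    total_views = sum(views for views, _ in articles)
    damaged_views = sum(views for views, is_damaged in articles if is_damaged)
    return damaged_views / total_views


article_share = fraction_damaged(articles)
view_share = prob_view_damaged(articles)
```

With this made-up data, one article in four is damaged, but fewer than one view in ten hits it, so the two measures can differ depending on how views are distributed.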

We used the term "damaged" instead of "vandalized" for essentially the
reasons you mention (though I confess I didn't fully read your whole
letter).

Priedhorsky et al., GROUP 2007.

Reid
-----

I'm forwarding Reid's message from Wiki-research-l to Foundation-l
because, for those interested, it's worth noting that his group
looked into this question and published a paper in 2007. Here's the
link:
http://www.grouplens.org/node/113

-- phoebe



