[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

Mark Wagner carnildo at gmail.com
Fri Aug 21 01:30:53 UTC 2009


On Thu, Aug 20, 2009 at 14:10, Anthony <wikimail at inbox.org> wrote:
> On Thu, Aug 20, 2009 at 1:55 PM, Nathan <nawrich at gmail.com> wrote:
>>
>> My point (which might still be incorrect, of course) was that an analysis
>> based on 30,000 randomly selected pages was more informative about the
>> English Wikipedia than 100 articles about serving United States Senators.
>
>
> Any automated method of finding vandalism is doomed to failure.  I'd say its
> informativeness was precisely zero.
>
> Greg's analysis, on the other hand, was informative, but it was targeted at
> a much different question than Robert's.
>
> "If one chooses a random page from Wikipedia right now, what is the
> probability of receiving a vandalized revision?"  The best way to answer that
> question would be with a manually processed random sample taken from a
> pre-chosen moment in time.  As few as 1000 revisions would probably be
> sufficient, if I know anything about statistics, but I'll let someone with
> more knowledge of statistics verify or refute that.  The results will depend
> heavily on one's definition of "vandalism", though.

I did this in an informal fashion in 2005 during my "hundred article"
surveys.  Of the 503 pages I looked at, only one was clearly
vandalized the first time I looked at it, so I'd say a thousand
samples is probably too small to get any sort of precision on the
vandalism rate.
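
As a rough check on that intuition, here is a minimal sketch that puts a
confidence interval around a rare rate.  The Wilson score interval and the
95% level (z = 1.96) are assumptions chosen for illustration, not something
used in the surveys above, and the "4 of 1000" case is a hypothetical sample
that simply assumes the 0.4% figure from the subject line.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumed choice).
    Returns (low, high) as fractions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Observed in the 2005 survey: 1 clearly vandalized page out of 503.
low, high = wilson_interval(1, 503)
print(f"1 of 503:  {low:.3%} to {high:.3%}")

# Hypothetical: 1000 samples at an assumed 0.4% vandalism rate.
low, high = wilson_interval(4, 1000)
print(f"4 of 1000: {low:.3%} to {high:.3%}")
```

In both cases the upper bound comes out several times larger than the lower
bound, which is the sense in which a sample of this size gives little
precision when the rate being measured is well under one percent.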

-- 
Mark Wagner
[[User:Carnildo]]



