[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

Jimmy Wales jwales at wikia-inc.com
Thu Aug 20 16:38:20 UTC 2009


Robert Rohde wrote:
> When one downloads a dump file, what percentage of the pages are
> actually in a vandalized state?
> 
> This is equivalent to asking, if one chooses a random page from
> Wikipedia right now, what is the probability of receiving a vandalized
> revision?

Is there a possibility of re-running the numbers to include traffic 
weightings?

I would hypothesize from experience that if we adjust the "random page" 
selection to account for traffic (to get a better view of what people 
are actually seeing) we would see slightly different results.
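To make the difference between the two measurements concrete, here is a minimal Python sketch. It assumes that each page has a flag saying whether its current revision is vandalized and a view count to weight by. The `Page` type, the page titles, and every number are invented toy values for illustration, not real Wikipedia data or any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    vandalized: bool  # is the current revision in a vandalized state?
    daily_views: int  # hypothetical traffic figure used as the weight


def unweighted_fraction(pages):
    # Every page counts once: the "random page from a dump" view.
    return sum(p.vandalized for p in pages) / len(pages)


def traffic_weighted_fraction(pages):
    # Every page counts in proportion to its views: roughly, the chance
    # that a random page *view* lands on a vandalized revision.
    total = sum(p.daily_views for p in pages)
    hit = sum(p.daily_views for p in pages if p.vandalized)
    return hit / total


# Toy data: one busy page, three obscure ones, one of them vandalized.
pages = [
    Page("Popular topic", False, 10_000),
    Page("Obscure A", True, 2),
    Page("Obscure B", False, 3),
    Page("Obscure C", False, 5),
]

print(unweighted_fraction(pages))        # 1 of 4 pages
print(traffic_weighted_fraction(pages))  # 2 of 10,010 views
```

In this toy case a quarter of the pages are vandalized, but only about 0.02% of views see vandalism, because the damaged page is one that almost nobody visits.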

I think we would see proportionally much less vandalism that persists 
for a really long time, for precisely the reason you identified: most 
vandalism that survives a long time does so because it sits on obscure 
pages that no one is visiting.  That doesn't mean it is not a problem, 
but it does change some thinking about what kinds of tools are needed 
to deal with that problem.

I'm not sure what else would change.




