[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles
Jimmy Wales
jwales at wikia-inc.com
Thu Aug 20 16:38:20 UTC 2009
Robert Rohde wrote:
> When one downloads a dump file, what percentage of the pages are
> actually in a vandalized state?
>
> This is equivalent to asking, if one chooses a random page from
> Wikipedia right now, what is the probability of receiving a vandalized
> revision?
Is there a possibility of re-running the numbers to include traffic
weightings?
I would hypothesize from experience that if we adjust the "random page"
selection to account for traffic (to get a better view of what people
are actually seeing) we would see slightly different results.
I think we would see a lot less long-lived vandalism (as a percentage),
for precisely the reason you identified: most vandalism that lasts a
long time does so because it is on obscure pages that no one is
visiting. That doesn't mean it is not a
problem, but it does change some thinking about what kinds of tools are
needed to deal with that problem.
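As a rough sketch of what traffic weighting would mean for the numbers: the per-page rate counts every page equally, while the per-view rate weights each page by how often it is read. The function name `vandalism_rates` and the sample figures below are made up for illustration and are not taken from Robert's analysis.

```python
def vandalism_rates(pages):
    """Return (per_page_rate, per_view_rate) for a list of pages.

    pages: iterable of (views, is_vandalized) tuples, where views is a
    non-negative traffic count over some fixed period (an assumption).
    """
    pages = list(pages)
    if not pages:
        raise ValueError("need at least one page")

    # Unweighted: probability that a uniformly random page is vandalized.
    per_page = sum(1 for _, bad in pages if bad) / len(pages)

    # Traffic-weighted: probability that a random page view lands on a
    # vandalized page.
    total_views = sum(views for views, _ in pages)
    if total_views == 0:
        per_view = 0.0
    else:
        per_view = sum(views for views, bad in pages if bad) / total_views

    return per_page, per_view


# Hypothetical data: two busy clean pages, one obscure vandalized page,
# one obscure clean page.
sample = [(10000, False), (5000, False), (20, True), (5, False)]
per_page, per_view = vandalism_rates(sample)
print(f"per page: {per_page:.4f}, per view: {per_view:.4f}")
```

With data shaped like this, where the vandalized page is an obscure one, the per-view rate comes out far lower than the per-page rate, which is the effect hypothesized above.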
I'm not sure what else would change.