[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

Anthony wikimail at inbox.org
Fri Aug 21 00:18:04 UTC 2009


On Thu, Aug 20, 2009 at 7:58 PM, Robert Rohde <rarohde at gmail.com> wrote:
>
> You seem to be identifying all errors with vandalism.


How so?


> Sometimes factual errors are simply unintentional mistakes.


Obviously we can't know the intent of the person for sure, but after a
mistake is found it's relatively simple to find where it was added and
decide whether or not we are going to call it vandalism.  This is an
inherent problem with answering the question.  If you can't determine it
manually, you sure as hell won't be able to determine it using automated
methods.


> Let me describe the issue differently.  The practical issue I am
> concerned with might be better expressed as the following:  For any
> given article, what is the probability that the current revision is
> not the best available revision (i.e. most accurate, most complete,
> etc.)  Vandalism, in general, takes a page and makes it worse.  I am
> interested in the problem of characterizing how often this happens
> with an eye to being able to go back to that prior better version.
> (This also explains why I am less interested in vandalism that
> persists through many revisions.  Once that occurs, it makes less
> sense to try and go back to the pre-vandalized revision.)


*nod*.  Yes, we certainly have different things we're interested in
measuring.  If someone vandalizes an article, say to change the population
of a country from 3 million to 2.9 million, and then 20 other people improve
the article without fixing that fact, I'd still count that as vandalized.

On the other hand, are you sure you don't want to add an "indisputably"
before "not the best available revision"?  I mean, I'd say the rate for
Wikipedia is probably in the double-digit percentages, at least for popular
articles, if you don't add "indisputably".

