[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

Anthony wikimail at inbox.org
Thu Aug 20 23:37:26 UTC 2009


On Thu, Aug 20, 2009 at 7:13 PM, Robert Rohde <rarohde at gmail.com> wrote:

> On Thu, Aug 20, 2009 at 3:57 PM, Thomas Dalton <thomas.dalton at gmail.com> wrote:
> > 2009/8/20 Anthony <wikimail at inbox.org>:
> >> I wouldn't suggest looking at the edit history at all, just the most
> >> recent revision as of whatever moment in time is chosen.  If vandalism
> >> is found, then and only then would one look through the edit history
> >> to find out when it was added.
> >
> > That only works if the article is very well referenced and you have
> > all the references and are willing to fact-check everything. Otherwise
> > you will miss subtle vandalism like changing the date of birth by a
> > year.
>
> It's not just facts.  There are many ways to degrade the quality of an
> article (such as removing entire sections) that would be invisible if
> one looks at only one revision.


I guess that's true.  People could be removing facts, for instance, which
wouldn't be apparent from looking at one revision.  So such an analysis
would potentially understate actual vandalism.  But at least we'd know in
which direction the percentage is potentially wrong.  And anecdotally, I
don't think the understatement would be significant.

There's also the question of whether or not we want to count an article
which had a fact removed a few years ago, and never re-added, as a
"vandalized revision".

> Anthony seems to be talking about a question of article accuracy
> (unless I am misreading him).


I'm attempting to best answer the question "if one chooses a random page
from Wikipedia right now, what is the probability of receiving a
vandalized revision", which I take to have nothing whatsoever to do with the
number of reverts.
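
As a rough sketch of what answering that question from a point-in-time
sample might look like: draw n random pages, inspect only the current
revision of each, count how many are vandalized, and report the proportion
with a confidence interval.  The function name, the sample size and the
count below are all hypothetical, chosen for illustration; they are not
results from any actual survey.  The Wilson score interval is one standard
choice for a small proportion, not something proposed in this thread.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    hits: pages in the sample judged vandalized (hypothetical input)
    n:    total pages sampled
    z:    normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical sample: 1000 random pages, 4 found vandalized on inspection.
sample_size = 1000
vandalized = 4

low, high = wilson_interval(vandalized, sample_size)
print(f"point estimate: {vandalized / sample_size:.2%}")
print(f"approx. 95% interval: {low:.2%} to {high:.2%}")
```

Since vandalism that only removes material can't be seen in a single
revision, a count obtained this way would, if anything, err on the low side.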


> That is an overlapping issue with
> addressing vandalism, but there are a significant number of ways to
> commit vandalism that nonetheless have nothing to do with impairing
> the resulting article's accuracy.


Significant number?  I can only think of a handful.


