[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

Robert Rohde rarohde at gmail.com
Thu Aug 20 23:58:03 UTC 2009


On Thu, Aug 20, 2009 at 4:37 PM, Anthony<wikimail at inbox.org> wrote:
> On Thu, Aug 20, 2009 at 7:13 PM, Robert Rohde <rarohde at gmail.com> wrote:
>
>> On Thu, Aug 20, 2009 at 3:57 PM, Thomas Dalton<thomas.dalton at gmail.com> wrote:
>> > 2009/8/20 Anthony <wikimail at inbox.org>:
>> >> I wouldn't suggest looking at the edit history at all, just the most
>> >> recent revision as of whatever moment in time is chosen.  If vandalism
>> >> is found, then and only then would one look through the edit history
>> >> to find out when it was added.
>> >
>> > That only works if the article is very well referenced and you have
>> > all the references and are willing to fact-check everything. Otherwise
>> > you will miss subtle vandalism like changing the date of birth by a
>> > year.
>>
>> It's not just facts.  There are many ways to degrade the quality of an
>> article (such as removing entire sections) that would be invisible if
>> one looks at only one revision.
>
>
> I guess that's true.  People could be removing facts, for instance, which
> wouldn't be apparent from looking at one revision.  So such an analysis
> would potentially understate actual vandalism.  But at least we'd know in
> which direction the percentage is potentially wrong.  And anecdotally, I
> don't think the understatement would be significant.

You seem to be identifying all errors with vandalism.  Sometimes
factual errors are simply unintentional mistakes.  I agree that
accuracy is important, but I think you are thinking about the question
somewhat differently than I am.

<snip>

> I'm attempting to best answer the question "if one chooses a random page
> from Wikipedia right now, what is the probability of receiving a
> vandalized revision", which I take to have nothing whatsoever to do with the
> number of reverts.

Let me describe the issue differently.  The practical issue I am
concerned with might be better expressed as the following:  For any
given article, what is the probability that the current revision is
not the best available revision (i.e. most accurate, most complete,
etc.)?  Vandalism, in general, takes a page and makes it worse.  I am
interested in the problem of characterizing how often this happens
with an eye to being able to go back to that prior better version.
(This also explains why I am less interested in vandalism that
persists through many revisions.  Once that occurs, it makes less
sense to try to go back to the pre-vandalized revision.)

Your concern for establishing overall article accuracy is a good one,
but it is largely orthogonal to my interest in figuring out whether
the current revision is likely to be better or worse than the
revisions that came before it.

-Robert Rohde