[Foundation-l] How much of Wikipedia is vandalized? 0.4% of Articles

Anthony wikimail at inbox.org
Thu Aug 20 22:53:32 UTC 2009


On Thu, Aug 20, 2009 at 6:36 PM, Robert Rohde <rarohde at gmail.com> wrote:

> On Thu, Aug 20, 2009 at 2:10 PM, Anthony<wikimail at inbox.org> wrote:
> > "if one chooses a random page from Wikipedia right now, what is the
> > probability of receiving a vandalized revision"  The best way to answer
> > that question would be with a manually processed random sample taken
> > from a pre-chosen moment in time.  As few as 1000 revisions would
> > probably be sufficient, if I know anything about statistics, but I'll
> > let someone with more knowledge of statistics verify or refute that.
> > The results will depend heavily on one's definition of "vandalism",
> > though.
>
> Only in dreadfully obvious cases can you look at a revision by itself
> and know it contains vandalism.  If the goal is really to characterize
> whether any vandalism has persisted in an article from any time in the
> past, then one really needs to look at the full edit history to see
> what has been changed / removed over time.
>

I wouldn't suggest looking at the edit history at all, just the most recent
revision as of whatever moment in time is chosen.  If vandalism is found,
then and only then would one look through the edit history to find out when
it was added.
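
As a rough check on the "1000 revisions" guess quoted above, here is a
minimal sketch of how wide the uncertainty on such a sample would be.  It
uses a Wilson score interval, which is one common choice and not something
either of us has proposed in this thread.  The function name
wilson_interval, the 95% level, and the count of 4 vandalized pages in 1000
are all assumptions for illustration, not measured data.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    hits: number of sampled revisions judged to be vandalized
    n:    total number of sampled revisions
    z:    normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point drift at the edges.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical sample: 4 vandalized pages found among 1000 random pages.
    low, high = wilson_interval(4, 1000)
    print(f"observed {4 / 1000:.2%}, interval {low:.2%} to {high:.2%}")
```

When the true rate is a fraction of a percent, a sample of 1000 contains
only a handful of positive cases, so the interval is wide relative to the
estimate itself; a larger sample narrows it.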


> > Of course, this assumes a valid methodology.  Using "admin rollback, the
> > undo function, the revert bots, various editing tools, and commonly used
> > phrases like "rv", "rvv", etc." to find vandalism is heavily skewed
> > toward vandalism that doesn't last very long (or at least doesn't last
> > very many edits).  It's basically useless.
>
> Yes, as I acknowledged above, "new vandalism".


"New vandalism" which has not yet been reverted wouldn't be included.


> My personal interest
> is also skewed in that direction.  If you don't like it and don't find
> it useful, feel free to ignore me and/or do your own analysis.


I do.  I also feel free to criticize your methods publicly, since you
decided to share them publicly.


> Anyway, I provided my data point and described what I did so others
> could judge it for themselves.  Regardless of your opinion, it
> addressed an issue of interest to me, and I would hope others also
> find some useful insight in it.


And I presented my criticism, which hopefully others will find some useful
insight in as well.

