[Foundation-l] Frequency of Seeing Bad Versions - now with traffic data
Robert Rohde
rarohde at gmail.com
Thu Aug 27 16:41:29 UTC 2009
Recently, I reported on a simple study of how likely one is to
encounter recent vandalism in Wikipedia, based on selecting articles
at random and using revert behavior as a proxy for vandalism:
http://lists.wikimedia.org/pipermail/foundation-l/2009-August/054171.html
One of the key limitations of that work was that it was looking at
articles selected at random from the pool of all existing page titles.
That approach was of the most immediate interest to me, but it didn't
directly address the likelihood of encountering vandalism in the way
that Wikipedia is actually used, because the articles people choose to
visit are highly non-random.
I've now redone that analysis with a crude traffic-based weighting.
For traffic information I used the same data stream used by
http://stats.grok.se. That data is recorded hourly. For simplicity I
chose 20 hours at random from the last eight months and averaged those
together to get a rough picture of the relative prominence of pages.
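For anyone who wants to reproduce that averaging step, here is a rough
Python sketch of the idea. It assumes the hourly dumps have already been
parsed into dicts mapping article title to view count; it is only an
illustration, not the script I actually used:

    import random
    from collections import defaultdict

    def average_traffic(hourly_counts, n_hours=20):
        # hourly_counts: list of dicts, one per available hour,
        # each mapping article title -> views in that hour.
        sample = random.sample(hourly_counts, n_hours)
        totals = defaultdict(float)
        for counts in sample:
            for title, views in counts.items():
                totals[title] += views
        # mean hourly views per title over the sampled hours
        return {title: total / n_hours for title, total in totals.items()}
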
I then chose a selection of 30,000 articles at random, with probability
of selection proportional to the traffic each received, and repeated
the analysis described previously. (Note that this has the effect of
treating the prominence of each page as a constant over time. In
practice we know some pages rise to prominence while others fade, but I
am assuming the average pattern is still a good enough approximation to
be useful.)
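The selection itself is just weighted random sampling. A minimal
sketch, again illustrative only (it samples with replacement, which is
a reasonable approximation when the pool of titles is much larger than
30,000):

    import random

    def sample_articles(traffic, k=30000):
        # traffic: dict mapping article title -> average hourly views
        titles = list(traffic)
        weights = [traffic[t] for t in titles]
        # probability of selection proportional to traffic, with replacement
        return random.choices(titles, weights=weights, k=k)
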
From this sample I found 5,955,236 revert events in 38,096,653 edits.
That is 29 times the edit frequency and 58 times the number of revert
events found with a uniform sampling of pages. I suspect it surprises
no one that highly trafficked pages are edited more often and subject
to more vandalism than the average page, though it might not have been
obvious that the ratio of reverts to normal edits is also higher than
for more obscure pages.
As before, the revert time distribution has a very long tail, though
as predicted the times are generally reduced when traffic weighting is
applied. In the traffic weighted sample, the median time to revert is
3.4 minutes and the mean time is 2.2 hours (compared to 6.7 minutes
and 18.2 hours with uniform weighting). Again, I think it is worth
acknowledging that having a majority of reverts occur within only a
few minutes is a strong testament to the efficiency and dedication
with which new edits are usually reviewed by the community. We could
be much worse off if most things weren't caught so quickly.
Unfortunately, in comparing the current analysis to the previous one,
the faster response time is essentially being overwhelmed by the much
larger number of vandalism occurrences. The net result is that,
averaged over the whole history of Wikipedia, a visitor would be
expected to receive a recently degraded article version in about
1.1% of requests (compared to ~0.37% in the uniform weighting
estimate). The last six months averaged a slightly higher 1.3% (1 in
80 requests). As before, most of the degraded content that people are
likely to actually encounter is coming from the subset of things that
get by the initial monitors and survive for a long time. Among edits
that are eventually reverted, the longest-lasting 5% of bad content
(those edits taking more than 7.2 hours to revert) is responsible for
78% of the expected encounters with recently degraded material. One might
speculate that such long-lived material is more likely to reflect
subtle damage to a page rather than more obvious problems like page
blanking. I did not try to investigate this.
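To make the accounting behind these percentages concrete: each reverted
edit is weighted by the traffic of the affected article and by how long
the bad version stayed live, and that "exposure" is compared against
total page views. A simplified sketch of the bookkeeping (the names and
exact form are my own illustration, not the analysis code itself):

    def expected_bad_view_fraction(reverted_edits, total_views):
        # reverted_edits: list of (hours_until_revert, avg_hourly_views)
        # pairs, one per reverted edit; total_views: all page requests
        # over the same period.
        exposure = sum(hours * views for hours, views in reverted_edits)
        return exposure / total_views

    def tail_share(reverted_edits, tail=0.05):
        # Share of expected bad views attributable to the slowest-reverted
        # fraction of edits (e.g. the longest-lived 5%).
        ordered = sorted(reverted_edits)   # ascending time-to-revert
        cutoff = int(len(ordered) * (1 - tail))
        slow_exposure = sum(h * v for h, v in ordered[cutoff:])
        total_exposure = sum(h * v for h, v in ordered)
        return slow_exposure / total_exposure
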
In my sample, the number of reverts being made to articles has
declined ~40% since a peak in late 2006. However, the mean and median
time to revert are little changed over the last two years. What little
trend exists points in the direction of slightly slower responses.
So to summarize, the results here are qualitatively similar to those
found in the previous work. However, with traffic weighting we find
quantitative differences: reverts occur much more often but take less
time to be executed. The net effect of these competing factors is that
bad content is more likely to be seen than the uniform weighting
suggested.
-Robert Rohde