No subject


Thu Jul 16 06:53:57 UTC 2009


This is a 29-fold increase in edit frequency and a 58-fold increase in
the number of revert events relative to a uniform sampling of
pages.  I suspect it surprises no one that highly trafficked pages are
edited more often and subject to more vandalism than the average page,
though it might not have been obvious that the ratio of reverts to
normal edits is also higher than on more obscure pages.

As before, the revert time distribution has a very long tail, though
as predicted the times are generally reduced when traffic weighting is
applied.  In the traffic weighted sample, the median time to revert is
3.4 minutes and the mean time is 2.2 hours (compared to 6.7 minutes
and 18.2 hours with uniform weighting).  Again, I think it is worth
acknowledging that having a majority of reverts occur within only a
few minutes is a strong testament to the efficiency and dedication
with which new edits are usually reviewed by the community.  We could
be much worse off if most things weren't caught so quickly.
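
To illustrate why a long tail pulls the mean so far above the median, here
is a minimal sketch.  The revert times in it are hypothetical values chosen
for illustration, not data from the sample described here.

```python
from statistics import mean, median

# Hypothetical revert times in minutes (NOT the real sample): most reverts
# are fast, but one edit survives for a full day (1440 minutes).
revert_minutes = [1, 2, 2, 3, 3, 4, 5, 8, 30, 1440]

typical = median(revert_minutes)  # robust to the single slow revert
average = mean(revert_minutes)    # dominated by the single slow revert

print(f"median: {typical} min, mean: {average} min")
```

A single slow revert barely moves the median but dominates the mean, which
is the same pattern as the minutes-versus-hours gap reported above.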

Unfortunately, when the current analysis is compared to the previous
one, the faster response time is essentially overwhelmed by the much
larger number of vandalism occurrences.  The net result is that,
averaged over the whole history of Wikipedia, a visitor would be
expected to receive a recently degraded article version during about
1.1% of requests (compared to ~0.37% in the uniform weighting
estimate).  The last six months averaged a slightly higher 1.3% (1 in
80 requests).  As before, most of the degraded content that people are
likely to actually encounter is coming from the subset of things that
get by the initial monitors and survive for a long time.  Among edits
that are eventually reverted, the longest-lasting 5% of bad content
(those edits taking more than 7.2 hours to revert) is responsible for
78% of the expected encounters with recently degraded material.  One might
speculate that such long-lived material is more likely to reflect
subtle damage to a page rather than more obvious problems like page
blanking.  I did not try to investigate this.
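
The concentration of exposure in the slowest reverts can be sketched as
follows.  This is a simplified illustration, not the method used for the
figures above: it assumes every page receives requests at the same constant
rate, so that exposure is proportional to how long the bad version stayed
live.  The function name and the input data are hypothetical.

```python
def exposure_share_of_slowest(revert_minutes, top_fraction=0.05):
    """Share of total degraded time contributed by the slowest reverts.

    Assumes exposure is proportional to time-to-revert (constant traffic).
    """
    ordered = sorted(revert_minutes, reverse=True)
    # Number of edits in the slowest `top_fraction`, at least one.
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical data: 95 reverts after 3 minutes, 5 reverts after 10 hours.
revert_minutes = [3] * 95 + [600] * 5
share = exposure_share_of_slowest(revert_minutes)
print(f"slowest 5% of reverts account for {share:.0%} of degraded time")
```

With real data the view rate differs from page to page, so each duration
would need to be multiplied by that page's traffic before summing.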

In my sample, the number of reverts being made to articles has
declined ~40% since a peak in late 2006.  However, the mean and median
times to revert have changed little over the last two years.  What little
trend exists points in the direction of slightly slower responses.


So to summarize, the results here are qualitatively similar to those
found in the previous work.  However, with traffic weighting we find
quantitative differences: reverts occur much more often but take less
time to be executed.  The net effect of these competing factors is that
bad content is more likely to be seen than the uniform weighting
suggested.
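
As a rough sketch of how the two weightings can diverge, the following
compares a uniform average over pages with a traffic-weighted one.  The
function, the tuple layout, and all numbers are hypothetical, and each page
is assumed to have a constant view rate over the observation window.

```python
def degraded_fraction(pages, weight_by_traffic=True):
    """Fraction of requests (or of page-time, if unweighted) served degraded.

    Each page is (views_per_hour, degraded_hours, observed_hours).
    """
    numerator = 0.0
    denominator = 0.0
    for views_per_hour, degraded_hours, observed_hours in pages:
        weight = views_per_hour if weight_by_traffic else 1.0
        numerator += weight * degraded_hours
        denominator += weight * observed_hours
    return numerator / denominator

# Hypothetical pages: one busy page that is vandalized often, three quiet ones.
pages = [
    (1000.0, 20.0, 1000.0),
    (1.0, 5.0, 1000.0),
    (1.0, 0.0, 1000.0),
    (1.0, 0.0, 1000.0),
]
uniform = degraded_fraction(pages, weight_by_traffic=False)
weighted = degraded_fraction(pages, weight_by_traffic=True)
print(f"uniform: {uniform:.2%}, traffic weighted: {weighted:.2%}")
```

In this toy case the busy page dominates the weighted estimate, so the
weighted fraction comes out higher than the uniform one.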

-Robert Rohde



More information about the foundation-l mailing list