[Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

Thomas Dalton thomas.dalton at gmail.com
Fri Aug 28 00:39:10 UTC 2009


2009/8/28 Gregory Maxwell <gmaxwell at gmail.com>:
> This is somewhat labor intensive, but only somewhat as it doesn't take
> an inordinate number of samples to produce representative results.
> This should be the gold standard for this kind of measurement as it
> would be much closer to what people actually want to know than most
> machine metrics.

To get a fair sample we would need to include some highly active
pages, and those have a ridiculous number of revisions (even if you
restrict it to the last few months).
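
Something along these lines (Python; the titles, view counts and
revision ids are all made up) is what I mean by a traffic-weighted
sample, rather than sampling articles or revisions uniformly:

import random

# Hypothetical input; in a real run these would come from the traffic
# logs and the revision histories of the sampled articles.
articles = [
    # (title, monthly page views, revision ids from the last few months)
    ("Quiet stub",        1200, [101, 102]),
    ("Busy article",   4500000, list(range(5000, 5400))),
    ("Average article",  80000, list(range(900, 960))),
]

def sample_versions(articles, n):
    # Weight articles by traffic, so heavily viewed pages show up in
    # proportion to what readers actually see rather than in proportion
    # to their (huge) revision counts, then pick one revision at random
    # from each chosen article.
    titles  = [a[0] for a in articles]
    weights = [a[1] for a in articles]
    revs    = {a[0]: a[2] for a in articles}
    picks = random.choices(titles, weights=weights, k=n)
    return [(title, random.choice(revs[title])) for title in picks]

print(sample_versions(articles, 10))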

> If the results of this kind of study have good agreement with
> mechanical proxy metrics (such as machine detected vandalism) our
> confidence in those proxies will increase, if they disagree it will
> provide an opportunity to improve the proxies.

A more achievable approach would be this kind of intensive study on a
few small samples, with a more automated method run on the same
samples for comparison. If the automated method gets similar results,
we can use that method on larger samples.
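
Once both sets of labels exist for the same sample, checking agreement
is just a counting exercise. A rough sketch with invented labels,
where "bad" stands in for whatever definition of a damaged version the
reviewers settle on:

# Hypothetical human and automated-proxy labels for the same versions.
human = ["ok", "ok", "bad", "ok", "bad", "ok", "ok", "bad"]
proxy = ["ok", "bad", "bad", "ok", "ok", "ok", "ok", "bad"]

def agreement(human, proxy):
    # Simple percent agreement plus the 2x2 confusion counts.
    counts = {}
    for h, p in zip(human, proxy):
        counts[(h, p)] = counts.get((h, p), 0) + 1
    agree = sum(v for (h, p), v in counts.items() if h == p)
    return agree / len(human), counts

rate, table = agreement(human, proxy)
print("agreement: %.0f%%" % (rate * 100), table)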


