2009/8/28 Gregory Maxwell gmaxwell@gmail.com:
This is somewhat labor intensive, but only somewhat as it doesn't take an inordinate number of samples to produce representative results. This should be the gold standard for this kind of measurement as it would be much closer to what people actually want to know than most machine metrics.
To get a fair sample we would need to include some highly active pages. They have ridiculous numbers of revisions (even if you restrict it to the last few months).
If the results of this kind of study agree well with mechanical proxy metrics (such as machine-detected vandalism), our confidence in those proxies will increase; if they disagree, it will provide an opportunity to improve the proxies.
An intensive study of this kind on a few small samples, with a more automated method run on the same samples for comparison, would be more achievable. If the automated method gets similar results, we can use that method for larger samples.
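
As a rough sketch of what "similar results" could mean in practice, one option is to label each sampled revision twice (once by hand, once by the automated method) and compute how often the two agree, corrected for chance agreement with Cohen's kappa. The function name, the boolean "is vandalism" labels and the example data below are all made up for illustration; nothing here comes from an existing tool, and kappa is just one possible agreement measure.

```python
import math

def agreement(manual, automated):
    """Compare two parallel lists of booleans (True = judged vandalism).

    Returns (observed_agreement, cohens_kappa). Both inputs are
    hypothetical labels for the same sampled revisions, in the same order.
    """
    if len(manual) != len(automated) or not manual:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(manual)
    # Fraction of revisions on which the two methods give the same label.
    observed = sum(m == a for m, a in zip(manual, automated)) / n
    # Agreement expected by chance, from each method's own vandalism rate.
    p_manual = sum(manual) / n
    p_auto = sum(automated) / n
    expected = p_manual * p_auto + (1 - p_manual) * (1 - p_auto)
    if expected == 1.0:
        # Both methods gave every revision the same single label;
        # kappa is undefined in that case.
        return observed, math.nan
    return observed, (observed - expected) / (1 - expected)

# Made-up labels for eight sampled revisions.
manual_labels = [True, True, False, False, True, False, False, False]
auto_labels = [True, False, False, False, True, False, True, False]
observed, kappa = agreement(manual_labels, auto_labels)
print(f"observed agreement: {observed:.2f}, kappa: {kappa:.2f}")
```

A kappa near 1 would suggest the automated method tracks the manual judgement closely on that sample; a low value would point at cases worth inspecting by hand before scaling up.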