[Foundation-l] Frequency of Seeing Bad Versions - now with traffic data
Thomas Dalton
thomas.dalton at gmail.com
Fri Aug 28 00:39:10 UTC 2009
2009/8/28 Gregory Maxwell <gmaxwell at gmail.com>:
> This is somewhat labor intensive, but only somewhat as it doesn't take
> an inordinate number of samples to produce representative results.
> This should be the gold standard for this kind of measurement as it
> would be much closer to what people actually want to know than most
> machine metrics.
To get a fair sample we would need to include some highly active
pages. They have ridiculous numbers of revisions (even if you restrict
it to the last few months).
> If the results of this kind of study have good agreement with
> mechanical proxy metrics (such as machine detected vandalism) our
> confidence in those proxies will increase, if they disagree it will
> provide an opportunity to improve the proxies.
This kind of intensive study on a few small samples, with a more
automated method run on the same samples for comparison, would be
more achievable. If the automated method gets similar results, we can
use that method for larger samples.
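
As a rough illustration of that comparison, here is a minimal sketch
that assumes each sampled revision has a manual "bad version" label
and an automated flag for the same revision. The function name, the
boolean encoding and the choice of Cohen's kappa as the agreement
measure are my assumptions, not something proposed in this thread.

```python
import math


def agreement(manual, automated):
    """Compare two parallel lists of booleans (True = bad revision).

    Returns (observed_agreement, cohens_kappa). Kappa is NaN when
    chance agreement is 1, i.e. both lists are constant and equal.
    """
    if len(manual) != len(automated) or not manual:
        raise ValueError("need two non-empty lists of equal length")

    n = len(manual)
    # Fraction of revisions where both methods give the same label.
    observed = sum(m == a for m, a in zip(manual, automated)) / n

    # Agreement expected by chance, from each method's own "bad" rate.
    p_manual = sum(manual) / n
    p_auto = sum(automated) / n
    expected = p_manual * p_auto + (1 - p_manual) * (1 - p_auto)

    if expected == 1.0:
        return observed, math.nan
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

Raw agreement alone can look high simply because most revisions are
fine under both methods, which is why the sketch also reports a
chance-corrected figure. What level of agreement counts as "similar
results" would still need to be decided before trusting the automated
method on larger samples.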