[Foundation-l] Frequency of Seeing Bad Versions - now with traffic data

Brion Vibber brion at wikimedia.org
Fri Aug 28 04:43:33 UTC 2009


On 8/27/09 9:39 PM, Thomas Dalton wrote:
> 2009/8/28 Gregory Maxwell<gmaxwell at gmail.com>:
>> If the results of this kind of study have good agreement with
>> mechanical proxy metrics (such as machine detected vandalism) our
>> confidence in those proxies will increase, if they disagree it will
>> provide an opportunity to improve the proxies.
>
> This kind of intensive study on a few small samples with a more
> automated method used on the same sample to compare would be more
> achievable. If the automated method gets similar results, we can use
> that method for larger samples.

I would certainly be interested in seeing such a result.

Generally speaking we can expect a strong correlation between vandalism 
and machine-identifiable reverts -- it's a totally reasonable assumption 
for a first-order approximation -- and it would be valuable to confirm 
this and see how much divergence there might be between this count and 
other markers.
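
To make "machine-identifiable revert" concrete: one common heuristic is 
to treat an edit as a revert when it restores a page to content 
byte-identical to an earlier revision. Below is a minimal sketch of that 
idea, assuming the revisions of one page are available as a 
chronologically ordered list of text strings. The function name and data 
shape are hypothetical, and this is not necessarily how the revert-based 
stats discussed in this thread were generated.

```python
import hashlib


def find_identity_reverts(revisions):
    """Return (reverting_index, restored_index) pairs for one page.

    `revisions` is a list of revision texts in chronological order
    (a hypothetical input shape chosen for illustration).
    A revision counts as a revert when its text is identical to an
    earlier revision and at least one different revision lies between them.
    """
    last_seen = {}  # content hash -> index of most recent revision with that content
    reverts = []
    for i, text in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        prev = last_seen.get(digest)
        # prev == i - 1 means a null edit (same text twice in a row), not a revert
        if prev is not None and prev < i - 1:
            reverts.append((i, prev))
        last_seen[digest] = i
    return reverts


history = ["A", "A vandalized", "A", "B", "B"]
print(find_identity_reverts(history))
```

A heuristic like this only catches exact restorations; partial or manual 
cleanups would be missed, which is one source of the divergence mentioned 
above.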

Most interesting following this would be to take into account the effects 
of flagged revisions and how this could affect initially-displayed vs 
edited revisions. Has there been similar work targeting German-language 
Wikipedia already?

Robert, is it possible to share the source for generating the 
revert-based stats with other folks who may be interested in pursuing 
further work on the subject? Thanks!

-- brion vibber (brion @ wikimedia.org)
CTO & Senior Software Architect, Wikimedia Foundation



