[Foundation-l] Frequency of Seeing Bad Versions - now with traffic data
Brion Vibber
brion at wikimedia.org
Fri Aug 28 04:43:33 UTC 2009
On 8/27/09 9:39 PM, Thomas Dalton wrote:
> 2009/8/28 Gregory Maxwell<gmaxwell at gmail.com>:
>> If the results of this kind of study have good agreement with
>> mechanical proxy metrics (such as machine detected vandalism) our
>> confidence in those proxies will increase; if they disagree, it will
>> provide an opportunity to improve the proxies.
>
> This kind of intensive study on a few small samples, with a more
> automated method used on the same samples for comparison, would be more
> achievable. If the automated method gets similar results, we can use
> that method for larger samples.
I would certainly be interested in seeing such a result.
Generally speaking we can expect a strong correlation between vandalism
and machine-identifiable reverts -- it's a totally reasonable assumption
for a first-order approximation -- and it would be valuable to confirm
this and see how much divergence there might be between this count and
other markers.
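As a rough illustration of what "machine-identifiable reverts" could mean in practice, here is a minimal sketch that flags identity reverts: a revision whose text exactly matches an earlier, non-adjacent revision of the same page. This is an assumption for illustration only, not the method used for the stats discussed in this thread; the function name find_identity_reverts and the (rev_id, text) input shape are hypothetical.

```python
import hashlib


def find_identity_reverts(revisions):
    """Flag revisions that restore an earlier version of the page text.

    `revisions` is a chronological list of (rev_id, text) pairs for one
    page (a hypothetical input shape). Returns a list of tuples:
    (reverting_rev_id, restored_rev_id, [reverted_rev_ids]).
    """
    last_seen = {}  # content hash -> index of its most recent occurrence
    reverts = []
    for i, (rev_id, text) in enumerate(revisions):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # Require at least one intervening revision; an identical
        # adjacent revision is a null edit, not a revert.
        if j is not None and i - j > 1:
            reverted = [revisions[k][0] for k in range(j + 1, i)]
            reverts.append((rev_id, revisions[j][0], reverted))
        last_seen[digest] = i
    return reverts
```

The revisions listed as reverted by such a pass are only candidates for vandalism; comparing them against a hand-labelled sample is what would show how far the two counts diverge.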
Most interesting after this would be to take into account the effects
of flagged revisions and how they could affect initially-displayed vs
edited revisions. Has there been similar work targeting German-language
Wikipedia already?
Robert, is it possible to share the source for generating the
revert-based stats with other folks who may be interested in pursuing
further work on the subject? Thanks!
-- brion vibber (brion @ wikimedia.org)
CTO & Senior Software Architect, Wikimedia Foundation