On 9/4/06, maru dubshinki marudubshinki@gmail.com wrote: [snip]
not at all what I found. Almost every time I saw a substantive edit, I found the user who had contributed it was not an active user of the site. They generally had made less than 50 edits (typically around 10), usually on related pages. Most never even bothered to create an account."
[snip]
I wish Aaronsw had been a little more open about his methodology in his results.
While working on bots for automated vandalism detection, I found pure diffwords to be a poor metric of what's actually changed, because confused new users often manage to insert big gobs of crud that get reverted.
A similar test, which used the IBM history flow tool to attribute all the text in the most recent version of an article to its original author, did not find similar results. This might be because copyediting can cause the history flow tool to misattribute text, or it might be indicative of a systematic flaw in Aaronsw's research.
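To make the difference between the two metrics concrete, here is a rough sketch in Python. It is not the history flow tool's algorithm and not Aaronsw's method; it just uses the standard library's difflib on whitespace-split words, and the function names and the (author, text) revision format are my own assumptions for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_added_words(revisions):
    """Raw diff-size metric: words each author inserted in any revision,
    whether or not those words survived later edits.

    revisions: list of (author, text) tuples, oldest first (assumed format).
    """
    added = Counter()
    prev_words = []
    for author, text in revisions:
        words = text.split()
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            # "insert" and "replace" both put new words into the article.
            if tag in ("insert", "replace"):
                added[author] += j2 - j1
        prev_words = words
    return added


def attribute_surviving_words(revisions):
    """Attribution metric: credit each word of the latest revision to the
    author of the revision in which it first appeared (as far as the diff
    can tell). Reverted text earns no credit."""
    owners = []       # owners[i] = author credited with word i of prev_words
    prev_words = []
    for author, text in revisions:
        words = text.split()
        matcher = SequenceMatcher(a=prev_words, b=words, autojunk=False)
        # Assume every word is new, then restore ownership of unchanged runs.
        new_owners = [author] * len(words)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                new_owners[block.b + k] = owners[block.a + k]
        owners, prev_words = new_owners, words
    return Counter(owners)


if __name__ == "__main__":
    # Made-up history: a new user dumps crud, it gets reverted, then a
    # small real edit follows.
    history = [
        ("alice", "the cat sat on the mat"),
        ("newbie", "the cat sat on the mat lorem ipsum dolor sit amet crud crud crud"),
        ("alice", "the cat sat on the mat"),
        ("bob", "the black cat sat on the mat"),
    ]
    print("raw words added:", dict(count_added_words(history)))
    print("surviving words:", dict(attribute_surviving_words(history)))
```

On that made-up history the raw count ranks the reverted newbie first (8 words added), while the attribution count gives newbie nothing. Note the sketch shares the weakness mentioned above: a copyedit that rewords a sentence gets credited to the copyeditor.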
In any case, if we're going to do studies which sample only a handful of articles, it would likely be better to do manual analysis... It wouldn't take long to step through 400 diffs given the right user interface.