Thanks for all the references and excellent advice so far!

I've looked into the Hale Anti-Bot Method™, but because I've sampled my corpus by article (based on category co-membership), grouping the resulting revisions by user gives these semi-automated users more "normal" distributions, since their other contributions are censored. In other words, I see only a fraction of these users' contributions, so the time intervals I observe between their edits are spaced farther apart (more typical) than they actually are. It's not feasible for me to retrieve 100k+ users' histories just to clean up ~6k articles' histories.
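A toy illustration of the censoring effect, using made-up timestamps (purely hypothetical, not drawn from my corpus): if a user edits every 10 seconds but only one in five of those edits lands on an article in the sample, the observed gaps come out five times longer, which pushes the user toward a "normal" editing rhythm.

```python
from statistics import median


def inter_edit_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive edits, after sorting."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


# Hypothetical semi-automated user: one edit every 10 seconds, 100 edits total.
full_history = [i * 10 for i in range(100)]

# Hypothetical article-based sample: only every 5th edit falls in the corpus.
sampled_history = full_history[::5]

print(median(inter_edit_gaps(full_history)))     # 10
print(median(inter_edit_gaps(sampled_history)))  # 50
```

The same user looks five times "slower" in the sample, which is why a per-user interval threshold loses its bite on article-sampled data.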

Another thought: because many semi-automated tools such as Twinkle and AWB leave parenthetical annotations in their revision comments, would filtering on these annotations be a relatively inexpensive way to exclude revisions rather than users? There are some caveats I'd like domain experts' feedback on. I'm not expecting settled research, just input from others' experiences munging the data.

1. Is the inclusion of this markup in revision comments optional? My concern is that some users may enable or disable it, so I may end up biasing inclusion based on users' preferences.
2. How have these flags or markup changed over time? My concern is that Twinkle/AWB/etc. may have started or stopped including flags, or changed what they include, over time.
3. Are there other API queries or data elsewhere I could use to identify (semi-)automated revisions?
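For concreteness, a minimal sketch of the revision-level filter I have in mind. The marker patterns are my assumptions about what the tools typically append (Twinkle's "([[WP:TW|TW]])" or "(TW)", and AWB's "using [[Project:AWB|AWB]]"); they are exactly what questions 1 and 2 would need to confirm, and the list is not exhaustive. The AWB pattern is deliberately loose and will also catch comments that merely mention AWB. For question 3, the helper builds a MediaWiki API query using list=allusers with augroup=bot, which returns accounts currently in the bot group; it would not cover former bots or unflagged scripts.

```python
import re
from urllib.parse import urlencode

# ASSUMED edit-summary markers; verify against real revision comments first.
TOOL_PATTERNS = {
    # Matches "(TW)" and the wikilinked form "([[WP:TW|TW]])".
    "twinkle": re.compile(r"\((?:\[\[WP:TW\|)?TW(?:\]\])?\)"),
    # Loose match on the token "AWB", e.g. "using [[Project:AWB|AWB]]".
    "awb": re.compile(r"\bAWB\b"),
}


def flag_tool(comment):
    """Return the first tool whose assumed marker appears in the comment, else None.

    Comments can be missing (e.g. hidden by revision deletion), so None is allowed.
    """
    if not comment:
        return None
    for tool, pattern in TOOL_PATTERNS.items():
        if pattern.search(comment):
            return tool
    return None


def bot_group_query(api="https://en.wikipedia.org/w/api.php"):
    """Build an API URL listing accounts currently in the 'bot' user group."""
    params = {
        "action": "query",
        "list": "allusers",
        "augroup": "bot",
        "aulimit": "max",
        "format": "json",
    }
    return api + "?" + urlencode(params)


# Hypothetical revisions, for illustration only.
revisions = [
    {"revid": 1, "comment": "Warning: vandalism ([[WP:TW|TW]])"},
    {"revid": 2, "comment": "typo fixes using [[Project:AWB|AWB]]"},
    {"revid": 3, "comment": "expanded history section"},
]
kept = [r for r in revisions if flag_tool(r["comment"]) is None]
```

Filtering at the revision level sidesteps the censored-history problem, at the cost of depending on markers that users or tool versions may omit.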


On Mon, May 19, 2014 at 10:35 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Brian Keegan, 18/05/2014 18:10:

Is there a way to retrieve a canonical list of bots on enwiki or elsewhere?

A Bots.csv list exists. https://meta.wikimedia.org/wiki/Wikistat_csv
In general: please edit https://meta.wikimedia.org/wiki/Research:Identifying_bot_accounts

Nemo


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



--
Brian C. Keegan, Ph.D.
Post-Doctoral Research Fellow, Lazer Lab
College of Social Sciences and Humanities, Northeastern University
Fellow, Institute for Quantitative Social Sciences, Harvard University
Affiliate, Berkman Center for Internet & Society, Harvard Law School

b.keegan@neu.edu
www.brianckeegan.com
M: 617.803.6971
O: 617.373.7200
Skype: bckeegan