See Stuart's comments above, and also the queries he linked to.
It would be nice if we could get these queries in version control and
share them.
Maybe there is potential for building a hand-curated list of bot user_ids
in version control as well.
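As a rough illustration of the curated-list idea, the list could be a plain CSV kept under version control and applied as a filter on revisions. This is a minimal Python sketch, not an existing shared resource: the file layout (a single `user_id` header and one numeric id per row) and the helper names `load_bot_ids` and `drop_bot_revisions` are hypothetical.

```python
import csv
import io
from typing import Iterable


def load_bot_ids(lines: Iterable[str]) -> set[int]:
    """Read curated bot user_ids from CSV lines with a 'user_id' header (hypothetical format)."""
    reader = csv.DictReader(lines)
    return {int(row["user_id"]) for row in reader if row["user_id"].strip()}


def drop_bot_revisions(revisions: list[dict], bot_ids: set[int]) -> list[dict]:
    """Keep only revisions whose user_id is not in the curated bot set."""
    return [rev for rev in revisions if rev["user_id"] not in bot_ids]


if __name__ == "__main__":
    # Stand-in for the contents of a version-controlled file such as bot_user_ids.csv
    curated = io.StringIO("user_id\n101\n202\n")
    bots = load_bot_ids(curated)
    revs = [{"rev_id": 1, "user_id": 101}, {"rev_id": 2, "user_id": 7}]
    print(drop_bot_revisions(revs, bots))  # [{'rev_id': 2, 'user_id': 7}]
```

Keeping the list as plain CSV makes diffs and review of additions easy in version control.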
-Aaron
On Mon, May 19, 2014 at 10:17 AM, Brian Keegan <b.keegan(a)neu.edu> wrote:
Thanks for all the references and excellent advice so far!
I've looked into the Hale Anti-Bot Method™, but because I've sampled my
corpus by article (based on category co-membership), grouping the resulting
revisions by user gives these semi-automated users more "normal" inter-edit
time distributions, since their other contributions are censored. In other
words, I see only a fraction of these users' contributions, so the time
intervals I observe are spaced farther apart (more typical) than they
actually are. It's not feasible for me to retrieve 100k+ users' full
histories just to clean up ~6k articles' histories.
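The censoring effect can be made concrete with a small sketch. Assuming a timing-based heuristic looks at the spacing between a user's consecutive edits (as the description above suggests), compare the median interval computed from a full history with the one computed from only the edits that land in sampled articles. The history and article names below are made up for illustration, and this is not an implementation of the Hale method itself.

```python
from statistics import median


def inter_edit_intervals(timestamps: list[int]) -> list[int]:
    """Seconds between consecutive edits, after sorting the timestamps."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def median_interval(timestamps: list[int]):
    """Median inter-edit interval, or None if there are fewer than two edits."""
    gaps = inter_edit_intervals(timestamps)
    return median(gaps) if gaps else None


# Made-up history for one semi-automated user: an edit every 10 s,
# cycling over 50 articles. Entries are (unix_seconds, article).
history = [(t * 10, f"Article_{t % 50}") for t in range(200)]

# Article-based sample: only two of those 50 articles are in the corpus.
sampled_articles = {"Article_0", "Article_25"}

full = [ts for ts, _ in history]
seen = [ts for ts, article in history if article in sampled_articles]

print(median_interval(full))  # 10  (true editing cadence)
print(median_interval(seen))  # 250 (looks 25x slower in the sampled view)
```

The gap between the two medians is exactly the censoring described above: the user's edits to unsampled articles fill in the "missing" intervals.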
Another thought I had: because many semi-automated tools such as Twinkle
and AWB leave parenthetical annotations in their revision comments, could
these annotations be a relatively inexpensive way to filter out revisions
rather than users? There are some caveats I'd like to get domain experts'
feedback on. I'm not expecting settled research, just input from others'
experiences munging the data.
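A minimal sketch of filtering revisions by comment. The patterns are assumptions about how the tools mark edit summaries (Twinkle summaries commonly end with a link like `([[WP:TW|TW]])`, AWB summaries with `using [[Project:AWB|AWB]]` or a plain `using AWB`). Given the caveats about optional and changing markup, they should be checked against the actual comments in the corpus before being trusted.

```python
import re
from typing import Optional

# Assumed tool markers; verify against real revision comments before relying on them.
TOOL_PATTERNS = {
    "twinkle": re.compile(r"\(\[\[(?:WP|Wikipedia):(?:TW|Twinkle)\|TW\]\]\)"),
    "awb": re.compile(r"using (?:\[\[(?:Project|WP|Wikipedia):AWB\|AWB\]\]|AWB)"),
}


def detect_tool(comment: Optional[str]) -> Optional[str]:
    """Return the name of the first tool whose marker appears in the comment, else None."""
    if not comment:
        return None
    for name, pattern in TOOL_PATTERNS.items():
        if pattern.search(comment):
            return name
    return None


def split_revisions(revisions: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition revisions into (no tool marker found, tool marker found)."""
    manual, tooled = [], []
    for rev in revisions:
        (tooled if detect_tool(rev.get("comment")) else manual).append(rev)
    return manual, tooled


if __name__ == "__main__":
    revs = [
        {"rev_id": 1, "comment": "Reverted edits ([[WP:TW|TW]])"},
        {"rev_id": 2, "comment": "typo fixes using [[Project:AWB|AWB]]"},
        {"rev_id": 3, "comment": "expanded history section"},
        {"rev_id": 4, "comment": None},  # deleted or empty summary
    ]
    manual, tooled = split_revisions(revs)
    print([r["rev_id"] for r in manual], [r["rev_id"] for r in tooled])  # [3, 4] [1, 2]
```

Note that a missing marker only means "no marker found", not "manual": if the annotation is optional, tool-assisted edits without it fall into the first bucket.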
1. Is the inclusion of this markup in revision comments optional? My
concern is that some users may enable or disable it, so I may end up
biasing inclusion based on users' preferences.
2. How have these flags or markup changed over time? My concern is that
Twinkle/AWB/etc. may have started or stopped including flags, or changed
what they included, over time.
3. Are there other API queries or data elsewhere I could use to identify
(semi-)automated revisions?
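For question 3, one partial source is the MediaWiki API's `list=allusers` module, which can filter by user group (`augroup=bot`). It only covers accounts that currently hold the bot flag, not semi-automated human editors or former bots, so it would complement rather than replace comment-based filtering. A sketch of paging through it; the English Wikipedia endpoint is an assumption, and the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint (English Wikipedia)


def bot_group_params(cont=None):
    """Query parameters for one page of accounts in the 'bot' user group."""
    params = {
        "action": "query",
        "list": "allusers",
        "augroup": "bot",
        "aulimit": "max",
        "format": "json",
    }
    if cont:
        params.update(cont)  # merge the API's "continue" block to request the next page
    return params


def parse_bot_page(payload):
    """Extract (userid, name) pairs and the continuation block from one response."""
    users = [(u["userid"], u["name"]) for u in payload.get("query", {}).get("allusers", [])]
    return users, payload.get("continue")


def fetch_bot_accounts(user_agent):
    """Page through list=allusers&augroup=bot until no continuation is returned (needs network)."""
    users, cont = [], None
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(bot_group_params(cont))
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            page_users, cont = parse_bot_page(json.load(resp))
        users.extend(page_users)
        if not cont:
            return users
```

Wikimedia asks API clients to send a descriptive User-Agent, which is why `fetch_bot_accounts` takes one as a parameter.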
On Mon, May 19, 2014 at 10:35 AM, Federico Leva (Nemo) <nemowiki(a)gmail.com>
wrote:
Brian Keegan, 18/05/2014 18:10:
Is there a way to retrieve a canonical list of bots on enwiki or
elsewhere?
A Bots.csv list exists.
https://meta.wikimedia.org/wiki/Wikistat_csv
In general: please edit
https://meta.wikimedia.org/wiki/Research:Identifying_bot_accounts
Nemo
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Brian C. Keegan, Ph.D.
Post-Doctoral Research Fellow, Lazer Lab
College of Social Sciences and Humanities, Northeastern University
Fellow, Institute for Quantitative Social Sciences, Harvard University
Affiliate, Berkman Center for Internet & Society, Harvard Law School
b.keegan(a)neu.edu
www.brianckeegan.com
M: 617.803.6971
O: 617.373.7200
Skype: bckeegan