> the Hale Anti-Bot Method™

That's a good one. =)

> I'm a big fan of Scott's method

I second that. Again, great paper, Scott!

On Mon, May 19, 2014 at 5:31 PM, Aaron Halfaker <aaron.halfaker(a)gmail.com> wrote:
>> Another thought I had was that because many semi-automated tools such as
>> Twinkle and AWB leave parenthetical annotations in their revision comments
>
> See Stuart's comments above, and also the queries he linked to:
> https://wiki.toolserver.org/view/MySQL_queries#Automated_tool_and_bot_edits
>
> It would be nice if we could get these queries in version control and
> share them. Maybe there is potential for building a hand-curated list of
> bot user_ids in version control as well.
>
> -Aaron
> On Mon, May 19, 2014 at 10:17 AM, Brian Keegan <b.keegan(a)neu.edu> wrote:
>> Thanks for all the references and excellent advice so far!
>>
>> I've looked into the Hale Anti-Bot Method™, but because I've sampled my
>> corpus on articles (based on category co-membership), the resulting groupby
>> users gives these semi-automated users more "normal" distributions since
>> their other contributions are censored. In other words, I see only a
>> fraction of these users' contributions and thus the resulting time
>> intervals I observe are spaced farther apart (more typical) than they
>> actually are. It's not feasible for me to get 100k+ users' histories just
>> for the purposes of cleaning up ~6k articles' histories.
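
The censoring effect described above can be illustrated with a small synthetic example. This is only a sketch under made-up assumptions: the user, the 10-second editing pace, the article titles, and the two-article sample are all hypothetical and not drawn from any real data.

```python
from statistics import median


def inter_edit_intervals(timestamps):
    """Return the gaps (in seconds) between consecutive edits."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


# Hypothetical tool-assisted user: one edit every 10 seconds,
# cycling through 50 different articles.
edits = [(t * 10, f"Article_{t % 50}") for t in range(500)]

# Hypothetical article-based sample: only 2 of the 50 articles are in the corpus.
sampled_articles = {"Article_0", "Article_25"}

# Intervals computed from the user's full history.
full = inter_edit_intervals(ts for ts, _ in edits)

# Intervals computed only from edits that fall inside the sampled articles.
observed = inter_edit_intervals(
    ts for ts, title in edits if title in sampled_articles
)

print(median(full))      # 10
print(median(observed))  # 250
```

In this toy case the sampled view makes a 10-second editing rhythm look like one edit every 250 seconds, which is the kind of distortion that would hide a semi-automated user from an interval-based filter.
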
>
>> Another thought I had was that because many semi-automated tools such as
>> Twinkle and AWB leave parenthetical annotations in their revision comments,
>> would this be a relatively inexpensive way to filter out revisions rather
>> than users? There are some caveats I'd like to get domain experts' feedback
>> on. I'm not expecting settled research, just input from others' experiences
>> munging the data.
>>
>> 1. Is the inclusion of this markup in revision comments optional? My
>> concern is that some users may enable or disable it, so I may end up biasing
>> inclusion based on users' preferences.
>> 2. How have these flags or markup changed over time? My concern is
>> that Twinkle/AWB/etc. may have started/stopped including flags or changed
>> what they included over time.
>> 3. Are there other API queries or data elsewhere I could use to identify
>> (semi-)automated revisions?
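
A comment-based revision filter along these lines could be sketched as follows. The patterns are illustrative assumptions, not a vetted list: as question 2 points out, tool tags may have changed over time, so any real pattern set would need checking against the data. The `revisions` records and the `tagged_tool` helper are hypothetical names used only for this example.

```python
import re

# Illustrative patterns only (assumptions): real tool annotations vary
# across tools, versions, and user settings.
TOOL_PATTERNS = {
    "Twinkle": re.compile(r"\(\[\[WP:TW\|TW\]\]\)|\(TW\)"),
    "AWB": re.compile(r"\bAWB\b"),
}


def tagged_tool(comment):
    """Return the name of the first tool whose pattern matches, else None."""
    for tool, pattern in TOOL_PATTERNS.items():
        if pattern.search(comment or ""):
            return tool
    return None


# Hypothetical revision records; a missing comment is treated as untagged.
revisions = [
    {"revid": 1, "comment": "Reverted vandalism ([[WP:TW|TW]])"},
    {"revid": 2, "comment": "typo fixes using [[Project:AWB|AWB]]"},
    {"revid": 3, "comment": "expanded the history section"},
    {"revid": 4, "comment": None},
]

# Keep only revisions whose comment carries no recognised tool annotation.
manual = [r["revid"] for r in revisions if tagged_tool(r["comment"]) is None]
print(manual)  # [3, 4]
```

Filtering on the revision rather than the user avoids needing each user's full history, but it inherits the caveats in the list above: untagged tool edits pass through as manual.
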
>
>
>> On Mon, May 19, 2014 at 10:35 AM, Federico Leva (Nemo) <
>> nemowiki(a)gmail.com> wrote:
>
>>> Brian Keegan, 18/05/2014 18:10:
>>>
>>>> Is there a way to retrieve a canonical list of bots on enwiki or
>>>> elsewhere?
>>>
>>> A Bots.csv list exists: https://meta.wikimedia.org/wiki/Wikistat_csv
>>> In general: please edit
>>> https://meta.wikimedia.org/wiki/Research:Identifying_bot_accounts
>>>
>>> Nemo
>>
>>
>>> _______________________________________________
>>> Wiki-research-l mailing list
>>> Wiki-research-l(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>
>
>
>> --
>> Brian C. Keegan, Ph.D.
>> Post-Doctoral Research Fellow, Lazer Lab
>> College of Social Sciences and Humanities, Northeastern University
>> Fellow, Institute for Quantitative Social Sciences, Harvard University
>> Affiliate, Berkman Center for Internet & Society, Harvard Law School
>>
>> b.keegan(a)neu.edu
>> www.brianckeegan.com
>> M: 617.803.6971
>> O: 617.373.7200
>> Skype: bckeegan
--
-----------------------------------------
Kind regards,
Ann Samoilenko, MSc
Oxford Internet Institute
University of Oxford
Adventures can change your life
e-mail: ann.samoilenko(a)gmail.com
Skype: ann.samoilenko