Re: [Wiki-research-l] Kill the bots

19 May 2014

from ACM Authors Guide (http://www.acm.org/sigs/publications/sigguide-v2.2sp):

"Private communications should be acknowledged, not referenced (e.g.
"[Robertson, personal communication]")."

Although this one was not quite private ;-)

.t

On Mon, May 19, 2014 at 1:59 AM, Brian Keegan <b.keegan(a)neu.edu> wrote:

...
  How does one cite emails in ACM proceedings format? :)

 On Sunday, May 18, 2014, R.Stuart Geiger <sgeiger(a)gmail.com> wrote:

 Tsk tsk tsk, Brian. When the revolution comes, bot discriminators will
 get no mercy. :-)

 But seriously, my tl;dr: instead of asking if an account is or isn't a
 bot, ask if a set of edits is or is not automated.

 Great responses so far: searching usernames for *bot will exclude non-bot
 users who were registered before the username policy change (although *Bot
 is a bit better), and the logging table is a great way to collect bot
 flags. However, Scott is right -- the bot flag (or *Bot username) doesn't
 signify a bot, it signifies a bureaucrat recognizing that a user account
 successfully went through the Bot Approval Group process. If I see an
 account with a bot flag, I can generally assume the edits that account
 makes are initiated by an automated software agent. This is especially the
 case in the main namespace. The inverse assumption is not nearly as easy: I
 can't assume that every edit made from an account *without* a bot flag was
 *not* an automated edit.
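The username heuristic discussed above can be sketched in a few lines. This is only an illustration: the function name and sample usernames are made up, and reading "*bot" as a case-insensitive suffix match and "*Bot" as a case-sensitive one is an assumption about what the thread intends.

```python
import re

# Assumed readings of the two globs from the thread:
#   "*bot" -> name ends in "bot", any capitalization
#   "*Bot" -> name ends in "Bot", capital B only
LOOSE_BOT = re.compile(r"bot$", re.IGNORECASE)
STRICT_BOT = re.compile(r"Bot$")


def looks_like_bot_name(username: str, strict: bool = True) -> bool:
    """Return True if the username matches the chosen suffix pattern.

    This says nothing about whether the account holds a bot flag, or
    whether any particular edit was automated.
    """
    pattern = STRICT_BOT if strict else LOOSE_BOT
    return pattern.search(username) is not None
```

With these patterns, a hypothetical human account named "Talbot" matches the loose pattern but not the strict one, while a name such as "ClueBot NG" matches neither because it does not end in "Bot". A name check like this is at best a first filter, to be compared against the bot flag.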

 About unauthorized bots: yes, there are a relatively small number of
 Wikipedians who, on occasion, run fully-automated, continuously-operating
 bots without approval. Complicating this, if someone is going to take
 the time to build and run a bot, but isn't going to create a separate
 account for it, then it is likely that they are also using that account to
 do non-automated edits. Sometimes new bot developers will run an
 unauthorized bot under their own account during the initial stages of
 development, and only later in the process will they create a separate bot
 account and seek formal approval and flagging. It can get tricky when you
 exclude all the edits from an account for being automated based on a single
 suspicious set of edits.

 More commonly, there are many more people who use automated batch tools
 like AutoWikiBrowser to support one-off tasks, like mass find-and-replace
 or category cleanup. Accounts powered by AWB are technically not bots,
 only because a human has to sit there and click "save" for every batch edit
 that is made. Some people will create a separate bot account for AWB
 work and get it approved and flagged, but many more will not bother. Then
 there are people using semi-automated, human-in-the-loop tools like Huggle
 to do vandal fighting. I find that the really hard question is whether
 you include or exclude these different kinds of 'cyborgs', because it
 really makes you think hard about what exactly you're measuring. Is
 someone who does a mass find-and-replace on all articles in a category a
 co-author of each article they edit? Is a vandal fighter patrolling the
 recent changes feed with Huggle a co-author of all the articles they edit
 when they revert vandalism and then move on to the next diff? What about
 somebody using rollback in the web browser? If so, what is it that makes
 these entities authors and ClueBot NG not an author?

 When you think about it, user accounts are actually pretty remarkable in
 that they allow such a diverse set of uses and agents to be attributed to a
 single entity. So when it comes to identifying automation, I personally
 think it is better to shift the unit of analysis from the user account to
 the individual edit. A bot flag lets you assume all edits from an account
 are automated, but you can use a range of approaches to identifying sets of
 automated edits from non-flagged accounts. For example, I have a set of regex
 SQL queries in the Query Library [1] which parse edit summaries for the traces
 that AWB, Huggle, Twinkle, rollback, etc. automatically leave by default.
 You can also use the edit session approach like Scott has suggested -- Aaron
 and I found a few unauthorized bots in our edit session study [2], and we
 were even using a more aggressive break, with no more than a 60 minute gap
 between edits. To catch short bursts of bulk edits, you could look at large
 numbers of edits made in a short period of time -- I'd say more than 7 main
 namespace edits a minute for 10 minutes would be a hard rate for even a
 very aggressive vandal fighter to maintain with Huggle.
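A minimal sketch of the two edit-level checks described above: tagging an edit by the trace a tool leaves in its summary, and flagging bursts of rapid editing. The regex patterns are illustrative guesses at default summary traces, not the queries from the Query Library, and the function names are hypothetical. The burst check also assumes that "more than 7 edits a minute for 10 minutes" means more than 70 edits inside any 10-minute window.

```python
import re
from datetime import datetime, timedelta

# Assumed summary traces; check them against real edit summaries before use.
# Order matters: tool-specific tags are tested before the generic rollback text.
TOOL_PATTERNS = {
    "AWB": re.compile(r"\bAWB\b|AutoWikiBrowser"),
    "Huggle": re.compile(r"\bHG\b|Huggle"),
    "Twinkle": re.compile(r"\bTW\b|Twinkle"),
    "rollback": re.compile(r"^Reverted edits by .+ to last (?:version|revision) by "),
}


def tool_from_summary(summary):
    """Return the first tool whose pattern matches the summary, else None."""
    for tool, pattern in TOOL_PATTERNS.items():
        if pattern.search(summary):
            return tool
    return None


def has_burst(timestamps, per_minute=7, minutes=10):
    """True if any window of `minutes` holds more than per_minute * minutes edits."""
    times = sorted(timestamps)
    window = timedelta(minutes=minutes)
    threshold = per_minute * minutes
    start = 0
    for end, current in enumerate(times):
        # Slide the left edge so the window covers less than `minutes`.
        while current - times[start] >= window:
            start += 1
        if end - start + 1 > threshold:
            return True
    return False
```

Both functions work on individual edits or sets of edits rather than on accounts, so the same account can contribute both automated and manual edits to a dataset. The timestamps passed to `has_burst` are assumed to be limited to main namespace edits already.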

 I'll conclude by saying that different kinds of automated editing
 techniques are different ways of participating in and contributing to
 Wikipedia. To systematically exclude automated edits is to remove a very
 important, meaningful, and heterogeneous kind of activity from view. These
 activities constitute a core part of what Wikipedia is, particularly
 those forms of automation which the community has explicitly authorized and
 recognized. Now, we researchers inevitably have to selectively reveal
 and occlude -- a co-authorship network based on main namespace edits also
 excludes talk page discussions and conflict resolution, and this also
 constitutes a core part of what Wikipedia is. It isn't wrong per se to
 exclude automated edits, and it is certainly much worse to not recognize
 that they exist at all. However, I always appreciate seeing how the
 analysis would be different if bots were not excluded. The fact that
 there are these weird users which absolutely dominate a co-authorship
 network graph if you don't filter them out is pretty amazing, at least to
 me.

 Best,
 Stuart

 [1]
 https://wiki.toolserver.org/view/MySQL_queries#Automated_tool_and_bot_edits
 [2] http://stuartgeiger.com/cscw13-labor-hours.pdf

 On Sun, May 18, 2014 at 10:08 AM, Scott Hale <computermacgyver(a)gmail.com> wrote:

 Very helpful, Lukas, I didn't know about the logging table.

 In some recent work [1] I found many users that appeared to be bots but
 whose edits did not have the bot flag set. My approach was to exclude users
 who didn't have a break of more than 6 hours between edits over the entire
 month I was studying. I was interested in the users who had multiple edit
 sessions in the month and so went with a straight threshold. A way to keep
 users with only one editing session would be to exclude users who have no
 break longer than X hours in an edit session lasting at least Y hours
 (e.g., a user who doesn't break for more than 6 hours in 5-6 days is
 probably not human).
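The "no break longer than X hours in a session lasting at least Y hours" rule can be sketched as follows. This is one reading of the rule, not the code used in the study. The function name is hypothetical, and the defaults (X = 6 hours, Y = 120 hours, i.e. 5 days) are taken from the example in the message.

```python
from datetime import datetime, timedelta


def never_rests(timestamps, max_gap_hours=6, min_span_hours=120):
    """True if some run of edits lasts at least min_span_hours
    without any gap between consecutive edits exceeding max_gap_hours."""
    times = sorted(timestamps)
    if not times:
        return False
    max_gap = timedelta(hours=max_gap_hours)
    min_span = timedelta(hours=min_span_hours)
    run_start = times[0]
    for previous, current in zip(times, times[1:]):
        if current - previous > max_gap:
            # A long enough break ends the run; start a new one here.
            run_start = current
        elif current - run_start >= min_span:
            return True
    return False
```

An account editing every 5 hours for 6 days is flagged, while one that edits during the day and pauses 12 hours overnight is not. Users with a single short session are never flagged, which is what the Y threshold is for.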

 Cheers,
 Scott

 [1] Multilinguals and Wikipedia Editing
 http://www.scotthale.net/pubs/?websci2014

 --
 Scott Hale
 Oxford Internet Institute
 University of Oxford
 http://www.scotthale.net/
 scott.hale(a)oii.ox.ac.uk

 On Sun, May 18, 2014 at 5:45 PM, Lukas Benedix <lbenedix(a)l3q.de> wrote:

 Here is a list of currently flagged bots:

https://en.wikipedia.org/w/index.php?title=Special:ListUsers&offset=&am…

 Another good place to look for bots is here:

https://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix…

 You should also have a look at these pages to find former bots:
 https://en.wikipedia.org/wiki/Wikipedia:Bots/Status/inactive_bots_1
 https://en.wikipedia.org/wiki/Wikipedia:Bots/Status/inactive_bots_2

 And last but not least, the logging table, which you can access via Tool Labs:
 SELECT DISTINCT(log_title)
 FROM logging
 WHERE log_action = 'rights'
 AND log_params LIKE '%bot%';

 Lukas

 On Sun, 18 May 2014 at 18:34, Andrew G. West wrote:
  User name policy states that "*bot*" names are reserved for bots.
 Thus, such a regex shouldn't be too hacky, but I cannot comment
 whether some non-automated cases might slip through new user patrol. I
 do think dumps make the 'users' table available, and I know for sure
 one could get a full list via the API.

 As a check on this, you could verify whether these usernames set the
 "bot" flag when they edit. -AW


 --
 Brian C. Keegan, Ph.D.
 Post-Doctoral Research Fellow, Lazer Lab
 College of Social Sciences and Humanities, Northeastern University
 Fellow, Institute for Quantitative Social Sciences, Harvard University
 Affiliate, Berkman Center for Internet & Society, Harvard Law School

 b.keegan(a)neu.edu
 www.brianckeegan.com
 M: 617.803.6971
 O: 617.373.7200
 Skype: bckeegan

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

-- 
.t
