Re: [Wikitech-l] User agent policy for bots

14 Jul 2014


Note: this reply represents my own views and does not represent an official
WMF position.
On Sun, Jul 13, 2014 at 4:25 PM, John Mark Vandenberg <jayvdb@gmail.com>
wrote:
...
It would be good to know the answer to whether the username is logged
against API requests.  It seems like a very important piece of
information which should be visible in server ops logging of API
usage.
The API request log does record usernames. And doesn't contain user agents,
for that matter.
But my guess is that at least some of the types of problems Ops would be
concerned with are in different log files that probably do not contain
usernames but do contain user agents.
...
username is easy, if it is needed.
I would include username. The only harm is a few extra bytes per request.
...
pywiki requiring bot operators to provide an email address is technically
easy, but I suspect it isn't going to be very successful or
appreciated, esp. for non-SSL wikis, or understood, as pywiki hasn't put
this info in the user-agent since the new user-agent policy was
introduced, so why now?
I don't see any particular need for email addresses if the on-wiki username
is provided. The key is "some method of contact".
...
If the main source of problems is the 'large' bots, they usually run
many tasks, and it is likely to only be a single task causing
problems.  With these large tasks, ideally they are paused rather than
blocked, in which case we need to introduce a standardised way to
pause a bot.  In these cases, the user agent could mention the task
identifier, and that identifier could be used to pause it until an
operator has checked their email.  The 'pause' command interface could
be IRC or user_talk, or something new based on Flow, or an API response
warning like replag which pywikibot honours.  I appreciate Bináris'
point that some (most?) wikis, especially smaller wikis, do not have
'task approval' processes with a task identifier, so this would need
to be optional.  Large bot operators would use this feature if it
meant that only a single task is paused rather than the bot account
blocked.
For the normal usage of pywikibot, i.e. invoking an existing script
which is maintained by pywikibot, we could include in the user-agent
which script is running (e.g. move.py).
Including the "task name", which for pywikibot could be the script name,
seems sensible to me. Besides the stated benefit of distinguishing which
task in a multi-task bot is problematic, it would also help in determining
that multiple accounts/IPs are running the same problematic script.
I wouldn't go as far as requiring the task name to correspond to any
particular on-wiki approval, although bots on wikis with such approval
processes could well use the title of the approval page as their task name.
What user agents do the other large editing frameworks use?
...
I can tell you AnomieBOT uses "AnomieBOT 1.0 ($TASKNAME; see
[[User:$USERNAME]])". Not sure if you consider it a large editing framework.
The task names the bot uses are generally listed on the bot's userpage;
various one-off scripts I use locally will use some ad-hoc identifier, or
"no task" if I forgot to have the script set a task name.
(I should change that to start with AnomieBOT/1.0 to comply with RFC 2616,
now that I think of it)
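
For illustration only, here is a minimal Python sketch of how a bot might assemble a user agent of that shape, with the product token written as AnomieBOT/1.0. The function name build_user_agent and its parameters are hypothetical; this is not AnomieBOT's or pywikibot's actual code, and the validation simply follows the RFC 2616 rules for tokens and comments.

```python
import re

# RFC 2616 "token": any visible ASCII character except the separators
# ( ) < > @ , ; : \ " / [ ] ? = { } plus space and tab.
_TOKEN = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$")


def build_user_agent(product, version, task=None, username=None):
    """Return 'Product/Version (task; see [[User:Name]])'.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if not _TOKEN.match(product) or not _TOKEN.match(version):
        raise ValueError("product and version must be RFC 2616 tokens")

    # Fall back to "no task" when the script did not set a task name.
    details = [task or "no task"]
    if username:
        # The on-wiki username serves as the "method of contact".
        details.append("see [[User:{}]]".format(username))
    comment = "; ".join(details)

    # Parentheses inside a comment would need escaping, so reject them
    # here rather than emit a malformed header.
    if "(" in comment or ")" in comment:
        raise ValueError("task and username must not contain parentheses")

    return "{}/{} ({})".format(product, version, comment)


if __name__ == "__main__":
    # "move.py" stands in for a script name used as the task name.
    print(build_user_agent("AnomieBOT", "1.0", "move.py", "AnomieBOT"))
```

A pywikibot-style framework could pass the running script's name as the task argument, so that one problematic task can be identified without an on-wiki approval process.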
-- 
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
