Note this reply represents my own views, but does not represent an official WMF position.
On Sun, Jul 13, 2014 at 4:25 PM, John Mark Vandenberg jayvdb@gmail.com wrote:
It would be good to know the answer to whether the username is logged against API requests. It seems like a very important piece of information which should be visible in server ops logging of API usage.
The API request log does record usernames. And doesn't contain user agents, for that matter.
But my guess is that at least some of the types of problems Ops would be concerned with are in different log files that probably do not contain usernames but do contain user agents.
username is easy, if it is needed.
I would include username. The only harm is a few extra bytes per request.
pywiki requiring bot operators to provide an email address is technically easy, but I suspect it isn't going to be very successful or appreciated, especially for non-SSL wikis. Nor is it likely to be understood: pywiki hasn't put this info in the user-agent since the new user-agent policy was introduced, so why now?
I don't see any particular need for email addresses if the on-wiki username is provided. The key is "some method of contact".
If the main source of problems is the 'large' bots, they usually run many tasks, and it is likely to only be a single task causing problems. With these large tasks, ideally they are paused rather than blocked, in which case we need to introduce a standardised way to pause a bot. In these cases, the user agent could mention the task identifier, and that identifier could be used to pause it until an operator has checked their email. The 'pause' command interface could be IRC or user_talk, or something new based on Flow, or an API response warning like replag which pywikibot honours. I appreciate Bináris' point that some (most?) wikis, especially smaller wikis, do not have 'task approval' processes with a task identifier, so this would need to be optional. Large bot operators would use this feature if it meant that only a single task is paused rather than the bot account blocked.
For the normal usage of pywikibot, which is invoking an existing script maintained by pywikibot, we could include in the user-agent which script is running (e.g. move.py).
Including the "task name", which for pywikibot could be the script name, seems sensible to me. Besides the stated benefit of distinguishing which script in a multi-task bot is problematic, it would also help in determining whether multiple accounts/IPs are running the same problematic script.
I wouldn't go as far as requiring the task name to correspond to any particular on-wiki approval, although bots on wikis with such approval processes could well use the title of the approval page as their task name.
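As a rough sketch of that second benefit: if user agents carried a task name, spotting several accounts running the same script would be a simple grouping exercise over the log. The user-agent layout assumed here, "Product/Version (task; see [[User:name]])", and the function name are purely illustrative; nothing in the existing logs or in pywikibot is guaranteed to look like this.

```python
import re
from collections import defaultdict

# Hypothetical layout: "Product/Version (task; see [[User:name]])".
# This pattern is an assumption for illustration, not an agreed format.
UA_RE = re.compile(
    r"^(?P<product>[^/\s]+)/(?P<version>\S+) "
    r"\((?P<task>[^;]*); see \[\[User:(?P<user>[^\]]+)\]\]\)$"
)


def group_users_by_task(user_agents):
    """Map each task name to the set of usernames seen running it."""
    by_task = defaultdict(set)
    for ua in user_agents:
        match = UA_RE.match(ua)
        if match is None:
            # Not in the assumed layout (e.g. a browser); skip it.
            continue
        by_task[match.group("task")].add(match.group("user"))
    return dict(by_task)


if __name__ == "__main__":
    sample = [
        "pywikibot/2.0 (move.py; see [[User:ExampleBotA]])",
        "pywikibot/2.0 (move.py; see [[User:ExampleBotB]])",
        "pywikibot/2.0 (replace.py; see [[User:ExampleBotA]])",
    ]
    for task, users in sorted(group_users_by_task(sample).items()):
        print(task, sorted(users))
```

A task that shows up under many usernames at the moment a problem starts would be a reasonable first suspect.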
What user agents do the other large editing frameworks use?
I can tell you AnomieBOT uses "AnomieBOT 1.0 ($TASKNAME; see [[User:$USERNAME]])". Not sure if you consider it a large editing framework.
The task names the bot uses are generally listed on the bot's userpage; various one-off scripts I use locally will use some ad-hoc identifier, or "no task" if I forgot to have the script set a task name.
(I should change that to start with AnomieBOT/1.0 to comply with RFC 2616, now that I think of it)
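For what it's worth, here is a minimal sketch of assembling a string in that RFC 2616 style: a product token of the form name/version, followed by a parenthesised comment carrying the task name and the on-wiki contact. The helper names are made up for the example, and this is not how AnomieBOT or pywikibot actually build their user agents.

```python
import re

# RFC 2616 "token": one or more characters that are not controls or separators.
_TOKEN_RE = re.compile(r"^[!#$%&'*+.^_`|~0-9A-Za-z-]+$")


def _escape_comment(text):
    """Backslash-escape characters that may not appear bare in a comment."""
    return re.sub(r"([\\()])", r"\\\1", text)


def build_user_agent(product, version, task, username):
    """Return 'product/version (task; see [[User:username]])'.

    Hypothetical helper: the layout mirrors the AnomieBOT string quoted
    above, with the product written as a name/version token.
    """
    for part in (product, version):
        if not _TOKEN_RE.match(part):
            raise ValueError("not a valid product token part: %r" % (part,))
    # Fall back to "no task" when the script forgot to set a task name.
    comment = "%s; see [[User:%s]]" % (task or "no task", username)
    return "%s/%s (%s)" % (product, version, _escape_comment(comment))


if __name__ == "__main__":
    print(build_user_agent("AnomieBOT", "1.0", "ExampleTask", "AnomieBOT"))
```

Validating the product and version up front means a stray space or slash fails loudly rather than producing a header that log parsers split in the wrong place.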