On Thu, Jan 28, 2016 at 11:15 AM, Marcel Ruiz Forns
<mforns(a)wikimedia.org> wrote:
Hi analytics list,
In the past months, the WikimediaBot convention has been mentioned in a
couple of threads, but we (the Analytics team) never finished establishing
and advertising it. In this email we explain what the convention is today
and what purpose it serves, and we also ask for feedback to be sure we can
continue with the next steps.
What is the WikimediaBot convention?
It is a way of better identifying Wikimedia traffic originating from bots.
Today we know that a significant share of Wikimedia traffic comes from bots.
We can recognize part of that traffic with regular expressions[1], but we
cannot recognize all of it, because some bots do not identify themselves as
such. If we could identify a greater part of the bot traffic, we could also
better isolate the human traffic and permit more accurate analyses.
Who should follow the convention?
Computer programs that access Wikimedia sites or the Wikimedia API for
reading purposes* in a periodic, scheduled or automatically triggered way.
Who should NOT follow the convention?
Computer programs that follow the on-site, ad-hoc commands of a human, like
browsers, and well-known spiders that are otherwise recognizable by their
well-known user-agent strings.
How to follow the convention?
The client's user-agent string should contain the word "WikimediaBot". The
word can be anywhere within the user-agent string and is case-sensitive.
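For illustration, a minimal sketch of what following the convention might
look like from a Python client using only the standard library (the client
name, version, and contact URL below are hypothetical examples, not part of
the convention):

```python
import urllib.request

# A descriptive user-agent that includes the case-sensitive token
# "WikimediaBot". The client name, version, and contact URL are
# hypothetical; the convention only requires the token itself.
USER_AGENT = "ExampleWikiTool/1.0 (https://example.org/wikitool) WikimediaBot"

# The token may appear anywhere in the string, so appending it to an
# existing client identifier works. Attach it to an API request:
req = urllib.request.Request(
    "https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&format=json",
    headers={"User-Agent": USER_AGENT},
)
```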
This is useless unless someone is going to start blocking bots that
don't follow it.
There is an existing policy, which is not being followed / enforced.
https://meta.wikimedia.org/wiki/User-Agent_policy
It is also extremely annoying that clients (e.g. Pywikibot) now need
to add a Wikimedia-specific tag to their user-agent. A user-agent
should be client-specific, not server-specific. Why not just "Bot",
or "MediaWikiBot", which at least encompasses all sites that the client
can communicate with?
--
John Vandenberg