A user-agent should be client specific, not server
specific.
This makes a lot of sense. If I build a bot that crawls Wikipedia and
Facebook public pages, it really doesn't make sense for my bot to have a
"WikimediaBot" user agent; just the word "Bot" should probably be
enough.
On Wed, Jan 27, 2016 at 8:47 PM, John Mark Vandenberg <jayvdb(a)gmail.com>
wrote:
On Thu, Jan 28, 2016 at 11:15 AM, Marcel Ruiz Forns
<mforns(a)wikimedia.org> wrote:
Hi analytics list,
In the past months the WikimediaBot convention has been mentioned in a
couple threads, but we (Analytics team) never finished establishing and
advertising it. In this email we explain what the convention is today and
what purpose it serves. We also ask for feedback, to be sure we can continue
with the next steps.
What is the WikimediaBot convention?
It is a way of better identifying Wikimedia traffic that originates from bots.
Today we know that a significant share of Wikimedia traffic comes from bots.
We can recognize part of that traffic with regular expressions[1], but we
cannot recognize all of it, because some bots do not identify themselves as
such. If we could identify a greater part of the bot traffic, we could also
better isolate the human traffic and permit more accurate analyses.
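To illustrate the limitation described above, here is a minimal sketch of regex-based bot recognition. The pattern and the function names are hypothetical and are not the regular expressions referenced in [1]; they only show why a bot that does not announce itself ends up counted with human traffic.

```python
import re

# Hypothetical pattern for illustration only; NOT the actual expression from [1].
# It matches user agents that announce themselves with a common bot-like word.
SELF_IDENTIFIED_BOT = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user-agent string self-identifies as a bot."""
    return SELF_IDENTIFIED_BOT.search(user_agent) is not None


def split_traffic(user_agents):
    """Partition user-agent strings into (bots, presumed_humans)."""
    bots, presumed_humans = [], []
    for ua in user_agents:
        (bots if looks_like_bot(ua) else presumed_humans).append(ua)
    return bots, presumed_humans
```

With this sketch, a script that sends a generic library user agent such as "python-requests/2.9.1" lands in the presumed-human bucket, which is the gap the convention is meant to close.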
Who should follow the convention?
Computer programs that access Wikimedia sites or the Wikimedia API for
reading purposes* in a periodic, scheduled or automatically triggered
way.
Who should NOT follow the convention?
Computer programs that follow the on-site ad-hoc commands of a human, such as
browsers, and well-known spiders that are otherwise recognizable by their
well-known user-agent strings.
How to follow the convention?
The client's user-agent string should contain the word "WikimediaBot". The
word can be anywhere within the user-agent string and is case-sensitive.
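As a rough sketch of what following the convention could look like on the client side, the snippet below checks a user-agent string for the case-sensitive word and attaches it to a request object. The tool name, contact details, and helper names are made up for illustration; only the word "WikimediaBot" and its case-sensitivity come from the convention as described here.

```python
import urllib.request

CONVENTION_WORD = "WikimediaBot"  # must match exactly; the check is case-sensitive

# Hypothetical user agent for an imaginary tool; the convention word may appear
# anywhere in the string.
EXAMPLE_USER_AGENT = (
    "ExampleStatsTool/0.1 (https://example.org/tool; tool@example.org) WikimediaBot"
)


def follows_convention(user_agent: str) -> bool:
    """Return True if the word appears anywhere in the string, case-sensitively."""
    return CONVENTION_WORD in user_agent


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries a conforming user agent."""
    if not follows_convention(user_agent):
        raise ValueError("user agent must contain the word 'WikimediaBot'")
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Note that a string containing "wikimediabot" in lower case would not pass this check, since the convention treats the word as case-sensitive.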
This is useless unless someone is going to start blocking bots that
don't follow it.
There is an existing policy, which is not being followed / enforced.
https://meta.wikimedia.org/wiki/User-Agent_policy
It is also extremely annoying that clients (e.g. Pywikibot) now need
to add a Wikimedia-specific tag to their user-agent. A user-agent
should be client specific, not server specific. Why not just "Bot",
or "MediaWikiBot", which at least encompasses all sites that the client
can communicate with?
--
John Vandenberg
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics