On Tue, Mar 22, 2016 at 12:44 AM, Marcel Ruiz Forns mforns@wikimedia.org wrote:
... I think adding the word bot to the user-agent of bot-like programs is a widely adopted convention. Actually, the word bot has already (for a long time now) been parsed and used to tag requests as bot-originated in our jobs that process requests into pageview stats, because many external bots include it in their user-agent. See: http://www.useragentstring.com/pages/Crawlerlist/
The algorithm has been imperfect for a long time. How long and how imperfect doesn't matter. Analytics is all about making good use of imperfect algorithms to provide reasonable approximations.
However, I expect the role of Analytics is to improve the definitions and implementations over time, not to force a bad algorithm into policy.
Pywiki*bot* has the string 'bot' in its user-agent because it is part of the product name. However, not every use of Pywikibot is a crawler or even a bot, by any sensible definition of those concepts. Pywikibot is a *user agent* that knows how to be a client of the *MediaWiki API*. It can be used for "in-situ human consumption" or not.
It is no different from a web browser in how it *may* be used, although of course typically the primary goal of using Pywikibot instead of a Web browser is to reduce the amount of human consumption and decision making needed to perform a task. But that is no different from Gadgets written using the JavaScript libraries that run in the Web browser.
It can function *exactly* like a web browser: reading a Special:Search results page, viewing some of the pages in the search results, and making edits to some of them. Each page may be viewed by a real human, who is making decisions throughout the entire process about which pages to view and which pages to edit.
Or it can function *exactly* like a crawler, spider, bot, etc., with zero human consumption.
Almost every script that is packaged with Pywikibot has both automatic and non-automatic modes of operation. Should we change our user-agent to "Pywikihuman" when in non-automatic mode of operation, so that it isn't considered to be a bot by Analytics?
Using the string 'bot' in the user-agent may have been a useful approximation for Analytics circa 2010, but it is bad policy, and Analytics can and should do much better than that in 2016, now that API usage is in focus.
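To make the objection concrete, here is a minimal sketch of substring-based classification in the spirit of what Marcel describes (the production rule may well differ); the user-agent strings below are hypothetical examples, not the exact strings Pywikibot sends:

```python
import re

# Naive rule: any user-agent containing 'bot' is tagged as bot traffic.
# This is an illustrative sketch, not the actual Analytics implementation.
BOT_RE = re.compile(r'bot', re.IGNORECASE)

def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user-agent matches the naive 'bot' substring rule."""
    return bool(BOT_RE.search(user_agent))

# Pywikibot matches because 'bot' is part of the product name,
# even when a human is driving the tool interactively:
print(looks_like_bot("Pywikibot/3.0 (User:Example)"))     # True
print(looks_like_bot("Mozilla/5.0 (X11; Linux x86_64)"))  # False
```

The point is that the rule keys on the product name, not on the mode of operation, so interactive Pywikibot sessions are misclassified.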
There is very little information at
https://meta.wikimedia.org/wiki/Research:Page_view or elsewhere (that I can see) regarding what use of the API is considered to be a **page** view. For example, is it a page view when I ask the API for metadata only of the last revision of a page -- i.e. the page/revision text is not included in the response?
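For clarity, the kind of request I mean can be built like this; it asks `action=query` with `prop=revisions` for revision metadata only, with no `content` in `rvprop` (the page title here is a placeholder):

```python
from urllib.parse import urlencode

# Metadata-only revision query: rvprop requests ids and timestamp,
# deliberately omitting 'content', so no page text is returned.
params = {
    "action": "query",
    "prop": "revisions",
    "rvprop": "ids|timestamp",
    "titles": "Example",  # placeholder title
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
print(url)
```

Should a request like this count as a page view, or not? The current documentation does not say.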
You're right, and this is a very good question. I fear the only ways to look into this are browsing the actual code in: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-...
I am not very interested in the code, which is at best an attempt at implementing the API page view definition. I'd like to understand the high level goal.
However, having read that file and the accompanying test suite, it is my understanding that there is no definition of an API page view. i.e. all requests to api.php, except api.php usage by the Wikipedia App (i.e. with user-agent "WikipediaApp", used by the iOS and Android Apps), are classified as *not a page view*.
FWIW, rather than reading the source, this test data file with expected results is a simpler way to see the current status:
https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-...
or asking the Research team, who owns the definition.
Could the Research team please publish their definition of API (api.php) page views, like they do for Web (index.php) page views?
Without this, it is hard to have a serious conversation about how changing the user-agent policy might be helpful to achieve the goal of better classifying API page views.
-- John Vandenberg