On Mon, Dec 21, 2015 at 5:15 PM, John Mark Vandenberg jayvdb@gmail.com wrote:
On Tue, Dec 15, 2015 at 10:51 AM, Madhumitha Viswanathan mviswanathan@wikimedia.org wrote:
+1 Oliver - User agents tagged with WikimediaBot are classified as bots. I do agree that our documentation on this can be improved; I'll update the Webrequest and Pageview table docs to reflect this.
Where was this announced? I don't believe pywikibot does this, or was notified that it should do this...?
Apologies, it wasn't. Here is a task for it -
https://phabricator.wikimedia.org/T108599, and it's in our pipeline to get done.
Are accounts with the bot flag also tagged as bot?
I believe bot flags associated with accounts are not part of the webrequest data, so we don't look at them. Currently, we use UA-parser plus a custom regex (https://github.com/wikimedia/analytics-refinery-source/blob/c7f1973053122476b6297d373d49105ec08285e9/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L56) to identify and mark spiders. So if you have not adopted the WikimediaBot convention, your bot is currently tagged as a spider if its UA matches this regex. Only bots that explicitly include WikimediaBot in their user agent will register as a bot.
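To make the order of checks concrete, here is a rough Python sketch of the logic described above. It is not the refinery code: the pattern below is a simplified stand-in I made up for illustration (the real regex is in the Webrequest.java file linked above), the UA-parser step is left out, the function name is hypothetical, and I am assuming the WikimediaBot match is a plain case-sensitive substring test.

```python
import re

# Hypothetical, simplified stand-in for the custom spider regex.
# The real pattern lives in Webrequest.java and is different.
SPIDER_PATTERN = re.compile(r"bot|spider|crawler|https?://", re.IGNORECASE)

# Tag a bot includes in its user agent to opt in to the convention.
WIKIMEDIA_BOT_TAG = "WikimediaBot"


def classify_agent(user_agent: str) -> str:
    """Return 'bot', 'spider', or 'user' for a raw user-agent string."""
    # Check the explicit tag first: "WikimediaBot" itself contains "bot",
    # so it would otherwise fall through to the spider pattern.
    if WIKIMEDIA_BOT_TAG in user_agent:
        return "bot"
    # Untagged agents that look automated are marked as spiders.
    if SPIDER_PATTERN.search(user_agent):
        return "spider"
    # Everything else is treated as ordinary user traffic.
    return "user"
```

With this sketch, an untagged UA such as "pywikibot/2.0 (User:Example)" would come out as a spider, and the same UA with WikimediaBot appended would come out as a bot.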
--
John Vandenberg
I have also added notes to https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly and https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest describing this 'bot' agent type.
--Madhu :)