Thanks for the information, Oliver.
Hi John -- I just wanted to point out in a friendly way that your original email would have been just as effective if you had omitted the last line about a waste of effort to build. We always like to get feedback and questions from the community but the analytics team works hard to make good decisions and use donor money wisely. I'd love to see more constructive language on these lists.
Warmly,
-Toby
On Tue, Dec 22, 2015 at 12:30 AM, Oliver Keyes okeyes@wikimedia.org wrote:
On 21 December 2015 at 21:00, John Mark Vandenberg jayvdb@gmail.com wrote:
On Tue, Dec 22, 2015 at 12:23 PM, Madhumitha Viswanathan mviswanathan@wikimedia.org wrote:
On Mon, Dec 21, 2015 at 5:15 PM, John Mark Vandenberg jayvdb@gmail.com wrote:
On Tue, Dec 15, 2015 at 10:51 AM, Madhumitha Viswanathan mviswanathan@wikimedia.org wrote:
+1 Oliver - User agents tagged with WikimediaBot are tagged as bot - I do agree that our documentation on this can be improved, I'll update the Webrequest and Pageview tables docs to reflect this.
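To illustrate the tagging rule described above, here is a minimal sketch. It assumes the rule is a plain substring match on the User-Agent header; the function name `is_tagged_bot` and the sample user agents are hypothetical, and the actual analytics pipeline may apply a different or broader rule.

```python
from typing import Optional


def is_tagged_bot(user_agent: Optional[str]) -> bool:
    """Return True if the user agent carries the WikimediaBot marker.

    Assumption: a case-sensitive substring check. This is an illustration
    only, not the rule the production pipeline necessarily uses.
    """
    return "WikimediaBot" in (user_agent or "")


# Hypothetical user agents, for illustration only.
print(is_tagged_bot("ExampleTool/1.0 (https://example.org/tool) WikimediaBot"))
print(is_tagged_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/43.0"))
```

Because the check relies only on a header that the client sets itself, the result reflects what the client declares and is not a verified identity.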
Where was this announced? I don't believe pywikibot does this, or was notified that it should do this...?
Apologies, it wasn't. Here is a task for it - https://phabricator.wikimedia.org/T108599, and it's in our pipeline to get done.
Are accounts with the bot flag also tagged as bot?
I believe bot flags associated with accounts are not part of the webrequest data, so we don't look at it.
There is a bot request parameter associated with many write actions, and there is assert=bot available for all API requests since 1.23 (and earlier with Extension:AssertEdit). See https://www.mediawiki.org/wiki/API:Assert .
Why can't those be used? They are validated data.
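For reference, a request carrying the assert parameter might be built as in the sketch below. The endpoint, the choice of `action=query` with `meta=userinfo`, and the helper name `build_asserted_query` are assumptions made for illustration; the sketch only constructs the URL and sends nothing. As the API:Assert page describes, the server checks the assertion against the logged-in session, so an unauthenticated request with assert=bot would be rejected.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration; any MediaWiki api.php URL would do.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def build_asserted_query(extra_params=None):
    """Build an API URL that asks the server to verify the caller is a bot.

    Hypothetical helper: it only assembles the query string.
    """
    params = {
        "action": "query",
        "meta": "userinfo",
        "format": "json",
        "assert": "bot",  # server-side check against the logged-in account
    }
    params.update(extra_params or {})
    return API_ENDPOINT + "?" + urlencode(params)


print(build_asserted_query())
```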
Because many "bot" requests go nowhere near the API, because almost no "pageviews" go near the API, because Assert is designed exclusively for logged-in API requests, which most API requests are not, because Assert is designed (primarily) for edits, which no pageviews are.
user-agent with 'WikimediaBot' is not validated data; anyone can change the user-agent and it magically becomes a bot? That sounds like a way to ensure this data is not reliable and a waste of effort to build.
Anyone can change the user-agent and it magically becomes considered automated software, yes. This is absolutely no different from the situation at the moment, where anyone can change their user agent to, say, the GoogleBot user agent and also become considered automated software. The vast, vast, vast majority of actual human users never do this, and those that do tend not to be interested in distorting our automata statistics but in not providing a consistent user agent for privacy purposes, in which case they use browser extensions to roll between an array of actual human UAs. There's no real incentive to roll between automata UAs because some sites restrict the features you can or can't access (for example: not providing JavaScript) if they think you're a crawler. As of yesterday, I have been handling the raw webrequest logs for 2 solid years and in that time the number of obviously-human "automata" I've seen has been minuscule.
-- John Vandenberg
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Count Logula Wikimedia Foundation