Just a quick note to clarify that this change only filters out bots whose requests carry a user agent string that identifies them as such. Our work on identifying bots that do not self-identify is tracked in Phabricator task T138207 https://phabricator.wikimedia.org/T138207.
Thanks!
On Wed, May 24, 2017 at 6:35 PM, Jon Katz jkatz@wikimedia.org wrote:
Nice change! Thanks.
On Wed, May 24, 2017 at 8:05 AM, Tilman Bayer tbayer@wikimedia.org wrote:
Thanks Francisco! To express it from the perspective of users of this data: The results of your EventLogging queries may change slightly, but for the better, improving accuracy. (In the past, e.g., GoogleBot has shown up in schemas for mobile web and the Android Wikipedia app.)
On Wed, May 24, 2017 at 4:54 AM, Francisco Dans fdans@wikimedia.org wrote:
Hi all,
Today we'll be deploying a change that affects how events triggered by bots/spiders are stored. We have added a property called *is_bot* to the user agent map in the event capsule, which we use to prevent these events from being persisted in MySQL, so that they are stored only in Hadoop.
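As a rough illustration of the routing described above, here is a minimal Python sketch. It is not the deployed EventLogging code: the regular expression, the function names, the "userAgent" and "raw" keys, and the store names are all hypothetical and only show the idea of tagging an event with is_bot and then skipping MySQL for flagged events.

```python
import re

# Hypothetical pattern; the actual bot-matching rules are not given in this thread.
BOT_PATTERN = re.compile(r"bot|spider|crawler", re.IGNORECASE)


def tag_user_agent(raw_user_agent):
    """Build a user agent map that carries an is_bot flag (assumed layout)."""
    return {
        "raw": raw_user_agent,
        "is_bot": bool(BOT_PATTERN.search(raw_user_agent)),
    }


def destinations(event):
    """Return the stores an event is written to.

    Events flagged as bots go to Hadoop only; all others go to both stores.
    """
    if event["userAgent"]["is_bot"]:
        return ["hadoop"]
    return ["mysql", "hadoop"]


bot_event = {
    "schema": "ExampleSchema",  # placeholder schema name
    "userAgent": tag_user_agent(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ),
}
human_event = {
    "schema": "ExampleSchema",
    "userAgent": tag_user_agent(
        "Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0"
    ),
}

print(destinations(bot_event))
print(destinations(human_event))
```

Note that a check like this only catches bots that announce themselves in the user agent string; anything else would still be treated as a regular event.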
For more information on this change refer to phab task T67508 https://phabricator.wikimedia.org/T67508.
Thank you!
-- *Francisco Dans* Software Engineer, Analytics Team Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Tilman Bayer Senior Analyst Wikimedia Foundation IRC (Freenode): HaeB