Team:
As of today, incoming request data includes an extra bit of information in the X-Analytics header.
If an incoming request to any Wikipedia project has no cookies whatsoever, it will be tagged with nocookie=1. A request without any cookies could correspond to a fresh browser session, a user browsing with cookies disabled or, most likely, a bot request, as most bots will not accept cookies. We *might* be able to use this setting as a cheap proxy to quantify bot traffic.
Documentation about this change can be found here: https://wikitech.wikimedia.org/wiki/X-Analytics
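As a rough illustration of how the tag could be read downstream, here is a minimal Python sketch. It assumes the X-Analytics value is a semicolon-separated list of key=value pairs; the function names and the sample values are made up for the example and are not part of any existing pipeline.

```python
from typing import Dict, List


def parse_x_analytics(header_value: str) -> Dict[str, str]:
    """Parse 'k1=v1;k2=v2' into a dict, skipping empty or malformed items.

    Assumption: pairs are separated by ';' and keys from values by '='.
    """
    pairs: Dict[str, str] = {}
    for item in header_value.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep and key:
            pairs[key] = value
    return pairs


def is_cookieless(header_value: str) -> bool:
    """True when the request was tagged with nocookie=1."""
    return parse_x_analytics(header_value).get("nocookie") == "1"


def cookieless_share(header_values: List[str]) -> float:
    """Fraction of requests tagged nocookie=1 (0.0 for an empty sample)."""
    if not header_values:
        return 0.0
    tagged = sum(is_cookieless(value) for value in header_values)
    return tagged / len(header_values)


# Made-up header values, only to show the intended usage.
sample = ["nocookie=1", "https=1", "ns=0;nocookie=1", ""]
share = cookieless_share(sample)
```

The share computed this way is only an upper bound on bot traffic, since fresh browser sessions and users with cookies disabled are counted too.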
Thanks,
Nuria
What was the motivation for this change? Just looking for possible automata?
On 21 October 2015 at 15:38, Nuria Ruiz nuria@wikimedia.org wrote:
> Team:
> As of today, incoming request data includes an extra bit of information in the X-Analytics header.
> If an incoming request to any Wikipedia project has no cookies whatsoever, it will be tagged with nocookie=1. A request without any cookies could correspond to a fresh browser session, a user browsing with cookies disabled or, most likely, a bot request, as most bots will not accept cookies. We *might* be able to use this setting as a cheap proxy to quantify bot traffic.
> Documentation about this change can be found here: https://wikitech.wikimedia.org/wiki/X-Analytics
> Thanks,
> Nuria
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
On Wed, Oct 21, 2015 at 12:47 PM, Oliver Keyes okeyes@wikimedia.org wrote:
> What was the motivation for this change? Just looking for possible automata?
Right. The motivation was to see if the absence of cookies works as a cheap proxy to identify robots. It is a pretty easy change to make that might help us quite a bit; we shall update the list when we have some data.
Awesome; cool!
On 21 October 2015 at 15:49, Nuria Ruiz nuria@wikimedia.org wrote:
> Right. The motivation was to see if the absence of cookies works as a cheap proxy to identify robots. It is a pretty easy change to make that might help us quite a bit; we shall update the list when we have some data.
I was under the impression that most of the MediaWiki bot frameworks do accept cookies, but I imagine many of the home-made bots don't. Seems like using an unknown useragent string might be a better proxy.
On Wed, Oct 21, 2015 at 12:58 PM, Ryan Kaldari rkaldari@wikimedia.org wrote:
> Seems like using an unknown useragent string might be a better proxy.
We already do bot filtering using user agent strings. Please see: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-...
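To make the comparison between the two signals concrete, here is a small Python sketch that cross-tabulates a user agent check against the nocookie tag. The regular expression is a hypothetical stand-in and is not the pattern used in analytics-refinery-source; the sample requests are made up.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical pattern for illustration only; the real filter lives in
# analytics-refinery-source and is not reproduced here.
BOT_UA = re.compile(r"bot|crawler|spider|curl|python-requests", re.IGNORECASE)


def cross_tabulate(requests: Iterable[Tuple[str, bool]]) -> Counter:
    """Count requests by (user agent looks like a bot, request had no cookies)."""
    counts: Counter = Counter()
    for user_agent, no_cookie in requests:
        ua_is_bot = bool(BOT_UA.search(user_agent))
        counts[(ua_is_bot, no_cookie)] += 1
    return counts


# Made-up sample: (user agent string, tagged with nocookie=1?)
sample = [
    ("Mozilla/5.0 (X11; Linux x86_64) Firefox/41.0", False),
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", True),
    ("my-homemade-script/0.1", True),
    ("Pywikibot/2.0", True),
]
counts = cross_tabulate(sample)
```

The (False, True) cell, cookieless requests whose user agent is not recognized as a bot, is where the nocookie tag could add information beyond user agent filtering; the (True, False) cell would show bots that do accept cookies.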