Makes sense. Yeah, I had an "assuming everyone knows what you know" moment;
I appreciate that the automated query logging may not be a known thing (for the
reasons Jeremy sets out, it's currently accessible only via an internal
proxy, which makes it a wee bit difficult for people to know that it exists
;p). Sorry about that.
We could probably do it via Hadoop (it'd be a lot easier to automate!) if
we come up with some useful heuristics for what automated activity looks
like. I'm hoping that the spider/bot/automation identification as part of
the pageviews definition will give us some of that.
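For what it's worth, a first-pass heuristic for "what automated activity looks like" might just be pattern-matching on the user-agent string, along the lines of what the pageviews definition work is exploring. A minimal sketch (the regex and the `looks_automated` helper here are purely illustrative, not the actual pageviews-definition ruleset):

```python
import re

# Illustrative pattern only -- not the real pageviews-definition ruleset.
# Matches common self-identifying crawlers plus generic HTTP client libraries.
BOT_UA_PATTERN = re.compile(
    r"bot|crawler|spider|curl|wget|python-requests|libwww|java/",
    re.IGNORECASE,
)

def looks_automated(user_agent):
    """Return True if a user-agent string looks like automated traffic."""
    if not user_agent:
        # An empty/missing UA is itself a strong automation signal.
        return True
    return bool(BOT_UA_PATTERN.search(user_agent))
```

Obviously a real definition would need more than this (request rate, missing headers, known datacentre IP ranges, etc.), but it shows the shape of the thing.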
On 20 October 2014 13:50, Jeremy Baron <jeremy(a)tuxmachine.com> wrote:
> On Oct 20, 2014 1:36 PM, "Oliver Keyes" <okeyes(a)wikimedia.org> wrote:
>> I guess mostly I'm just confused as to what you'd add on top of "SSH
>> keys, automated logging and transparent documentation".
>
> I *think* Pine was asking for automatic query logging similar to what
> you've just said is already happening.
>
> Eventually maybe we'll get these types of queries mostly running on
> Hadoop+M/R (vs. processing a local file on disk). We could publish public
> logs of M/R jobs and, for some of them, allow public download of the output
> (but this particular query would not allow public download of the output,
> because it includes IPs, UA strings, etc.).
>
> -Jeremy
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Oliver Keyes
Research Analyst
Wikimedia Foundation