Re: [Analytics] Records of article access

20 Oct 2014

I think we are now all getting on the same wavelength.

The one piece of this puzzle that I am still missing is why this traffic
research for the Signpost seems to have been a surprise to Toby, and why he
was thinking that it would benefit from Legal's input. If the queries were
being logged, I would have thought Toby would be aware of them because he
would see them in the logs, and I would think that he and others would be
regularly checking the logs to make sure that all accesses look normal.
Toby, can you comment on that, and also clarify which part of this you
think would benefit from Legal's input?

Thanks,

Pine

*This is an Encyclopedia* <https://www.wikipedia.org/>

*One gateway to the wide garden of knowledge, where lies The deep rock of
our past, in which we must delve The well of our future, The clear water we
must leave untainted for those who come after us, The fertile earth, in
which truth may grow in bright places, tended by many hands, And the broad
fall of sunshine, warming our first steps toward knowing how much we do not
know.*

*—Catherine Munro*

On Mon, Oct 20, 2014 at 10:53 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:

...
 Makes sense. Yeah, I had an "assuming everyone knows what you know" moment;
 I appreciate the automated query logging may not be a known thing (for the
 reasons Jeremy sets out, it's currently accessible only via an internal
 proxy, which makes it a wee bit difficult for people to know that it exists
 ;p). Sorry about that.

 We could probably do it via Hadoop (it'd be a lot easier to automate!) if
 we come up with some useful heuristics for what automated activity looks
 like. I'm hoping that the spider/bot/automation identification as part of
 the pageviews definition will give us some of that.

 On 20 October 2014 13:50, Jeremy Baron <jeremy(a)tuxmachine.com> wrote:

  On Oct 20, 2014 1:36 PM, "Oliver Keyes" <okeyes(a)wikimedia.org> wrote:
  I guess mostly I'm just confused as to what you'd add on top of "SSH
  keys, automated logging and transparent documentation".

 I *think* Pine was asking for automatic query logging similar to what
 you've just said is already happening.

 Eventually maybe we'll get these types of queries mostly running on
 hadoop+M/R (vs. processing a local file on disk). We could publish public
 logs of M/R jobs and, for some of them, allow public download of the output
 (but this particular query would not allow public downloading of the output
 because of IP/UA string/etc.).

 -Jeremy

 _______________________________________________
 Analytics mailing list
 Analytics(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation

