I think we are now all getting on the same wavelength.
The one piece of this puzzle I am still missing is why this traffic research
for the Signpost seems to have come as a surprise to Toby, and why he thought
it would benefit from Legal's input. If the queries were being logged, I would
have thought Toby would already be aware of them from the logs. I would also
expect that he and others regularly check those logs to make sure that all
accesses look normal. Toby, can you comment on that, and also clarify which
part of this you think would benefit from Legal's input?
Thanks,
Pine
*This is an Encyclopedia* <https://www.wikipedia.org/>
*One gateway to the wide garden of knowledge, where lies
The deep rock of our past, in which we must delve
The well of our future,
The clear water we must leave untainted for those who come after us,
The fertile earth, in which truth may grow in bright places, tended by many hands,
And the broad fall of sunshine, warming our first steps toward knowing how much we do not know.*
*—Catherine Munro*
On Mon, Oct 20, 2014 at 10:53 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
Makes sense. Yeah, I had an "assuming everyone knows what you know" moment;
I appreciate that the automated query logging may not be widely known (for
the reasons Jeremy sets out, it's currently accessible only via an internal
proxy, which makes it a wee bit difficult for people to know that it exists
;p). Sorry about that.
We could probably do it via Hadoop (it'd be a lot easier to automate!) if
we come up with some useful heuristics for what automated activity looks
like. I'm hoping that the spider/bot/automation identification as part of
the pageviews definition will give us some of that.
On 20 October 2014 13:50, Jeremy Baron <jeremy(a)tuxmachine.com> wrote:
On Oct 20, 2014 1:36 PM, "Oliver Keyes" <okeyes(a)wikimedia.org> wrote:
I guess mostly I'm just confused as to what you'd add on top of "SSH keys,
automated logging and transparent documentation".
I *think* Pine was asking for automatic query logging similar to what
you've just said is already happening.
Eventually we may get these types of queries running mostly on Hadoop with
MapReduce (rather than processing a local file on disk). We could then publish
public logs of the MapReduce jobs and, for some of them, allow public download
of the output. (This particular query would not allow public download of the
output, because it involves IP addresses, user-agent strings, etc.)
-Jeremy
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Oliver Keyes
Research Analyst
Wikimedia Foundation