Hi Dario,
On Thu, Dec 11, 2014 at 04:11:49PM -0800, Dario Taraborelli wrote:
> I am kicking off this thread [...]
Thanks!
> However, there are types of data quality issues that we only
> discover when collecting data at scale and in the wild (on
> browsers/platforms that we don’t necessarily test for internally).
Full ACK.
However, that sounds like we're only talking about schemas where the
collection code got tested using Vagrant or beta, and is known to work
on the relevant portion of the traffic.
And since you say that it's on browsers/platforms that we don't
necessarily test for internally, I assume we're actually talking only
about a small fraction of the traffic.
I'll assume that scope for the rest of this reply.
> is there a way to inspect invalid events in near real time without
> having access to vanadium?
* Urgent, ad-hoc needs
For urgent, ad-hoc needs (which should be really rare, given the
scope), ping us on IRC in #wikimedia-analytics.
At least qchris, milimetric, and nuria can ssh into vanadium and
take a look right away.
If none of them are around, Ops of course have access to the relevant
files on vanadium [1]. And since we're in the case of urgent, ad-hoc
needs, I am sure they'd help out.
* Not-so-urgent needs
For not-so-urgent needs, since it's only a small fraction of the
traffic, I am not sure near real-time access is worth the effort.
Sure, it would be nice to provide near real-time access to those files,
but we should also get the cluster into a more reliable state,
implement UDFs for researchers to make their lives easier, and get the
data warehouse up and running ;-)
But I see that a Phabricator task has been added in the meantime, so I
guess I am alone in my judgement :-)
Have fun,
Christian
[1] Either
/srv/log/eventlogging/client-side-events.log
or
/srv/log/eventlogging/server-side-events.log
depending on the kind of event you're looking for.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3 Email: christian@quelltextlich.at
4293 Gutau, Austria Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics