I just want to confirm that the proposed method using EventLogging will
allow us to gather data in a similar fashion to the webrequest table. In
particular, will we be able to break the data down by country, OS, browser,
etc.? Our goal here is to be able to consider the new page interactions
metric on the same level and with the same depth as pageviews.
On Thu, Jan 18, 2018 at 12:46 PM Andrew Otto <otto(a)wikimedia.org> wrote:
> the beacon puts the record into the webrequest table and from there it
> would only take some trivial preprocessing
‘Trivial’ preprocessing that has to look through 150K requests per
second! This is a lot of work!
> tracking of events is better done on an event based system and EL is
> such a system.
I agree with this too. We really want to discourage people from trying
to measure things by searching through the huge haystack of all
webrequests. To measure something, you should emit an event if you can.
If it were practical, I’d prefer that we did this for pageviews as well.
Currently, we need a complicated definition of what a pageview is, which
really only exists in the Java implementation in the Hadoop cluster. It’d
be much clearer if app developers had a way to define for themselves what
counts as a pageview, and emit that as an event.
This should be the approach that people take when they want to measure
something new. Emit an event! This event will get its own Kafka topic
(you can consume this to do whatever you like with it), and be refined into
its own Hive table.
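
To make "emit an event" a bit more concrete, here is a minimal Python sketch of what building and encoding such an event could look like. The schema name, revision, field names, and envelope layout are all hypothetical and chosen only for illustration; this is not the actual EventLogging client, just the general shape of the idea.

```python
import json
from urllib.parse import quote, unquote


def build_event(schema, revision, wiki, fields):
    """Wrap event-specific fields in a small envelope (hypothetical layout)."""
    return {"schema": schema, "revision": revision, "wiki": wiki, "event": fields}


def to_beacon_query(event):
    """Serialize the event as URL-encoded JSON, suitable for a query string."""
    return quote(json.dumps(event, separators=(",", ":"), sort_keys=True))


# Hypothetical schema and fields for a page interaction.
event = build_event(
    "PageInteraction", 1, "enwiki",
    {"action": "preview_shown", "page_id": 42},
)
query = to_beacon_query(event)

# The encoded payload decodes back to exactly the event that was emitted.
decoded = json.loads(unquote(query))
```

The point is that the client states explicitly what happened, so nothing downstream has to infer it from raw requests.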
> I don’t want to have to create that chart and export one dataset
> from pageviews and one dataset from eventlogging to do that.
If you also design your schema nicely, it will be easily importable into
Druid and usable in Pivot and Superset, alongside pageviews. We’re working
on getting nice schemas automatically imported into Druid
<https://gerrit.wikimedia.org/r/#/c/386882/>.
On Thu, Jan 18, 2018 at 11:16 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> >while EventLogging data gets stored in a different, unrelated way
> Not really. This has changed quite a bit over the last two quarters.
> EventLogging data is now preprocessed and refined much like webrequest
> data. You can build a dashboard on top of some EventLogging schemas in
> Superset in the same way you have a dashboard that displays pageview
> data in Superset.
> See dashboards on superset (user required).
> And (again, user required) EL data on druid, this very same data we
> are talking about, page previews:
> >I was going to make the point that #2 already has a processing
> >pipeline established whereas #1 doesn't.
> This is incorrect: we mark as "preview" the data that we want to exclude
> from processing, see:
> The naming is unfortunate, but previews are really "preloads", as in
> requests we make (and cache locally) that may or may not be shown to users.
> But again, tracking of events is better done on an event based system
> and EL is such a system.
> Analytics mailing list