>the beacon puts the record into the webrequest table and from there it would only take some trivial preprocessing
‘Trivial’ preprocessing that has to look through 150K requests per second! That is a lot of work!

>tracking of events is better done on an event based system and EL is such a system.
I agree with this too.  We really want to discourage people from trying to measure things by searching through the huge haystack of all webrequests.  To measure something, you should emit an event if you can.  If it were practical, I’d prefer that we did this for pageviews as well.  Currently, we need a complicated definition of what a pageview is, which really only exists in the Java implementation in the Hadoop cluster.  It’d be much clearer if app developers had a way to define for themselves what counts as a pageview, and to emit that as an event.

This should be the approach people take when they want to measure something new: emit an event!  The event will get its own Kafka topic (which you can consume to do whatever you like with it), and it will be refined into its own Hive table.
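To make "emit an event" concrete, here is a minimal sketch in Python of what such an event could look like before it is sent off to the pipeline. The schema name "VirtualPageView", its fields, and the exact capsule field names are assumptions for illustration, not the real schemas:

```python
import json
import time

def make_event(schema, revision, wiki, event_fields):
    """Wrap schema-specific fields in an EventLogging-style capsule.

    The capsule field names here (schema, revision, wiki, dt, event)
    follow the general EventLogging convention; treat the exact names
    as an assumption, not the production format.
    """
    return {
        "schema": schema,        # schema name, e.g. "VirtualPageView" (hypothetical)
        "revision": revision,    # schema revision number
        "wiki": wiki,            # which wiki produced the event
        "dt": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "event": event_fields,   # the schema-specific payload
    }

# A hypothetical "virtual pageview" event, emitted by the app itself
# instead of being reconstructed later from the webrequest haystack:
evt = make_event("VirtualPageView", 1, "enwiki",
                 {"page_title": "Kafka", "source": "preview"})
payload = json.dumps(evt)
# In production this JSON would go to the EventLogging beacon (or be
# produced directly to a Kafka topic); here we only show the shape.
```

The point of the sketch is that the app developer, not a downstream Java job, decides what counts as a view and says so explicitly in the event.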

>I don’t want to have to create that chart and export one dataset from pageviews and one dataset from eventlogging to do that.
If you also design your schema nicely, it will be easily importable into Druid and usable in Pivot and Superset alongside pageviews.  We’re working on getting nice schemas automatically imported into Druid.
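One concrete part of designing a schema "nicely" for Druid is keeping the fields flat, since Druid columns don't nest. A minimal sketch of the idea (the field names are made up for illustration):

```python
def flatten(event, prefix=""):
    """Flatten nested event fields into the flat, column-per-field
    layout Druid ingestion expects (Druid has no nested columns)."""
    flat = {}
    for key, value in event.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

# A nested EventLogging-style record becomes one flat Druid row:
row = flatten({"wiki": "enwiki", "event": {"source": "preview", "count": 3}})
# row == {"wiki": "enwiki", "event_source": "preview", "event_count": 3}
```

A schema that is flat to begin with, with low-cardinality string dimensions and numeric measures, skips this step entirely, which is what makes automatic import feasible.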




On Thu, Jan 18, 2018 at 11:16 AM, Nuria Ruiz <nuria@wikimedia.org> wrote:
Gergo, 

>while EventLogging data gets stored in a different, unrelated way
Not really. This has changed quite a bit over the last two quarters. EventLogging data now gets preprocessed and refined much like webrequest data is preprocessed and refined. You can have a dashboard on top of some EventLogging schemas on Superset in the same way you have a dashboard that displays pageview data on Superset.

See dashboards on Superset (account required).

https://superset.wikimedia.org/superset/dashboard/7/?preselect_filters=%7B%7D

And (again, account required) EL data on Druid, this very same data we are talking about, page previews:

https://pivot.wikimedia.org/#tbayer_popups


>I was going to make the point that #2 already has a processing pipeline established whereas #1 doesn't.
This is incorrect. We mark as "preview" the data that we want to exclude from processing; see:
https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L144
The naming is unfortunate, but previews are really "preloads": requests we make (and cache locally) that may or may not be shown to users.
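The exclusion the linked PageviewDefinition code performs can be sketched in Python. I'm assuming here that the preview/preload tag travels as a key in the X-Analytics header; the exact key name and header format are assumptions mirroring the Java code, not a verified spec:

```python
def is_preview(x_analytics):
    """Return True if a request is tagged as a preview/preload.

    Assumes the tag is carried in the X-Analytics header as a
    semicolon-separated key=value list, e.g. "preview=1;https=1".
    The key name "preview" mirrors PageviewDefinition but is an
    assumption here.
    """
    tags = dict(
        pair.split("=", 1) if "=" in pair else (pair, "")
        for pair in x_analytics.split(";") if pair
    )
    return "preview" in tags

# Requests tagged as previews are excluded from the pageview count:
requests = [
    {"uri": "/wiki/Kafka", "x_analytics": "https=1"},
    {"uri": "/wiki/Kafka", "x_analytics": "preview=1;https=1"},
]
pageviews = [r for r in requests if not is_preview(r["x_analytics"])]
```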


But again, tracking of events is better done on an event-based system, and EL is such a system.




_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics