(I'd defer to the Readers Web team with Tilman on whether country extracted from the cookie would be sufficient.)
Adding to this, one thing to consider is DNT (Do Not Track): is there a way to invoke EL so that such traffic is appropriately imputed?
-Adam
On Thu, Jan 18, 2018 at 2:13 PM, Andrew Otto otto@wikimedia.org wrote:
In particular, will we be able to sort by country, OS, Browser, etc?
OS, Browser, yes. User Agent parsing is done by the EventLogging processors.
Country not quite as easily, as EventLogging does not include client IP addresses. We could consider putting this back in somehow. Alternatively, I’ve heard that Varnish sets a geocoded country cookie that the browser could send back as part of the event. Is country enough geo detail?
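A minimal sketch of the cookie approach, assuming the geocoded cookie value is colon-delimited with a two-letter country code as its first field (for example `US:CA:San_Francisco:37.77:-122.42:v4`). That layout and the function name `country_from_geoip_cookie` are assumptions for illustration, not something confirmed in this thread.

```python
from typing import Optional


def country_from_geoip_cookie(cookie_value: str) -> Optional[str]:
    """Return the two-letter country code from a geocoded cookie value.

    ASSUMPTION: the value is colon-delimited and the country code is the
    first field. Returns None when the field is missing or malformed.
    """
    if not cookie_value:
        return None
    # Take only the first colon-delimited field and normalise its case.
    country = cookie_value.split(":", 1)[0].strip().upper()
    # Accept only two ASCII letters; anything else is treated as unknown.
    if len(country) == 2 and country.isascii() and country.isalpha():
        return country
    return None
```

The client would then attach the returned code to the event as an ordinary field, so no client IP address has to travel with the event.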
On Thu, Jan 18, 2018 at 2:30 PM, Olga Vasileva ovasileva@wikimedia.org wrote:
Hi all,
I just want to confirm that the proposed method using EventLogging will allow us to gather data in a similar fashion to the webrequest table. In particular, will we be able to sort by country, OS, Browser, etc? Our goal here is to be able to consider the new page interactions metric on the same level and with the same depth as pageviews.
Thanks!
- Olga
On Thu, Jan 18, 2018 at 12:46 PM Andrew Otto otto@wikimedia.org wrote:
the beacon puts the record into the webrequest table and from there it would only take some trivial preprocessing
‘Trivial’ preprocessing that has to look through 150K requests per second! This is a lot of work!
tracking of events is better done on an event based system and EL is such a system.
I agree with this too. We really want to discourage people from trying to measure things by searching through the huge haystack of all webrequests. To measure something, you should emit an event if you can. If it were practical, I’d prefer that we did this for pageviews as well. Currently, we need a complicated definition of what a pageview is, which really only exists in the Java implementation in the Hadoop cluster. It’d be much clearer if app developers had a way to define themselves what counts as a pageview, and emit that as an event.
This should be the approach that people take when they want to measure something new. Emit an event! This event will get its own Kafka topic (you can consume this to do whatever you like with it), and be refined into its own Hive table.
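A rough sketch of what "emit an event" could look like on the wire, assuming events are sent as URL-encoded JSON in the query string of a beacon endpoint. The endpoint `https://example.org/beacon/event`, the capsule fields (`schema`, `revision`, `wiki`, `event`) and the schema name `PageInteraction` are hypothetical placeholders, not the confirmed EventLogging format.

```python
import json
from urllib.parse import quote, unquote


def build_event_beacon_url(base_url: str, schema: str, revision: int,
                           wiki: str, event: dict) -> str:
    """Serialise one event as URL-encoded JSON appended to a beacon URL.

    ASSUMPTION: the capsule layout below is illustrative only.
    """
    capsule = {
        "schema": schema,      # hypothetical schema name
        "revision": revision,  # schema revision the event conforms to
        "wiki": wiki,          # which wiki produced the event
        "event": event,        # the schema-specific payload
    }
    payload = json.dumps(capsule, separators=(",", ":"), sort_keys=True)
    # safe="" percent-encodes every reserved character, including "/" and ":".
    return base_url + "?" + quote(payload, safe="")


def decode_event_beacon_url(url: str) -> dict:
    """Recover the capsule from a URL built by build_event_beacon_url."""
    query = url.split("?", 1)[1]
    return json.loads(unquote(query))


url = build_event_beacon_url(
    "https://example.org/beacon/event", "PageInteraction", 1, "enwiki",
    {"action": "preview_shown", "country": "US"},
)
```

Because each event names its own schema, a consumer can route it to a per-schema topic and table without scanning unrelated requests.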
I don’t want to have to create that chart and export one dataset from pageviews and one dataset from eventlogging to do that.
If you also design your schema nicely (https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines), it will be easily importable into Druid and usable in Pivot and Superset, alongside pageviews. We’re working on getting nice schemas automatically imported into Druid (https://gerrit.wikimedia.org/r/#/c/386882/).
On Thu, Jan 18, 2018 at 11:16 AM, Nuria Ruiz nuria@wikimedia.org wrote:
Gergo,
while EventLogging data gets stored in a different, unrelated way
Not really. This has changed quite a bit over the last two quarters. EventLogging data is now preprocessed and refined in much the same way as webrequest data. You can have a dashboard on top of some EventLogging schemas on Superset in the same way you have a dashboard that displays pageview data on Superset.
See dashboards on Superset (user account required):
https://superset.wikimedia.org/superset/dashboard/7/?preselect_filters=%7B%7D
And (again, user account required) EL data on Druid, the very same data we are talking about, page previews:
https://pivot.wikimedia.org/#tbayer_popups
I was going to make the point that #2 already has a processing pipeline established whereas #1 doesn't.
This is incorrect: we mark as "preview" the data that we want to exclude from processing, see: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L144
The naming is unfortunate, but these previews are really "preloads": requests that we make (and cache locally) and that may or may not be shown to users.
But again, tracking of events is better done on an event based system and EL is such a system.
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Olga Vasileva // Product Manager // Reading Web Team https://wikimediafoundation.org/