> For example, UI instrumentations on the web are almost always sampled, because that yields enough data to answer UI questions - but on the other hand tend to record much more detail about the individual interaction. In contrast, we register all pageviews unsampled, but don't keep a permanent record of every single one of them with precise timestamps - rather, we have aggregated tables (pageview_hourly in particular). Our EventLogging backend is not tailored to that.
When you say “Our EventLogging backend”, what are you referring to? If you mean MySQL, then for sure. :)
> Storing data about seen previews in the same way as we do for pageviews, for example in the pageview_hourly (suitably tagged, perhaps giving that table a more general name) would facilitate that a lot, by allowing us to largely reuse the work that during the past few years went into getting pageview aggregation right.
I’m not totally opposed to doing it this way, but at some point we need to realize that this doesn’t scale, in either human or CPU resources, as a way to measure user feature interaction.
I don’t think a pageview is inherently different from any other kind of impression; it’s just that we didn’t have the ability in the past (or now?) to collect and measure pageviews the way they should be. If we were designing an interaction measurement system now, it wouldn’t look exactly like EventLogging, but it would look like something close to it. And if it did everything I’d want it to, we would use it to measure pageviews and everything else you’ve mentioned.
Making events the source of truth is more accurate than implementing custom batch logic in Hadoop to comb through webrequests and filter out what you are looking for. It pushes control of the definition of what counts as a ‘pageview’ or ‘page preview’ to the folks who are developing the app/website/feature. If we use webrequests+Hadoop tagging to count these, then any time there is a change to the URLs that page previews load (or the beacon URLs they hit), we’d have to patch the tagging logic, then release and deploy a new refinery version to account for the change. Any time a new feature is added for which someone wants interactions counted, we’d have to do the same.
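To make the contrast concrete, here is a rough Python sketch. Everything in it is hypothetical and only for illustration: the URL pattern, the `uri_path` and `action` field names, and both function names are my assumptions, not the actual refinery tagging code or any real EventLogging schema.

```python
import re

# Hypothetical URL pattern that centrally maintained tagging logic would
# have to keep in sync with whatever the feature actually requests.
PREVIEW_URL = re.compile(r"^/api/rest_v1/page/summary/")


def count_previews_from_webrequests(requests):
    """Batch approach: infer previews by pattern-matching raw request paths."""
    return sum(1 for r in requests if PREVIEW_URL.match(r["uri_path"]))


def count_previews_from_events(events):
    """Event approach: the feature itself declares what counts as a preview."""
    return sum(1 for e in events if e["action"] == "preview_seen")
```

If the feature later switches to a different endpoint, the first function silently undercounts until someone patches and redeploys the pattern, while the second keeps working because the feature's own code decides when to emit `preview_seen`.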
Heck, if you use events, you could very easily consume, aggregate, or emit them anywhere you wanted: your own datastore, a Grafana dashboard, a monitoring system, etc. :) It will also help us standardize this type of thing, so that creating new dashboards can be more automated in the future.
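As a sketch of what that could look like (again, just an illustration under assumptions: the `ts` and `event_type` fields and the sink callables are made up, and a real consumer would read from a stream rather than a list), the same events can be rolled up hourly and handed to any number of destinations:

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts):
    """Truncate a Unix timestamp (seconds) to the start of its UTC hour."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()


def aggregate_hourly(events):
    """Count events per (hour, event_type), similar in spirit to an hourly table."""
    counts = Counter()
    for e in events:
        counts[(hour_bucket(e["ts"]), e["event_type"])] += 1
    return counts


def fan_out(counts, sinks):
    """Hand the same aggregate to every sink (datastore writer, dashboard, ...)."""
    for sink in sinks:
        sink(counts)
```

Here pageviews and page previews go through the same code path; only the `event_type` value differs, which is the kind of standardization that would let new dashboards be generated more automatically.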