On Thu, Jan 18, 2018 at 10:45 AM, Andrew Otto <otto@wikimedia.org> wrote:
>> the beacon puts the record into the webrequest table and from there it would only take some trivial preprocessing
> ‘Trivial’ preprocessing that has to look through 150K requests per second! This is a lot of work!
I think Gergo may have been referring to the human work involved in implementing that preprocessing step. I assume it could be quite analogous to the one your team has implemented for pageviews: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/hourly/pageview_hourly.hql

Are you saying that the server load generated by such an additional aggregation query would be a blocker? If so, how about combining the two (for pageviews and previews) into one query?
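To illustrate what I mean, here is a rough sketch of a combined hourly aggregation, modeled loosely on the pageview_hourly.hql pattern. This is purely hypothetical: the `is_preview` predicate does not exist in the webrequest schema and would need to be derived from the beacon's request signature, and the partition parameters and column names are assumptions on my part, not the actual schema.

```sql
-- Hypothetical sketch: one pass over webrequest producing both counts.
-- is_preview is an assumed flag that would have to be computed from the
-- preview beacon's URI pattern; it is not a real webrequest column.
SELECT
  year, month, day, hour,
  project,
  SUM(CASE WHEN is_pageview THEN 1 ELSE 0 END) AS view_count,
  SUM(CASE WHEN is_preview  THEN 1 ELSE 0 END) AS preview_count
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = ${year} AND month = ${month}
  AND day = ${day} AND hour = ${hour}
GROUP BY year, month, day, hour, project;
```

The point being that both counters could come out of the same scan of the hour's partition, rather than doubling the read load with a second job.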

Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB