On Thu, Jan 18, 2018 at 10:45 AM, Andrew Otto <otto(a)wikimedia.org> wrote:
> > the beacon puts the record into the webrequest table and from there it
> > would only take some trivial preprocessing
>
> ‘Trivial’ preprocessing that has to look through 150K requests per second!
> This is a lot of work!
I think Gergo may have been referring to the human work involved in
implementing that preprocessing step. I assume it could be quite analogous
to the one your team has already implemented for pageviews:
https://github.com/wikimedia/analytics-refinery/blob/master/oozie/pageview/hourly/pageview_hourly.hql
Are you saying that the server load generated by such an additional
aggregation query would be a blocker? If yes, how about we combine the two
(for pageviews and previews) into one?
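To illustrate the combined-query idea: rather than two separate passes over the webrequest stream, one aggregation pass could bucket both pageviews and previews by tag. A minimal Python sketch of that single-pass shape, with all field names and the tagging logic invented for illustration (the real classification would live in HQL, analogous to pageview_hourly.hql):

```python
from collections import Counter

def aggregate(webrequest_rows):
    """Count pageviews and previews per page title in one pass.

    'kind' stands in for whatever flag the real preprocessing would
    derive (like is_pageview in pageview_hourly.hql); the field names
    here are assumptions, not the actual webrequest schema.
    """
    counts = Counter()
    for row in webrequest_rows:
        if row["kind"] in ("pageview", "preview"):
            counts[(row["kind"], row["page_title"])] += 1
    return counts

rows = [
    {"kind": "pageview", "page_title": "Earth"},
    {"kind": "preview", "page_title": "Earth"},
    {"kind": "pageview", "page_title": "Earth"},
    {"kind": "other", "page_title": "Earth"},  # ignored by both buckets
]
print(aggregate(rows))
```

The point is that the per-row work (and thus the server load) is shared between the two metrics instead of doubled.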
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB