Thanks for this. Forwarding to Analytics and
Research for others who are
On Tue, Jul 15, 2014 at 9:29 AM, Rachel Farrand <rfarrand(a)wikimedia.org>
This Tech Talk will be starting in 30 minutes.
On Fri, Jul 11, 2014 at 3:30 PM, Rachel Farrand <rfarrand(a)wikimedia.org>
> Please join Nuria Ruiz and Andrew Otto next Tuesday, July 15th at
> time/5pm UTC
> for a 30 min tech talk. You can join our hangout or follow along on
> (please note that a link to join the hangout will be posted in the
> of this event just as it starts).
> You can ask questions on IRC during the talk in #wikimedia-dev.
> If you are not able to follow along live, a video recording will be
> to the MediaWiki YouTube channel immediately following the tech talk
> you to view at any time.
> More information about the tech talk:
> *Hadoop and Beyond. An overview of Analytics infrastructure*
> In this
> talk we will be presenting the analytics infrastructure that we have
> recently rolled out in production. By now probably everybody knows
> Wikimedia hosts an instance of Hadoop from which we are going to
> pageview data in the near future. But .. how exactly does the data get
> We will go over the path that webrequest log data takes from Varnish
> Kafka (a distributed log buffer) to Hadoop and the challenges of
> this java-based infrastructure in production. We will also talk about
> can we query the data with Hive, an SQL-like interface. How can you
> this stack on Vagrant to play with and, last but not least, how we
> Hive recently to provide GLAM folks with image view stats:
Wikitech-l mailing list
Wiki-research-l mailing list