Thanks for this. Forwarding to Analytics and Research for others who are curious.
Pine
On Tue, Jul 15, 2014 at 9:29 AM, Rachel Farrand rfarrand@wikimedia.org wrote:
This Tech Talk will be starting in 30 minutes. Thanks!
On Fri, Jul 11, 2014 at 3:30 PM, Rachel Farrand rfarrand@wikimedia.org wrote:
Hello!
Please join Nuria Ruiz and Andrew Otto next Tuesday, July 15th at 10am SF time/5pm UTC <http://www.timeanddate.com/worldclock/fixedtime.html?msg=Analytics+Tech+Talk...> for a 30 min tech talk. You can join our hangout or follow along on YouTube:
https://plus.google.com/u/0/b/103470172168784626509/events/c53ho5esd0luccd09...
(please note that a link to join the hangout will be posted in the comments of this event just as it starts).
You can ask questions on IRC during the talk in #wikimedia-dev.
If you are not able to follow along live, a video recording will be posted here <https://plus.google.com/u/0/b/103470172168784626509/103470172168784626509/vi...>, to the MediaWiki YouTube channel, immediately following the tech talk for you to view at any time.
More information about the tech talk:
*Hadoop and Beyond. An overview of Analytics infrastructure*
In this tech talk we will present the analytics infrastructure that we have recently rolled out in production. By now probably everybody knows that Wikimedia hosts an instance of Hadoop from which we are going to extract pageview data in the near future. But how exactly does the data get there?
We will go over the path that webrequest log data takes from Varnish to Kafka (a distributed log buffer) to Hadoop, and the challenges of deploying this Java-based infrastructure in production. We will also talk about how we can query the data with Hive, an SQL-like interface; how you can set up this stack on Vagrant to play with; and, last but not least, how we used Hive recently to provide GLAM folks with image view stats:
https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_ana...
Thanks!
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Dear wiki-research-l and wikitech-l members,
Regarding Nuria Ruiz and Andrew Otto's talk on July 15th, and specifically the NARA analytics pilot (Commons:GLAMwiki_Toolset_Project/NARA_analytics_pilot, https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_analytics_pilot), I wonder whether it is possible to replicate the pilot for other GLAM institutions so as to expand global GLAM outreach.
I plan to compare and contrast two to four GLAM institutions that host substantial Chinese collections in China, Taiwan, and the U.K. I hope that this can be turned into a hackathon event to recruit coders and researchers from the Chinese-speaking regions.
Another benefit of replicating the NARA analytics pilot is that it would demonstrate a possible data-research workflow using the data and infrastructure provided by the Wikimedia Foundation.
I am not sure whether the following list contains all the tasks involved, or how much time each would take (please give me your estimates, thanks):
# Identify images to be logged for visiting traffic.
# Get log data permission (Is it gonna be difficult?)
# Start logging
# Visualize using glam-metrics (http://glam-metrics.wmflabs.org/)
# Localize the stats report with Chinese locale/translation
# Customize glam-metrics so that more than one GLAM institution can be compared.
Please also let me know if I missed anything. Many thanks.
Best,
2014-07-18 8:28 GMT+01:00 Pine W wiki.pine@gmail.com:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l