Dear wiki-research-l and wikitech-l members,
    
    Regarding Nuria Ruiz and Andrew Otto's tech talk on July 15th, and specifically the NARA analytics pilot (Commons:GLAMwiki_Toolset_Project/NARA_analytics_pilot): I wonder whether it would be possible to replicate the pilot for other GLAM institutions, so as to expand GLAM outreach globally.

     I plan to compare and contrast two to four GLAM institutions that host substantial Chinese collections in China, Taiwan, and the U.K. I hope that this can be turned into a hackathon event to recruit coders and researchers from the Chinese-speaking regions.

     Another benefit of replicating the NARA analytics pilot is that it would demonstrate a possible data-research workflow using the data and infrastructure provided by the Wikimedia Foundation.

     I am not sure whether the following list contains all the tasks involved, or how much time each would take (please give me your estimates, thanks):

# Identify the images whose view traffic should be logged.
# Get permission to access the log data (will this be difficult?)
# Start logging.
# Visualize the results using glam-metrics (http://glam-metrics.wmflabs.org/)
# Localize the stats report with a Chinese locale/translation.
# Customize glam-metrics so that more than one GLAM institution can be compared.
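
A minimal sketch of the comparison in the last step of the list above, assuming per-image view counts have already been exported as (institution, image, date, views) rows. The institution names, file titles, and numbers are made-up placeholders, and this is not how glam-metrics itself is implemented:

```python
from collections import defaultdict

# Hypothetical export: (institution, image_title, date, views).
# All values below are placeholders, not real statistics.
rows = [
    ("Institution A", "File:Example_scroll.jpg", "2014-07-01", 120),
    ("Institution A", "File:Example_vase.jpg", "2014-07-01", 80),
    ("Institution B", "File:Example_map.jpg", "2014-07-01", 150),
    ("Institution B", "File:Example_map.jpg", "2014-07-02", 70),
]


def total_views_by_institution(rows):
    """Sum image views per institution across all images and dates."""
    totals = defaultdict(int)
    for institution, _image, _date, views in rows:
        totals[institution] += views
    return dict(totals)


totals = total_views_by_institution(rows)

# Print institutions from most to least viewed.
for institution, views in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{institution}: {views} views")
```

The same grouping could be done per day or per image by changing the dictionary key.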

    Please also let me know if I have missed anything. Many thanks.

Best,


2014-07-18 8:28 GMT+01:00 Pine W <wiki.pine@gmail.com>:
Thanks for this. Forwarding to Analytics and Research for others who are curious.

Pine


On Tue, Jul 15, 2014 at 9:29 AM, Rachel Farrand <rfarrand@wikimedia.org> wrote:
This Tech Talk will be starting in 30 minutes. Thanks!


On Fri, Jul 11, 2014 at 3:30 PM, Rachel Farrand <rfarrand@wikimedia.org>
wrote:

> Hello!
>
> Please join Nuria Ruiz and Andrew Otto next Tuesday, July 15th at 10am SF
> time/5pm UTC
> <http://www.timeanddate.com/worldclock/fixedtime.html?msg=Analytics+Tech+Talk&iso=20140715T10&p1=224&am=30>
> for a 30 min tech talk. You can join our hangout or follow along on
> youtube:
> https://plus.google.com/u/0/b/103470172168784626509/events/c53ho5esd0luccd09a1c30rlrmg
> (please note that a link to join the hangout will be posted in the comments
> of this event just as it starts).
>
> You can ask questions on IRC during the talk in #wikimedia-dev.
>
> If you are not able to follow along live, a video recording will be posted
> here
> <https://plus.google.com/u/0/b/103470172168784626509/103470172168784626509/videos>,
> to the MediaWiki YouTube channel immediately following the tech talk for
> you to view at any time.
>
> More information about the tech talk:
>
> *Hadoop and Beyond. An overview of Analytics infrastructure*
>
> In this tech talk we will be presenting the analytics infrastructure that
> we have recently rolled out in production. By now probably everybody knows
> that Wikimedia hosts an instance of Hadoop from which we are going to
> extract pageview data in the near future. But how exactly does the data
> get there?
>
> We will go over the path that webrequest log data takes from Varnish to
> Kafka (a distributed log buffer) to Hadoop, and the challenges of deploying
> this Java-based infrastructure in production. We will also talk about how
> we can query the data with Hive, an SQL-like interface, how you can set up
> this stack on Vagrant to play with and, last but not least, how we used
> Hive recently to provide GLAM folks with image view stats:
> https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_analytics_pilot
>
> Thanks!
>
>
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l