Analytics September 2013

analytics@lists.wikimedia.org

34 participants
28 discussions

Wikimetrics feedback
by Maarten Dammers 07 Sep '13

Hi everyone,

I tried Wikimetrics today and it looks like a good start to me. Some feedback:

* Login is via a Google/Twitter account; it should be something WMF, like the labs/Gerrit LDAP
* Should use https by default
* Oh wait, invalid certificate; filed a bug at https://bugzilla.wikimedia.org/show_bug.cgi?id=53892
* Only English? It should be multilingual like all our software. The people at translatewiki will be happy to translate for you
* Uploading csv user lists is not very convenient. Are you planning to come up with an easier/better system?
* Project "en" is a bit weird. You're probably using <project>wiki_p for the database. Can you add a link to available projects? Or how to construct it? Say for example I want the German Wikivoyage.
* Description seems to be missing for some fields at http://metrics.wmflabs.org/metrics/
* You could probably grab namespaces on the fly from the MediaWiki API
* Can you add an option to give output per time period (month would be nice)?
* Can you add bytes uploaded as a metric?
* Can you split out the result per namespace?
* http://metrics.wmflabs.org/support contains a link to the empty page http://www.mediawiki.org/wiki/Wikimetrics/FAQ . Can you make that link https by default?
* Where is the code? Can we submit new metrics? See for example http://toolserver.org/~reports/?wiki=nl.wikipedia.org for a similar service
* Are you planning to offer some visual output besides csv/json? See for example https://toolserver.org/~emijrp/wlm/stats.php
* I see you have sql queries. What tables are available? All (non-private) tables like on the Toolserver and Toollabs?
* Do you have some metrics on the usage of Wikimetrics? :-)

Maarten
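To illustrate the suggestion about grabbing namespaces on the fly, here is a minimal Python sketch (not Wikimetrics code). It builds a MediaWiki API siteinfo query and reads namespace names out of a parsed JSON response. The helper names and the trimmed SAMPLE response are made up for illustration, and the "*" key assumes the API's default (legacy) JSON format.

```python
from urllib.parse import urlencode


def siteinfo_namespaces_url(api_base):
    """Build a siteinfo query URL that asks a wiki for its namespaces."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces",
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def namespace_names(response):
    """Map namespace id -> local name from a parsed siteinfo response.

    Assumes the legacy JSON layout, where the name is stored under "*".
    """
    namespaces = response["query"]["namespaces"]
    return {int(ns_id): ns.get("*", "") for ns_id, ns in namespaces.items()}


# Hand-written, heavily trimmed stand-in for a real API response.
SAMPLE = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "*": ""},
            "1": {"id": 1, "*": "Talk"},
            "2": {"id": 2, "*": "User"},
        }
    }
}

if __name__ == "__main__":
    print(siteinfo_namespaces_url("https://de.wikivoyage.org/w/api.php"))
    print(namespace_names(SAMPLE))
```

Fetching the URL and decoding the JSON is left out so the sketch runs without network access.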

7 12

PyCon 2014 talk/tutorial/poster proposals due soon
by Sumana Harihareswara 06 Sep '13

http://us.pycon.org/2014/speaking/cfp/ PyCon tutorials, talks and posters will be April 9 to April 13, 2014 in Montreal. Talk and tutorial proposals are due Sept. 15; poster proposals are due November 1st. Analytics is among the suggested tutorial topics: http://us.pycon.org/2014/tutorials/suggested_topics/ and the PyCon site has a ton of resources for new speakers to help you get a proposal together. If you don't work for WMF and you get a session accepted, check out https://meta.wikimedia.org/wiki/Participation:Support to help with travel costs. -- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation

1 0

Fwd: Clean up of Gerrit repo's
by Diederik van Liere 05 Sep '13

Heya,

I would like to delete some of our Gerrit repos that we don't use at all and that just clutter up our projects. I propose to delete the following repos:

* analytics/DeviceMapLogCapture - a short-lived experiment for device recognition by Patrick Reilly; we are using OpenDDR now
* analytics/debs/kafka-0.7.2 - an old test version of Kafka; the new repo lives under operations
* analytics/dclass - an old dclass repo with Debian stuff; the new one lives under operations
* analytics/graphkit - I think this is a precursor to Limn; the description says "IGNORE THIS REPO"
* analytics/global-dev/sqproc - a small repo from Evan; his work is all available under github.com/embr
* analytics/global-dev/reportcard - message "IGNORE THIS REPO"
* analytics/reportcard/old-pipeline - a test in Python to replace wikistats; never got far
* analytics/user-metrics-2 - not used AFAIK
* analytics/packages/thrift - this repository has been abandoned in favor of https://github.com/wmf-analytics/thrift-debian
* analytics/E3Analysis - the correct repo for UMAPI is analytics/user-metrics

Please chime in if you disagree with any of the proposals. Once we have consensus I will ask Chad to delete them.

Best,
Diederik

2 2

CommanderData joins the Analytics Team
by Diederik van Liere 05 Sep '13

Heya,

Today a new recruit has joined the Analytics Team: CommanderData! Not surprisingly, its IRC handle is CommanderData and its duties are:

1) helping to minglify the Analytics Team to unprecedented heights.
2) answering random why questions.

From now on, entering a # followed by the mingle card number, like #1112, will trigger CommanderData, which replies with a link to the mingle card.

Best,
Diederik
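The card-number trigger described above could be approximated as follows. This is only a sketch of the idea, not CommanderData's actual code: the MINGLE_CARD_URL pattern is a hypothetical placeholder, and the function name is made up for illustration.

```python
import re

# Hypothetical URL pattern; the real Mingle address is not given in the email.
MINGLE_CARD_URL = "https://mingle.example.org/cards/{number}"

# A '#' that does not follow a word character, then one or more digits.
CARD_REF = re.compile(r"(?<!\w)#(\d+)\b")


def card_links(message):
    """Return one card link per '#<number>' reference found in an IRC message."""
    return [MINGLE_CARD_URL.format(number=n) for n in CARD_REF.findall(message)]


if __name__ == "__main__":
    for link in card_links("can someone look at #1112 today?"):
        print(link)
```

The lookbehind keeps text such as "abc#12" from triggering a reply; whether the real bot is that strict is not stated in the email.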

1 0

Changes to language in gender preferences
by Dario Taraborelli 04 Sep '13

Anyone analyzing gender data in Wikimedia projects via the public 'gender' setting stored in user_properties? This is to give you the heads up that the language in the preference UI recently changed: https://gerrit.wikimedia.org/r/#/c/30756/7/languages/messages/MessagesEn.php See this bugzilla ticket for more context: https://bugzilla.wikimedia.org/show_bug.cgi?id=31816 Dario -- Dario Taraborelli Wikimedia Foundation http://wikimediafoundation.org http://nitens.org/taraborelli

1 0

Logging Infrastructure ToDos
by Andrew Otto 04 Sep '13

I just spent some time playing with Hive and JSON today, and I think I finally have a grasp on all of the items and questions that are left to make this actually happen. I'm writing them down here to summarize for you and for my own brain :)

- varnishkafka
-- compression support (snappy?)
-- puppet module
-- local puppetization (with our JSON logging format nailed down).
-- Packaged and installed on mobile hosts via puppet.

- Kafka 0.8 Brokers
-- 0.8 package in apt.wikimedia.org (Alex K is going to do this for me soon).
-- Repave analytics1021 and analytics1022, install Kafka brokers via puppet.

- Camus/ETL
-- Figure out how to deploy and run this: Shadow Jar? Puppetized cronjob? Oozie?
-- If needed, implement geocoding and anonymization as part of Camus ETL phase. This could also be done as an after the fact Pig or MR job scheduled by oozie.
-- Do Hadoop compression settings automatically work when writing to HDFS from Camus?

- Hive
-- How do we properly deploy and use hive-serdes-1.0-SNAPSHOT.jar?
-- Determine proper webrequest Hive schema based on final varnishkafka JSON log format. Put this in Kraken repo somewhere?
-- Write oozie job for creating Hive partitions after Camus imports.

:)
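For the last Hive item, a partition-creation step might boil down to emitting one HiveQL statement per imported hour. The Python sketch below only generates the statement text. The partition columns (year/month/day/hour), the directory layout, and the /hypothetical/camus/webrequest path are all assumptions for illustration, since the email leaves the schema and the Camus output layout open.

```python
from datetime import datetime


def add_partition_statement(table, hour, location_root):
    """Build an ALTER TABLE ... ADD PARTITION statement for one imported hour.

    The year/month/day/hour partition columns and the directory layout are
    assumptions, not the schema the team settled on.
    """
    spec = "year={0}, month={1}, day={2}, hour={3}".format(
        hour.year, hour.month, hour.day, hour.hour
    )
    location = "{0}/{1:04d}/{2:02d}/{3:02d}/{4:02d}".format(
        location_root, hour.year, hour.month, hour.day, hour.hour
    )
    return "ALTER TABLE {0} ADD IF NOT EXISTS PARTITION ({1}) LOCATION '{2}';".format(
        table, spec, location
    )


if __name__ == "__main__":
    print(add_partition_statement(
        "webrequest",
        datetime(2013, 9, 4, 15),
        "/hypothetical/camus/webrequest",
    ))
```

An Oozie workflow could pass the generated statement to a Hive action after each import. IF NOT EXISTS keeps reruns from failing on partitions that were already added.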

1 0

Limn: move away from Coco?
by Diederik van Liere 01 Sep '13

Heya,

I have been talking with a lot of you in the past months and at Wikimania about Limn and how to move forward. One of the recurring themes has been that Limn is currently written in Coco and that this significantly hinders adoption, as there are very few Coco developers (Coco is a fork of CoffeeScript). I have sent this email to the mobile-tech, e2 and e3 mailing lists as well, because there are many developers outside of the Analytics team who use Limn and I would really like to hear their opinion too.

So the question I want to pose is: "Should we recompile Limn to either CoffeeScript or JavaScript or keep using Coco?"

This question is getting more urgent for two reasons:

1) The Analytics team is going to grow in the coming months and we expect to start developing features for Limn again. If we want to drop Coco as a dependency, then this is probably the best time to talk about it.
2) It seems that the community around Coco is stagnant, maybe even on the decline. When visiting https://github.com/satyr/coco you can see that there are very few commits in the last 4 months. This could either mean that the language is feature complete and bug free or, more likely, that the decline has started. For the long-term prospects of Limn, this is not good news.

I would like to run a straw poll: please respond to this thread by answering with either JavaScript, CoffeeScript or Coco and, optionally, a short explanation.

Thanks!
D

7 8

Retrieving edit conflict logs
by Adam Wight 01 Sep '13

Dear comrades,

I'm hoping to provide a data stream and archival data for edit conflict events on *.wikipedias. The short-term goal is to help support further research into heuristic reconstruction of the article revision graph; see this paper presented by Jianmin Wu (author CC'ed here): http://opensym.org/wsos2013/proceedings/p0204-wu.pdf

The only marker I have found so far is, unfortunately, a message emitted using wfDebug. Do we have an archive of production debug logs, and what is the process I would follow for proposing a historical experiment or an ongoing filter using this data?

For anyone who's curious, I think the main string I'm looking for is "Keeping edit conflict, failed merge.", but it would be worthwhile to analyze logging from every code path within conflict resolution.
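If such debug logs turn out to be archived as plain text, a first-pass filter could be as simple as the Python sketch below. It assumes one log entry per line. The sample lines are invented for illustration and do not reflect the real wfDebug output format.

```python
CONFLICT_MARKER = "Keeping edit conflict, failed merge."


def conflict_lines(lines):
    """Yield only the log lines that contain the edit-conflict marker."""
    for line in lines:
        if CONFLICT_MARKER in line:
            yield line.rstrip("\n")


# Invented sample lines; real debug log entries will look different.
SAMPLE_LOG = [
    "hypothetical-prefix: some unrelated debug message\n",
    "hypothetical-prefix: Keeping edit conflict, failed merge.\n",
    "hypothetical-prefix: another unrelated message\n",
]

if __name__ == "__main__":
    for hit in conflict_lines(SAMPLE_LOG):
        print(hit)
```

Because the function accepts any iterable of strings, it could be pointed at an open file handle for archived logs or at a live stream for an ongoing filter.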

3 2
