Analytics December 2013

analytics@lists.wikimedia.org

29 participants
20 discussions

Analytics for tools hosted on labs?
by Jessie Wild 13 Dec '13

Hey everyone -

Does anyone know if there is a way of measuring how much usage happens to a tool hosted on labs? So for example - if I were to host a tool I developed (ha), could I see how many people were accessing it, similar to what I would find via Google Analytics?

Thanks!
Jessie

--
Jessie Wild
Grantmaking Learning & Evaluation
Wikimedia Foundation

Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it a reality!
Donate to Wikimedia <https://donate.wikimedia.org/>

9 14

What's going on with the Echo stats?
by Ryan Kaldari 09 Dec '13

Anyone know what's going on with the stats here: http://ee-dashboard.wmflabs.org/graphs/enwiki_echo_category Specifically, it looks like most of the notification types died in November and link notifications had a huge spike on my birthday. Sorry if this has already been discussed. I just noticed it. Ryan Kaldari

4 4

Wikipedia corpora from Google
by Dario Taraborelli 07 Dec '13

Google has released over time a huge amount of open data from or about Wikipedia. Check them out:
http://googleresearch.blogspot.com/2013/12/free-language-lessons-for-comput…

Some highlights:

50,000 Lessons on How to Read: a Relation Extraction Corpus
What is it: A human-judged dataset of two relations involving public figures on Wikipedia: about 10,000 examples of “place of birth” and 40,000 examples of “attended or graduated from an institution.”

40 Million Entities in Context
What is it: A disambiguation set consisting of pointers to 10 million web pages with 40 million entities that have links to Wikipedia. This is another entity resolution corpus, since the links can be used to disambiguate the mentions, but unlike the ClueWeb example above, the links are inserted by the web page authors and can therefore be considered human annotation.

Distributing the Edit History of Wikipedia Infoboxes
What is it: The edit history of 1.8 million infoboxes in Wikipedia pages in one handy resource. Attributes on Wikipedia change over time, and some of them change more than others. Understanding attribute change is important for extracting accurate and useful information from Wikipedia.

Dictionaries for linking Text, Entities, and Ideas
What is it: We created a large database of pairs of 175 million strings associated with 7.5 million concepts, annotated with counts, which were mined from Wikipedia. The concepts in this case are Wikipedia articles, and the strings are anchor text spans that link to the concepts in question.

Dario
(ht Nicolas Torzec)
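As a rough illustration of how the string-to-concept dictionary described above might be used, here is a minimal sketch that turns anchor-text counts into a ranked list of candidate articles for a given string. The (string, concept, count) record layout, the function names, and the sample rows are all assumptions made for the example; the announcement does not describe the actual file format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed record layout: (anchor string, Wikipedia article title, count).
Row = Tuple[str, str, int]


def build_dictionary(rows: Iterable[Row]) -> Dict[str, Dict[str, int]]:
    """Group counts by anchor string, summing repeated (string, concept) pairs."""
    counts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for string, concept, n in rows:
        counts[string][concept] = counts[string].get(concept, 0) + n
    return dict(counts)


def rank_concepts(
    dictionary: Dict[str, Dict[str, int]], string: str
) -> List[Tuple[str, float]]:
    """Return (concept, share of links) pairs for a string, most likely first."""
    candidates = dictionary.get(string, {})
    total = sum(candidates.values())
    if total == 0:
        return []
    ranked = ((concept, n / total) for concept, n in candidates.items())
    # Sort by descending share, then by title so ties are deterministic.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample rows, not taken from the real dataset.
sample_rows = [
    ("Jaguar", "Jaguar_Cars", 30),
    ("Jaguar", "Jaguar", 60),
    ("Jaguar", "Jacksonville_Jaguars", 10),
]
sample_dictionary = build_dictionary(sample_rows)
sample_ranking = rank_concepts(sample_dictionary, "Jaguar")
```

The share computed here is just the fraction of links with that anchor text that point to each article, which is one simple way to read "annotated with counts".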

1 0

Re: [Analytics] Analysis of pageviews trends after recent correction for overreporting
by Erik Zachte 07 Dec '13

Tinyurl link is broken, sorry, here is the full link instead
https://docs.google.com/a/wikimedia.org/document/d/1kpJrfataS5KAxGXFoygQVhMlzFftjsvX9HktSAAKfrQ/edit

Also, notice to wikimedia-l
http://lists.wikimedia.org/pipermail/wikimedia-l/2013-December/129068.html

From: Erik Zachte [mailto:ezachte@wikimedia.org]
Sent: Saturday, December 07, 2013 7:07
To: Erik Moeller
Cc: Toby Negrin; Christian Aistleitner; Dario Taraborelli; Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Analysis of pageviews trends after recent correction for overreporting

Hi Erik,

Here is an analysis of recent pageview stats after they were corrected for overreporting, as requested.
http://tinyurl.com/pmm66v4

I will also post to wikimedia-l to inform a wider audience this bug has been closed.

Cheers,
Erik

1 0

Analysis of pageviews trends after recent correction for overreporting
by Erik Zachte 07 Dec '13

Hi Erik, Here is an analysis of recent pageview stats after they were corrected for overreporting, as requested. http://tinyurl.com/pmm66v4 I will also post to wikimedia-l to inform a wider audience this bug has been closed. https://bugzilla.wikimedia.org/show_bug.cgi?id=57980 Cheers, Erik

1 0

Re: [Analytics] State of mobile limn dashboard
by Arthur Richards 06 Dec '13

+analytics

On Wed, Nov 13, 2013 at 5:09 PM, Jon Robson <jrobson(a)wikimedia.org> wrote:
> Thanks so much Juliusz for exploring this and great work fixing the schema (apologies for me not predicting that might be an issue) and sorry for all the pain this must have caused you.
>
> We can't be the only team using Limn in the Foundation. It might be worth pulling everyone together. Am I right in thinking that Limn is a child of the analytics team? Maybe we should at least spend some time with them getting our use case resolved. I guess this is why we have an analytics department? I can raise this issue in the next Scrum of Scrums if it is not resolved by then.
>
> On Wed, Nov 13, 2013 at 3:54 PM, Juliusz Gonera <jgonera(a)wikimedia.org> wrote:
> > For the past few days (or more) graphs at http://mobile-reportcard.wmflabs.org/ stopped updating. The dashboard consists of two parts: Limn, which displays the data, and backend scripts that generate the graph data based on Event Logging data. The issue was caused by two independent problems in the second component:
> >
> > 1. A change of MobileWebEditing schema was incorrectly addressed in the scripts' config and caused the script to throw an exception.
> > 2. Backend scripts are stupid and not optimized at all.
> >
> > The first thing is fixed. To work around the second thing I had to disable updates of "Editors registered on mobile who made 5+ edits on enwiki (mobile+desktop)" graph [1] for now (the query was timing out and causing an exception too) and removed the performance graph, since we'll be using ganglia (and soon graphite) for that [2]. Graphs should get updated soon.
> >
> > So why are those backend scripts stupid? Because they run every hour and recalculate _all_ the values for every single graph. For example, even though total unique editors for June 2013 will never change, they are still recalculated every hour. This was a quick and easy solution for generating graphs, but as Event Logging tables keep growing, we add more graphs and those graphs show more and more data, it's no longer performing.
> >
> > I discussed this briefly with Ori and I think we agree on the general direction. We should definitely schedule some time for working on this. We could start with a spike investigating if there is a framework for aggregating the sums that we could use and asking what other teams in the foundation use for generating their graph data. The results of this spike and possible following work could be useful not only for the mobile team.
> >
> > [1] https://gerrit.wikimedia.org/r/#/c/95298/
> > [2] http://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&tab=v&vn=Mobile+Web&hi…
> >
> > --
> > Juliusz

--
Arthur Richards
Software Engineer, Mobile
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687
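The recalculation problem Juliusz describes above (values for closed months such as June 2013 being recomputed every hour) is commonly handled by caching finished periods and recomputing only the current one. The following is a minimal sketch of that idea, not the mobile team's actual scripts: the class name, the `compute` callback standing in for an Event Logging query, and the in-memory cache are all assumptions for illustration.

```python
from datetime import date
from typing import Callable, Dict, Iterator, Tuple

Month = Tuple[int, int]  # (year, month)


def months_between(start: Month, end: Month) -> Iterator[Month]:
    """Yield every (year, month) from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


class MonthlyAggregateCache:
    """Recompute only the month in progress; reuse stored values for closed months."""

    def __init__(self, compute: Callable[[Month], int]) -> None:
        # `compute` is a hypothetical stand-in for the expensive per-month query.
        self._compute = compute
        self._closed: Dict[Month, int] = {}

    def refresh(self, start: Month, today: date) -> Dict[Month, int]:
        current = (today.year, today.month)
        result: Dict[Month, int] = {}
        for month in months_between(start, current):
            if month == current:
                # The current month can still change, so always recompute it.
                result[month] = self._compute(month)
            else:
                # Closed months are computed once and then served from the cache.
                if month not in self._closed:
                    self._closed[month] = self._compute(month)
                result[month] = self._closed[month]
        return result
```

With this shape, an hourly run costs one query for the current month instead of one per month ever graphed. A real version would need to persist the cache between runs and decide how to handle late-arriving events for a month that has already closed.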

10 51

Metrics repo github -> gerrit
by Christian Aistleitner 04 Dec '13

Hi,

just a heads up that our metrics repository from github
https://github.com/wikimedia/metrics
has been brought over to gerrit
https://gerrit.wikimedia.org/r/#/admin/projects/analytics/metrics .

So we can now do proper code review on the repo. If you want to try it out, some commits are waiting for review in gerrit already:
https://gerrit.wikimedia.org/r/#/projects/analytics/metrics,dashboards/defa…

Have fun,
Christian

--
quelltextlich e.U.
Christian Aistleitner
Companies' registry: 360296y in Linz
Gruendbergstrasze 65a, 4040 Linz, Austria
Email: christian(a)quelltextlich.at
Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/

1 0

Webstatscollector docs
by Andrew Otto 03 Dec '13

Just started this page, for use if anyone else has to upgrade webstatscollector in the future: https://wikitech.wikimedia.org/wiki/Analytics/Webstatscollector

1 0

Fwd: Stand up today
by Dan Andreescu 03 Dec '13

forwarding to the proper list, analytics-internal(a)wikimedia.org should be deleted from everyone's address book.

---------- Forwarded message ----------
From: Andrew Otto <aotto(a)wikimedia.org>
Date: Mon, Dec 2, 2013 at 8:42 AM
Subject: Stand up today
To: Analytics Team - Internal <analytics-internal(a)wikimedia.org>

Ah! The VA DMV is inconveniencing me! :) My new used car needs to be registered. I mailed in the forms 3 weeks ago hoping to have it all settled by now. It isn't, so I have to go to the DMV to figure out why. They haven't been open due to holidays, and I thought they opened at 8am, but they don't open til 9am. And my mobile hotspot seems to be broken! :( I will most likely miss stand up today while I figure this out.

Also, this means I'm not back in Brooklyn yet as I had intended. I accidentally took last Friday as a vacation day, but didn't need to. So I'm not sure what's happening yet, but I will probably need to drive back up today or tomorrow. I worked a few hours this weekend, and if I need to drive during work hours I'll just use up that vacation day I took.

Anyway, update!

- varnishkafka deb approved, however faidon wants me to merge in one last change from Magnus to support some logging change for W0.
- varnishkafka mobile puppetization is approved too.
- the above two together mean I am no longer blocked on others for varnishkafka deployment. I'd like to deploy to a mobile host or two this week.
- logster .deb has been approved. This means I can puppetize varnishkafka ganglia stats and subsequently Icinga alerts. Dan, I need a little help with python classpath things, something isn't making sense.
- python-kafka .deb approved. Ori wanted this for Event Logging Kafka support. I need to put this .deb into apt for him.
- lots of discussion about how to deal with cross dc latency. Magnus has convinced faidon to let varnishkafka buffer up to 10G on disk if necessary. Apparently this will help during short periods of packet loss and high latency. He has yet to code this though.
- I'm slated to help Nik turn Elasticsearch back on today.

Happy post tg! Talk to y'all later.

1 0

Limn instances need a kick
by Erik Moeller 02 Dec '13

All seem down, perhaps due to recent Labs outage? -- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation

4 7
