Analytics February 2015

analytics@lists.wikimedia.org

45 participants
44 discussions

Odd data in dumps
by Matthew Ruttley 01 Mar '15

Hi Analytics,

I've been digging through some of the wiki page count files and found some strange results. In several files, the Main_page visit count is vastly lower than expected:

    mruttley$ cat pagecounts-20141101-170000 | grep "^en Main_page"
    en Main_page 260 6202982
    mruttley$ cat pagecounts-20150201-170000 | grep "^en Main_page"
    en Main_page 200 4802139

Only 260 and 200 page views! What do you reckon? Am I doing it wrong?

Best regards,
Matthew
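For anyone checking numbers like these: each line of an hourly pagecounts file has four space-separated fields (project code, page title, request count, total bytes). The sketch below parses such lines and sums the request counts over every capitalisation variant of a title. It assumes, without confirmation from this message, that differently capitalised titles may be logged as separate rows; the function names and the second sample row are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def parse_pagecounts_line(line):
    """Split one pagecounts line into (project, title, views, bytes)."""
    project, title, views, size = line.rstrip("\n").split(" ")
    return project, title, int(views), int(size)


def views_by_title_variant(lines, project, title):
    """Sum views per capitalisation variant of `title` within `project`."""
    wanted = title.lower()
    totals = defaultdict(int)
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        # Skip malformed rows and rows belonging to other projects.
        if len(parts) != 4 or parts[0] != project:
            continue
        if parts[1].lower() == wanted:
            totals[parts[1]] += int(parts[2])
    return dict(totals)


sample = [
    "en Main_page 260 6202982",   # row quoted in the message
    "en Main_Page 1000 123456",   # hypothetical row, for illustration only
    "de Main_page 5 100",         # hypothetical row, for illustration only
]

print(views_by_title_variant(sample, "en", "main_page"))
```

Run against a real hourly file (for example by passing an open file handle as `lines`), this would show whether the count is split across several spellings of the title.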

[Data][Outage] Statistics per wikipedia for 2015
by Kevin Leduc 28 Feb '15

Hi Erik Z,

A member of the Czech community asked when the per-wiki stats for 2015 [0] will be available. The email was sent to the wikimetrics list [1], so I am relaying it here.

[0] http://stats.wikimedia.org/CS/TablesWikipediaCS.htm
[1] https://lists.wikimedia.org/pipermail/wikimetrics/2015-February/000258.html

Re: [Analytics] [Wikimetrics] Monthly metrics
by Jonathan Morgan 27 Feb '15

+ Analytics list

---------- Forwarded message ----------
From: Vojtěch Dostál <vojtech.dostal(a)wikimedia.cz>
Date: Fri, Feb 27, 2015 at 2:19 AM
Subject: [Wikimetrics] Monthly metrics
To: wikimetrics(a)lists.wikimedia.org

Hi,

does anyone please know when the Monthly metrics for Wikimedia projects will get updated? http://stats.wikimedia.org/CS/TablesWikipediaCS.htm

I know it says that the process is delayed but it would actually be nice if someone would look into the problem - I am using these metrics to analyze our impact.

thanks
Vojtěch Dostál
místopředseda / vice-chairman
Wikimedia Česká republika / Wikimedia Czech Republic
http://www.wikimedia.cz
Facebook <https://www.facebook.com/Wikimedia.CR> | Twitter <https://twitter.com/Wikimedia_CR> | Newsletter <http://eepurl.com/FsHJr>

--
Jonathan T. Morgan
Community Research Lead
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
jmorgan(a)wikimedia.org

Cluster issues. Refining suspended. Hence a few datasets start to lag.
by Christian Aistleitner 27 Feb '15

Hi,

just a quick heads up that the Analytics cluster got stuck today: jobs deadlocked themselves waiting for other jobs to free resources.

For the time being, to allow the cluster to catch up on the missed hours, I suspended the refining jobs. This gives the cluster enough resources to catch up with importing the Kafka data that it missed during the day. But this also means that the datasets pagecounts-all-sites, pagecounts-raw, and legacy_tsvs will fall behind a bit, and the wmf.webrequest data will not see new data while the cluster is catching up.

Tomorrow, in the European morning when the cluster has caught up, I'll enable refining again, and the datasets should catch up again.

Sorry for the inconvenience,
Christian

P.S.: Suspending refining looks a bit drastic. But if we only killed the resource-hungry jobs without stopping refining, refining would start during the catch-up of Camus and produce faulty datasets. Hence, we suspended refining for now. Tomorrow, we'll resume the suspended jobs and have the datasets catch up again.

P.P.S.: If you have resource-hungry jobs on the Analytics cluster, please wait until tomorrow to run them if possible.

--
quelltextlich e.U. \\ Christian Aistleitner
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3, 4293 Gutau, Austria
Email: christian(a)quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/

[Technical] hand-coding logs for the new pageviews definition
by Oliver Keyes 27 Feb '15

Hey all,

As part of the quality assurance work on the new pageviews definition I've hand-coded 20,000 rows from the webrequests table that the new definition identifies as pageviews - 10,000 from mobile, and 10,000 from desktop, spread out and pseudo-randomly sampled over multiple days and hours.

TL;DR it looks really promising, but we need some expansion of how we're storing pageIDs, and to filter out edit attempts on the desktop side. And we still have a lot of handcoding to do.

On Mobile, the definition is doing exactly what we expect it to do, and including precisely the classes of pages we want. There is, seriously, a 100% success rate there. The only limiting factor is around turning "pageviews" (views of our HTML content) into the sort of pageviews that can be aggregated on a per-page basis - in other words, grabbing the pageID and namespace. A recent patch to MediaWiki by Ori, Otto and others means that the pageID and namespace are now automatically passed through to the varnish, which makes this a LOT easier. Buuuut...they're not being passed through for app requests, which is a big blind spot if we assume apps behave differently. They're also not being passed through for, e.g., index.php?action=render style requests.

On Desktop, the definition is doing /almost/ what we want it to do. The big problem is that due to a change in the MIME type edit requests report with, it's including edit attempts: whoops. We should be able to filter this with a trivial regex change...I think. I'd need a better idea of whether URL parameters tend to be localised.

So, promising, needs edits filtered! The next step is a further round of hand-coding, this time targeted at requests the new definition /excludes/.

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
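To make the "trivial regex change" concrete, here is a minimal sketch of one way edit attempts could be filtered out of sampled rows. It is not the actual pageview definition: the field name `uri_query`, the function names, and the choice of `action=edit` and `action=submit` as the values to exclude are all assumptions for illustration, and it presumes URL parameters are not localised, which the message leaves open.

```python
import re

# Matches an `action` parameter whose value is exactly "edit" or "submit".
# Assumption: these two values identify edit attempts.
EDIT_ACTION = re.compile(r"(?:^|[?&])action=(?:edit|submit)(?:&|$)")


def is_edit_attempt(uri_query):
    """Return True if the query string looks like an edit attempt."""
    return EDIT_ACTION.search(uri_query) is not None


def drop_edit_attempts(rows):
    """Keep only rows whose (hypothetical) uri_query is not an edit attempt."""
    return [row for row in rows if not is_edit_attempt(row["uri_query"])]


# Hypothetical rows, for illustration only.
rows = [
    {"uri_query": "?title=Foo&action=edit"},
    {"uri_query": "?title=Foo&action=render"},
    {"uri_query": "?action=submit&title=Foo"},
    {"uri_query": ""},
]

print(drop_edit_attempts(rows))
```

Anchoring the pattern on `?`, `&`, or the string boundaries avoids matching parameters that merely end in "action" or values that merely start with "edit".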

[Release]
by Oliver Keyes 26 Feb '15

Hey all! We've released a highly-aggregated dataset of readership data - specifically, data about where, geographically, traffic to each of our projects (and all of our projects) comes from. The data can be found at http://dx.doi.org/10.6084/m9.figshare.1317408 - additionally, I've put together an exploration tool for it at https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ Hope it's useful to people! -- Oliver Keyes Research Analyst Wikimedia Foundation

Re: [Analytics] [Technical] eventlogging master
by Nuria Ruiz 26 Feb '15

CC-ing Ori. He mentioned he was given a box today but no further details.

Thanks,
Nuria

On Wed, Feb 25, 2015 at 5:25 PM, Sean Pringle <springle(a)wikimedia.org> wrote:
> On Sun, Feb 22, 2015 at 1:20 PM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> > Coordination on Monday sounds good.
>
> Did you guys come to any conclusion about vanadium?

Application to work in field of Research and Data Analytics at Wikimedia Engineering
by Vikramank Singh 26 Feb '15

Respected Sir / Ma'am,

I am a third-year undergraduate pursuing a Bachelor's in Computer Science in India, and a Data Science enthusiast. I have a deep interest in Data Science and Analytics and have done various research projects in Data Mining, Data Visualization, and Social Media Analytics at institutes like IIM Ahmedabad, IISc Bangalore etc.

As Wikimedia is the world's largest resource of education data, various analytics can be performed on the data to make its services easier and more predictable. I want to work on any project that the Wikimedia Research and Analytics team is working on and help them develop something great. I am willing to work full time as an intern from home without any stipend.

I am attaching my resume with some of my research reports, which should give you a brief idea of my previous work and knowledge in this field. I request you to please have a look at my resume. I am just a data geek who wants to work with some of the best professionals and develop something really useful. I assure you that, given an opportunity, I won't let you down.

Thanks a lot. Waiting eagerly for your reply.

Regards,
Vikramank Singh,
Co-Founder at www.notemybook.in
+91-9768 251 481

Confluent, whoa
by Andrew Otto 25 Feb '15

Whoa, Confluent (Kafka folks) just packaged up everything we've been building over the last two years:

http://blog.confluent.io/2015/02/25/announcing-the-confluent-platform-1-0/
http://confluent.io/docs/current/platform.html
http://blog.confluent.io/2015/02/25/stream-data-platform-1/

Fwd: Reasons you use the XML dumps or want to, but can't?
by Federico Leva (Nemo) 25 Feb '15

FYI

-------- Forwarded message --------
Subject: [Xmldatadumps-l] Your comments needed (long term dumps rewrite?)
Date: Thu, 19 Feb 2015 12:30:01 +0200
From: Ariel Glenn WMF <ariel(a)wikimedia.org>
To: Xmldatadumps-l(a)lists.wikimedia.org

The MediaWiki Core team has opened a discussion about getting more involved in and maybe redoing the dumps infrastructure. A good starting point is to understand how folks use the dumps already or want to use them but can't, and some questions about that are listed here:

https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Backlog/Improv…

I've added some notes but please go weigh in. Don't be shy about what you do/what you need, this is the time to get it all on the table.

Ariel
