Hi guys,
During this week the Gerrit and IRC panels have been added to the
Community Dashboard.
IRC: http://korma.wmflabs.org/browser/irc.html
Right now we are just analyzing two channels. What IRC channels should
we include here?
Gerrit: http://korma.wmflabs.org/browser/scr.html
We are following the same approach as with git, analyzing the projects
coming from gerrit.wikimedia.org
With these two data sources, the dashboard now includes all the sources
defined for this phase of the project. It is time to polish and reorganize
the dashboard to make it really useful.
Cheers!
--
|\_____/| Alvaro del Castillo
[o] [o] acs(a)bitergia.com - CTO, Software Engineer
| V | http://www.bitergia.com
| |
-ooo-ooo-
Forwarding because I assume not all the G&P stakeholders are on the
Analytics list.
G&P folks: which of these visualizations are we currently using? I imagine
that some of the ones set up by Evan a while ago might not be in use
currently, but I'm sure others are regularly referred to.
Also, a question for Christian: I believe a lot of the current
visualizations are running off datasources in various parts of Evan's home
directory. How easy would it be to generate a list of the currently running
visualizations, and where they're pulling data from? This might help us
figure out which ones we still need, and move them to a better home if
necessary.
Thanks,
Jonathan
---------- Forwarded message ----------
From: Christian Aistleitner <christian(a)quelltextlich.at>
Date: Tue, Aug 27, 2013 at 7:20 AM
Subject: [Analytics] Dashboards/graphs/dashsources/... at gp.wmflabs.org
To: analytics(a)lists.wikimedia.org
Hi,
we are currently serving a few hundred graphs, dashboards, ... at
http://gp.wmflabs.org/
, but running the various scripts that generate them is a bit shaky
and their maintenance is eating up a considerable amount of time.
So in order to better use resources, and limit maintenance work, we're
curious about which parts, URLs, dashboards, graphs, datasources of
the site are actually in use by people in one way or the other.
If you rely on parts, URLs, dashboards, graphs, datasources of
http://gp.wmflabs.org/
please let us know by August 30.
Best regards,
Christian
P.S.: We may think about removing unused parts, or even stop trying to
update them. So if you are using some parts, please do let us know :-)
P.P.S.: We already reached out to the users that we know of, so do not
feel obliged to reply again if you have already replied to the
private email about this issue.
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasse 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Learning Strategist
Wikimedia Foundation
Heya,
NuPIC seems to be quite a promising library for building machine learning
applications, and they will be hosting a hackathon in November in SF.
Thought some people might be interested in this!
D
---------- Forwarded message ----------
From: Matthew Taylor <matt(a)numenta.org>
Date: Tue, Aug 27, 2013 at 12:45 PM
Subject: [nupic-dev] Fall 2013 NuPIC Hackathon
To: "NuPIC general mailing list." <nupic(a)lists.numenta.org>
Hackers, start your engines!
http://numenta.org/events.html#november_2013_hackathon
Nov 2-3, at the Tagged offices in San Francisco.
RSVP here: http://www.meetup.com/numenta/events/136809782/
I'll be adding more details about the NLP focus as we set them up.
---------
Matt Taylor
OS Community Flag-Bearer
Numenta
_______________________________________________
nupic mailing list
nupic(a)lists.numenta.org
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
Whilst investigating an orthogonal logging issue, I encountered a couple of
differences between squid and varnish which I didn't know about:
* Varnish does not give subsecond resolution in the request timestamp
* Varnish does give subsecond request processing time info
* Varnish calls it 'hit/200' or 'miss/302' instead of 'TCP_MEM_HIT/200' or
'TCP_MISS/302'
* Varnish does not URL encode the user agent field
Example log lines:
amssq41.esams.wikimedia.org 1013692039 2013-07-31T23:00:02.331 0
XXX TCP_MEM_HIT/200 614 GET http://meta.wikimedia.org/XXX NONE/- image/png
http://en.wikipedia.org/XXX - Mozilla/5.0%20(Wind... en-US en;q=0.8 -
cp1006.eqiad.wmnet 1442176851 2013-07-31T23:00:02.452 0 XXX TCP_MISS/302 406
GET http://meta.wikimedia.org/XXX NONE/- - http://en.m.wikipedia.org/XXX -
Mozilla/5.0%20(i.... en-us -
cp3012.esams.wikimedia.org 823553992 2013-07-31T23:00:02 0.000119448 XXX
hit/200 20 GET http://meta.m.wikimedia.org/XXX - image/png
http://de.m.wikipedia.org/XXX XXX Mozilla/5.0 (iPho... de-de -
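For anyone scripting over these logs, the cache-status field (the sixth
whitespace-separated field in the sample lines above) is enough to tell the
two formats apart. A minimal Python sketch, with field positions inferred
from the samples above rather than from any official format spec:

```python
from urllib.parse import unquote

def parse_cache_status(line):
    """Return (backend, http_status) for a squid- or varnish-style log line.

    Field positions are inferred from the sample lines: squid uses
    'TCP_MEM_HIT/200'-style statuses, varnish uses 'hit/200'.
    """
    fields = line.split()
    result, status = fields[5].split('/')
    backend = 'squid' if result.startswith('TCP_') else 'varnish'
    return backend, int(status)

def decode_user_agent(field):
    """Squid URL-encodes the user agent field (e.g. '%20'); varnish does not."""
    return unquote(field)

squid_line = ("amssq41.esams.wikimedia.org 1013692039 2013-07-31T23:00:02.331 0 "
              "XXX TCP_MEM_HIT/200 614 GET http://meta.wikimedia.org/XXX NONE/- image/png")
varnish_line = ("cp3012.esams.wikimedia.org 823553992 2013-07-31T23:00:02 0.000119448 XXX "
                "hit/200 20 GET http://meta.m.wikimedia.org/XXX - image/png")

print(parse_cache_status(squid_line))    # ('squid', 200)
print(parse_cache_status(varnish_line))  # ('varnish', 200)
print(decode_user_agent("Mozilla/5.0%20(Windows)"))  # Mozilla/5.0 (Windows)
```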
~Matt Walker
Wikimedia Foundation
Fundraising Technology Team
Hi,
we are currently bringing the device property and platform
computations back to life outside of Hadoop. Data for the last few
days has been computed and the jobs are running.
However, I am not sure about the old data that we have. Should we
blend that in?
* For device properties, I found that
http://stats.wikimedia.org/kraken-public/webrequest/mobile/device/props
seems to contain property data for 2013-03-01 until 2013-05-15.
Since this data already stops in mid-May, I assume we have more data
to blend in (end of May, June, July) at a different place.
Do we have such data?
Do we know if the above data is good, or if it's just a relic from test runs?
* For platform data, I found that
http://stats.wikimedia.org/kraken-public/webrequest/mobile/platform/mobile_…
has platform data from 2013-04-14 until 2013-07-20.
However, I am not sure which of this data is valid. Naive, uneducated
plausibility checks fail badly [1].
Do we know if/which data is good?
Do we have better or other sources for the platform job?
Best regards,
Christian
[1] For example, looking only at the last few data points
for Android on Tuesdays, we get [2]:
2013-04-16: 6438000
2013-04-23: 6300000
2013-04-30: 6559000
2013-05-06: 7267000
2013-05-13: 6954000
2013-05-27: 33335000
2013-06-04: 14388000
2013-06-11: 8563000
2013-06-18: 10241000
2013-06-25: 6896000
2013-07-09: 3454000
2013-07-16: 7206000
The highest value (33M) is about 10 times the lowest (3M), within
only three months.
Even when treating those data points as outliers (and we have readings
that are even further out, ranging from 1M to 37M for Android), the
lowest remaining data point is still only half the highest one.
All on the same weekday!
This looks suspicious.
[2] There is no data for 2013-05-20, and 2013-07-02.
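For what it's worth, the spread described above can be reproduced with a
quick Python sanity check over the quoted numbers (just a sketch for
eyeballing the data, not part of any pipeline):

```python
# Tuesday Android readings quoted above (no data for 2013-05-20 or 2013-07-02)
counts = {
    "2013-04-16": 6438000,
    "2013-04-23": 6300000,
    "2013-04-30": 6559000,
    "2013-05-06": 7267000,
    "2013-05-13": 6954000,
    "2013-05-27": 33335000,
    "2013-06-04": 14388000,
    "2013-06-11": 8563000,
    "2013-06-18": 10241000,
    "2013-06-25": 6896000,
    "2013-07-09": 3454000,
    "2013-07-16": 7206000,
}

lo, hi = min(counts.values()), max(counts.values())
ratio = hi / lo
print(round(ratio, 2))  # ~9.65: the highest reading is roughly 10x the lowest
```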
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasse 65a Email: christian(a)quelltextlich.at
4040 Linz, Austria Phone: +43 732 / 26 95 63
Fax: +43 732 / 26 95 63
Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Hello again!
Ok, we're actually going to do this this time. As far as we know, people who need access to private webrequest data have migrated their stuff over to stat1002.eqiad.wmnet. The private webrequest data that currently exists on stat1 will soon be deleted.
Soon is August 7th. That's in 1 week. We announced this back in May, so there should have been plenty of notice. If you are still using the webrequest logs in /a/squid/archive on stat1, find me on IRC (ottomata) or email me and we can work together to make sure you can continue to do your work on stat1002.
On Wednesday August 7th, we will be removing private webrequest logs from stat1.
Thanks all!
-Andrew Otto
On May 20, 2013, at 2:13 PM, Andrew Otto <otto(a)wikimedia.org> wrote:
> >> "Before that happens, you should make sure that any personal stuff on stat1 that you need for number crunching is copied over to stat1002. "
> > from your note it looks like this is only related to webrequest data, is that correct?
>
> Yup! That is correct. stat1002 will be primarily used as a sensitive private data host. Only those users that have personal unpuppetized code and cronjobs that use this data need to worry about moving them from stat1 to stat1002.
>
>
>
> > what are the criteria for deciding who has access to stat1002? I see that contractors like Aaron Halfaker or Jonathan Morgan currently don't have access to it.
>
> The criteria will be the same as before: RT request + manager approval. However, the request should only be made if the user actually needs access to the webrequest logs to do analysis. For example, if the main reason someone already has access to stat1 is so that they can access the research slave databases, then they won't need access to stat1002.
>
>
>
> > can you give us more information on the long-term plans/scope of stat1 vs stat1002 (and update https://office.wikimedia.org/wiki/Data_access as needed)?
>
> I've added a small bit about stat1002 on that page.
>
> I don't know much about a long-term plan for stat1. It is hosted at the Tampa datacenter, and in the long term (a year-ish?) all the machines there will have to be decommissioned or relocated elsewhere. When it finally does move, it will most likely no longer have a public IP. stat1 is intended to be used as a workspace for analysts to do their thing on non-private data.
>
>
> -Ao
>