I need to upgrade the kernel on stat1 to get an important recently
released security update. This means I need to restart it.
Erik Zachte and Aaron Halfakar, you two are the only ones I see logged
in and running processes. Can I reboot stat1 now, or do you need to do
anything (stop your processes, etc.) first?
The upgrade should be painless, and stat1 should be down for less than 5
To anyone who might know the answer:
I'm looking for any measure of site traffic over the years, all in one
place where it's apples to apples. I'd like to adjust income by site
traffic for the past several years' fundraisers. Any measure will do --
unique visitors, page views per year, month... Going back to 2007 would be
Sorry to ask this random question, but a whole bunch of googling and
searching our own stats pages hasn't gotten me there.
The first complete draft of the UserMetrics API end-user documentation is now available on mediawiki.org:
Kudos to Kirsten on putting this together. If you are interested in using/testing/contributing to UserMetrics, please drop us a line at: usermetrics(a)wikimedia.org
The motion chart reminds you of Hans Rosling's Gapminder because it uses
motion chart code from Google Chart Tools which is built on the
Gapminder software :)
On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org
> Date: Tue, 14 May 2013 16:19:52 +0200
> From: Sophie Österberg <sophie.osterberg(a)wikimedia.se>
> To: "A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics."
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
> Content-Type: text/plain; charset="iso-8859-1"
> The motion chart reminds me a bit of gapminder;
> Nice work!
> *Be Bold!
> Sophie Österberg
> *Every single contribution to Wikipedia is a gift of free knowledge to
> humanity. *
My name is Sajjad. My colleague Sumandro and I are working on a
project with The Centre for Internet and Society
(http://cis-india.org/), India, to visualise the activities on Indic
Wikipedia projects. We have already developed a few basic
visualisations. You can see our work here:
We are writing to you to ask for some help for our further work plans.
The next round of visualisation involves mainly two aspects:
1. Edits by geography - We want to see where in the world people are
contributing to the Indic Wikipedia projects from. City-level data
would be perfect for visualising this. Hence we want the total edits
to an Indic Wikipedia project from a particular location for a given
period (preferably per year).
2. Most edited articles and edit wars - We would like to explore the
most edited articles per project and see if it is possible to
visualise edit wars. For this we are looking for total edit counts
(number of instances and volume of edit) for each article (given a
threshold) for each Indic Wikipedia project per year.
Jessie Wild suggested that we write to this list for pointers in terms
of the data.
Please let us know if any of you have worked with this kind of data or
can help us get hold of it.
Thank you so much!
Sajjad Anwar | http://sajjad.in | @geohacker
Wednesday's sprint demo was skipped as a lot of the Analytics Team members
were on holidays and most of the work that was completed was hard to
showcase. However, we did wrap up the fifth sprint of the "Self-Serve
Observational Analytics" Release. The goal of this release is to schedule
features that will empower end-users to interact independently with the
Apologies for cross-posting; ideally you should receive this on the
Analytics mailing list so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
## Defects & Features completed (Ready for Showcase/Shipping/Done) during Sprint ending 2013-05-08 ##
#388 F - Admin defines new static cohort by uploading CSV (5) Done
requested by E3 (Dario)
#570 I - Local development environment for engineers to debug User Metrics
(8) Done requested by E3 (Dario)
#581 I - Continuous Integration wikistats (N/E) Done requested by Analytics
#613 I - Jenkins CI for udp-filter fails (N/E) Done requested by Analytics
## Planned for Showcase on 2013-05-15 ##
#622 I - Setup multicast relay to gadolinium (N/E)
#538 F - Performance improvement udp-filter (N/E)
## Current Sprint (ending 2013-05-15) ##
Stories in progress from last sprint:
#148 I - Network ACL (N/E) BLOCKED requested by Ops/Mark
#131 I - Puppetize Kafka 0.7 (8) Coding requested by Analytics & Ops
#134 I - Puppetize Hadoop Coding requested by Analytics & Ops
#645 F - X-CS header measurements (3) Coding requested by Wikipedia Zero
#341 D - Traffic reports: fix region Oceania (1) Testing requested by
#356 D - Squid log based traffic report SquidReportDevices.htm for mobile
devices is broken (N/E) Testing requested by Community
#538 F - Performance improvement udp-filter (N/E) Testing requested by
#385 I - Migration of stat1 (pmtpa) to stat1002 (eqiad) (3) requested by
Analytics and Ops
#545 F - Librdkafka supports Kafka 0.8 (13) requested by Analytics and Ops
#503 F - Page View Metrics report for non Wikipedia Mobile Apps (5)
requested by Mobile (Tomasz)
#673 I - Make current UMAPI codebase work in prod (1) requested by E3
(Number in parentheses) = estimate of complexity
N/E = not estimated;
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Any mingle card can be accessed using the base url
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
We are receiving reports that pageview numbers for a small subset of articles are significantly lower than they used to be. See
(those links are for enwiki articles)
What these articles have in common is that Google has indexed them using the https protocol. This, combined with our no longer sending the Nginx SSL traffic to udp2log (this happened, IIRC, in the week of March 25 - March 31, 2013), explains part of the drop, but not all of it.
Webstatscollector, the program that generates the data shown on stats.grok.se, did not deduplicate counts for https, so we did expect a 50% drop. In other words, before we stopped sending SSL traffic to udp2log we were overcounting. However, the drop is larger than 50%, which means something else is going on as well.
For April 29th, 2013, the following counts were calculated for the 'http(s)://en.wikipedia.org/wiki/Cancer' article (using zcat sampled-1000.tsv.log-20130429.gz | cut -f 12 | grep "http://en.wikipedia.org/wiki/Cancer$" | wc -l, switching between field 9 for the URL and field 12 for the referer, and between http and https):
| | direct requests | referer hits |
| | (field 9) | (field 12) |
| http hits | 5 (5000) | 35 (35000) |
| https hits | 0 (0) | 65 (65000) |
(The first number is the actual observed number; the number in parentheses is the absolute number after multiplying by 1000, as that is the sampling factor.)
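
For anyone who wants to reproduce these counts without the shell pipeline, here is a minimal Python sketch of the same idea. It assumes a tab-separated log with the URL in field 9 and the referer in field 12, as described above. The helper names (count_hits, scale_to_total) are made up for illustration and are not part of any existing tool. Note that grep treats the dots in the pattern as wildcards, whereas this sketch does a literal suffix match.

```python
import gzip
from typing import Iterable

SAMPLING_FACTOR = 1000  # the sampled-1000 log keeps 1 in every 1000 requests


def count_hits(lines: Iterable[str], field: int, url: str) -> int:
    """Count lines whose 1-indexed tab-separated field ends with url.

    The suffix match mirrors the `$` anchor in the grep pattern.
    """
    hits = 0
    for line in lines:
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= field and columns[field - 1].endswith(url):
            hits += 1
    return hits


def scale_to_total(observed: int, factor: int = SAMPLING_FACTOR) -> int:
    """Estimate the absolute number of requests from a sampled count."""
    return observed * factor


def count_in_log(path: str, field: int, url: str) -> int:
    """Run count_hits over a gzipped log file (hypothetical path)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as log:
        return count_hits(log, field, url)
```

For example, count_in_log("sampled-1000.tsv.log-20130429.gz", 12, "https://en.wikipedia.org/wiki/Cancer") would correspond to the "https hits / referer" cell, and scale_to_total turns the observed count into the estimate in parentheses.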
There are many https hits for the cancer article in the referer but none in the URL field, which could be an indication that the squids are not correctly logging Nginx SSL redirected requests. The reason we see so few http hits for the cancer article is obviously that Google sends people to the https version. Finally, we do see a lot of https hits in the referer; these are mostly requests to the upload domain, which suggests that many people are actually reading this article.
There are at least two different solutions to solve this problem:
1) Stop Google from indexing https articles by adding a <link rel="canonical" href="http://*.wikipedia.org/wiki/Foo" /> to every page. I believe this could be done in MediaWiki. The problem is similar to Google indexing the articles on the .m. domains, and we resolved that as well.
2) Make sure that https hits are properly logged by Squid (assuming that is the problem).
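
To make option 1 concrete, here is a rough Python sketch of what the canonical tag would look like for a given article URL. This is only an illustration, not how MediaWiki would implement it. The function name canonical_link is hypothetical, and the sketch assumes the canonical form is simply the same host and path over http, with any query string and fragment dropped.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str) -> str:
    """Build a rel="canonical" tag pointing at the http version of url.

    Assumption: the canonical URL keeps the host and path and drops the
    query string and fragment.
    """
    parts = urlsplit(url)
    canonical = urlunsplit(("http", parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


print(canonical_link("https://en.wikipedia.org/wiki/Cancer"))
```

Both the http and the https version of a page would then carry the same tag, which is what tells a search engine to index only one of them.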
I am sure there are other possible solutions, including setting the X-Proto-For header, so please chime in if you disagree with the diagnosis or have an alternative solution.
OKCon is the annual conference for Open Knowledge (Foundation),
17th-18th September 2013, Geneva, Switzerland. It was called "OKFest"
last year. It's a well-attended and well-organized conference for anyone
interested in open knowledge, sharing, open hacking, etc.
Opportunities for Wikimedia lightning talks, workshops, etc.:
- Wikipedia Zero (see Open Development & Sustainability track)
- Analytics and open data (see Technology, Tools & Business)
(UserMetrics API? privacy? Limn?)
- SOPA/PIPA and related activities (see Evidence & Stories)
- Hack events: use their hackspace. Teach folks to make bots,
gadgets, apps, and Lua templates. Get user testing from other
open culture advocates and learn what tools they need.
This conference is eligible for subsidy of travel costs -- see
https://meta.wikimedia.org/wiki/Participation:Support to put in your
Thanks to Sarah Stierch for the heads-up.
Engineering Community Manager