When Ori put the EventLogging service into production, he created an
eventlogging-alerts list to keep developers and users of EventLogging
data informed about any service or data quality issues.
The user metrics API is quickly becoming relevant to multiple teams in
the org who will likely also need to stay informed about similar kinds
of issues, whether through automatic notifications or manual emails.
Not all of them will want to be on analytics@, and not all folks on
this list will care about the nitty-gritty.
Would it make sense to set up umapi-alerts to serve the same purpose as
eventlogging-alerts, but for the user metrics API?
Cheers,
Erik
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate
Jeff Heer just announced the new release of Vega. I was not aware of it, and it looks particularly relevant in the context of Limn graph metadata.
github.com/trifacta/vega
Vega is a visualization grammar, a declarative format for creating and saving visualization designs. With Vega you can describe data visualizations in a JSON format, and generate interactive views using either HTML5 Canvas or SVG.
Hi!
Yesterday's sprint demo concluded the first sprint of the "Self-Serve
Observational Analytics" Release. The goal of this release is to schedule
features that will empower end-users to interact independently with the
Analytics toolset.
Apologies for cross-posting; ideally you should receive this on the
Analytics mailing list so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-04-10 ##
#60 D - Mobile pageview requests reporting in Wikistats (N/E) requested by
Mobile/Tomasz
#426 F - Authenticate users of Metrics API Admin UI (5) requested by
E3/Dario
#454 I - Enable git sub-modules in puppet merge (2) requested by
Analytics/Andrew
#460 I - Setup advanced udp2log monitoring (13) requested by Analytics
#551 F - Mobile pageviews - Documentation (N/E) requested by Analytics
## Planned for Showcase on 2013-04-17 ##
#61 F - Mobile Site Pageviews by Device Class (N/E) requested by
Mobile/Tomasz
#92 F - Pageview metrics report for Official WMF Mobile Apps (5) requested
by Mobile/Tomasz
#240 F - Session Analysis of Mobile Site Visits by Mode (8) requested by
Mobile/Maryana
#244 F - Track user adoption of Wikipedia Zero (N/E) requested by Wikipedia
Zero/Amit
## Current Sprint (ending 2013-04-17) ##
Stories in progress from last sprint:
#60 D - Mobile pageview requests reporting in Wikistats (N/E) Testing
#61 F - Mobile Site Pageviews by Device Class (N/E) requested by
Mobile/Tomasz
#92 F - Pageview metrics report for Official WMF Mobile Apps (5) requested
by Mobile/Tomasz
#95 F - Edit metadata on chart (8) requested by Grantmaking/Jessie
#148 I - Network ACL (N/E) requested by Ops/Mark
#240 F - Session Analysis of Mobile Site Visits by Mode (8) requested by
Mobile/Maryana
#244 F - Track user adoption of Wikipedia Zero (N/E) requested by Wikipedia
Zero/Amit
#497 I - Setup Kraken puppetmaster in Labs requested by Analytics
#551 F - Mobile pageviews - Documentation requested by Analytics
New stories
#131 I - Puppetize Kafka (8) requested by Analytics
#510 I - Upgrade Emery to Precise (8) requested by Ops
#430 F - Expose parameters in metrics view via Metrics API (5) requested by
E3
#559 I - python-jsonschema debianization requested by Platform
#560 I - python-voluptuous debianization requested by Platform
#561 I - python-statsd debianization requested by Platform
Queued for Dev
#94 F - Create chart through GUI (13)
#96 F - Save-As existing chart (1)
#353 D - Wikistats - mobile country report et al. (N/E)
#540 S - Look into potential implementations of job status in User Metrics
API
#541 S - Options to productionize Snuggle
(Number in parentheses) = estimate of complexity
N/E = not estimated;
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Any Mingle card can be accessed using the base URL
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Best,
Diederik
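The card notation used in the report above (card id, type letter, title, optional estimate) and the Mingle base URL can be combined mechanically. The following is a minimal sketch, not a tool the team uses: `parse_card`, `Card`, and the regular expression are hypothetical names introduced here, and the pattern assumes a line without the trailing "requested by ..." part.

```python
import re
from typing import NamedTuple, Optional

# Base URL and type legend are taken from the report; everything else is illustrative.
MINGLE_BASE_URL = "https://mingle.corp.wikimedia.org/projects/analytics/cards/"
CARD_TYPES = {"F": "Feature", "D": "Defect", "I": "Infrastructure Task", "S": "Spike"}

# Matches lines such as "#426 F - Authenticate users of Metrics API Admin UI (5)".
CARD_RE = re.compile(
    r"^#(?P<id>\d+)\s+(?P<type>[FDIS])\s*-\s*(?P<title>.+?)\s*"
    r"(?:\((?P<estimate>\d+|N/E)\))?\s*$"
)


class Card(NamedTuple):
    card_id: int
    card_type: str
    title: str
    estimate: Optional[int]  # None when the card is marked N/E or has no estimate
    url: str


def parse_card(line: str) -> Card:
    """Parse one card line from a sprint report into its parts (hypothetical helper)."""
    match = CARD_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognised card line: {line!r}")
    raw_estimate = match.group("estimate")
    estimate = int(raw_estimate) if raw_estimate and raw_estimate != "N/E" else None
    card_id = int(match.group("id"))
    return Card(
        card_id=card_id,
        card_type=CARD_TYPES[match.group("type")],
        title=match.group("title"),
        estimate=estimate,
        url=MINGLE_BASE_URL + str(card_id),
    )


if __name__ == "__main__":
    print(parse_card("#426 F - Authenticate users of Metrics API Admin UI (5)"))
```

Treating N/E as `None` rather than zero keeps unestimated cards from being counted as zero-complexity work if the estimates are ever summed.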
Hi yalls!
We're working on scheduling an Ubuntu Precise upgrade for 2 of the udp2log machines: emery and oxygen. emery has been around for quite a while, and has some filters that may not be used anymore. Here's a list of the ones that we'd like to disable if there are no interested parties.
# India
pipe 10 /usr/bin/udp-filter -F '\t' -c IN -b country -g >> /var/log/squid/india.tab.log
# GLAM NARA / National Archives - RT 2212
pipe 10 /usr/bin/udp-filter -F '\t' -p _NARA_ -g -m /var/log/squid/filters/GeoIPLibs/GeoIP.dat -b country >> /var/log/squid/glam_nara.tab.log
# Location geocoding filter for Erik Zachte.
pipe 1000 /usr/bin/udp-filter -F '\t' -g -b everything >> /var/log/squid/location-1000.tab.log
# specific country filters - 2012-01-24 through 2012-02-20 then ask Nimish or Amit if we still need them
pipe 10 /usr/bin/udp-filter -F '\t' -c CD,CF,CI,GQ -g -m /var/log/squid/filters/GeoIPLibs/GeoIP.dat -b country >> /var/log/squid/countries-1.tab.log
pipe 10 /usr/bin/udp-filter -F '\t' -c KH,BW,CM,MG,ML,MU,NE,VU -g -m /var/log/squid/filters/GeoIPLibs/GeoIP.dat -b country >> /var/log/squid/countries-10.tab.log
pipe 100 /usr/bin/udp-filter -F '\t' -c BD,BH,IQ,JO,KE,KW,LK,NG,QA,SN,TN,UG,ZA -g -m /var/log/squid/filters/GeoIPLibs/GeoIP.dat -b country >> /var/log/squid/countries-100.tab.log
# Temporary filter to estimate view of 2012 fundraiser video
pipe 10 /usr/bin/udp-filter -F '\t' -d wikimediafoundation.org -p Thank_You_Main >> /var/log/squid/wmf.org-Thank_You_Main.tab.log
Let us know if you'd like to keep these running.
Thanks!
-AndrewO + Diederik
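To review which filters from the list above write to which log file, the `pipe` lines can be split into their parts. This is a rough sketch under assumptions: `parse_pipe_line` and `PipeFilter` are hypothetical names, and reading the number after `pipe` as a sampling factor (one line in every N) is an assumption about the udp2log config format, not something the message states.

```python
import shlex
from typing import List, NamedTuple, Optional


class PipeFilter(NamedTuple):
    sample_factor: int           # assumed meaning: forward 1 in every N log lines
    command: List[str]           # the filter command and its arguments
    output_path: Optional[str]   # file after ">>", if any


def parse_pipe_line(line: str) -> PipeFilter:
    """Split a 'pipe N command >> file' line into its parts (hypothetical helper)."""
    tokens = shlex.split(line)
    if len(tokens) < 3 or tokens[0] != "pipe":
        raise ValueError(f"not a pipe line: {line!r}")
    sample_factor = int(tokens[1])
    rest = tokens[2:]
    output_path = None
    if ">>" in rest:
        idx = rest.index(">>")
        if idx + 1 < len(rest):
            output_path = rest[idx + 1]
        rest = rest[:idx]
    return PipeFilter(sample_factor, rest, output_path)


if __name__ == "__main__":
    # Raw string so that '\t' stays a backslash followed by "t", as in the config.
    india = r"pipe 10 /usr/bin/udp-filter -F '\t' -c IN -b country -g >> /var/log/squid/india.tab.log"
    parsed = parse_pipe_line(india)
    print(parsed.sample_factor, parsed.output_path)
```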
Hi folks!
I'm trying to get a good query to generate a list of all the users
enrolled in courses with the Education Program extension (on en.wiki,
for the immediate purposes).
Basically, I want a query for all the usernames of any user who is
enrolled in any course for a set of terms (Spring 2013, 2013 Q1, and
2013-Q1), excluding users who have any of the four 'course' userrights
(Course coordinator, Course instructor, Course online volunteer,
Course campus volunteer).
The query that I have right now (thanks to help from Oliver and Jeroen) is this:
SELECT DISTINCT ep_users_per_course.upc_user_id
FROM ep_users_per_course INNER JOIN ep_courses
WHERE ep_courses.course_term IN ('Spring 2013','2013 Q1','2013-Q1')
AND ep_users_per_course.upc_role = 0;
But that isn't adequately isolating the actual students (as opposed to
instructors and others who are enrolled in classes as students), and
returns user IDs rather than usernames.
Any help figuring out the right query would be much appreciated!
Cheers,
Sage
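One possible shape for the query described above, sketched against a toy SQLite database so it can be run anywhere. Several names are assumptions rather than facts from the message: the join columns `upc_course_id` and `course_id`, the MediaWiki `user` and `user_groups` tables, and the four group names (`epcoordinator`, `epinstructor`, `eponline`, `epcampus`) should all be checked against the real schema before use. The quoted query has no join condition between the two tables, which is likely part of why it over-matches; the sketch joins on the course id and excludes anyone holding one of the course rights.

```python
import sqlite3

# Column and group names below are assumptions; verify them against the wiki's schema.
STUDENT_QUERY = """
SELECT DISTINCT u.user_name
FROM ep_users_per_course AS upc
JOIN ep_courses AS c ON c.course_id = upc.upc_course_id
JOIN user AS u ON u.user_id = upc.upc_user_id
WHERE c.course_term IN ('Spring 2013', '2013 Q1', '2013-Q1')
  AND upc.upc_role = 0
  AND NOT EXISTS (
        SELECT 1
        FROM user_groups AS ug
        WHERE ug.ug_user = upc.upc_user_id
          AND ug.ug_group IN ('epcoordinator', 'epinstructor', 'eponline', 'epcampus')
  )
ORDER BY u.user_name
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up rows for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT);
        CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
        CREATE TABLE ep_courses (course_id INTEGER PRIMARY KEY, course_term TEXT);
        CREATE TABLE ep_users_per_course (
            upc_user_id INTEGER, upc_course_id INTEGER, upc_role INTEGER);

        INSERT INTO user VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
        INSERT INTO user_groups VALUES (2, 'epinstructor');
        INSERT INTO ep_courses VALUES (10, 'Spring 2013'), (11, 'Fall 2012');
        -- Alice: plain student in a matching term.
        -- Bob: enrolled as a student but holds a course right.
        -- Carol: student in a term outside the set.
        -- Dave: enrolled with a non-student role.
        INSERT INTO ep_users_per_course VALUES
            (1, 10, 0), (2, 10, 0), (3, 11, 0), (4, 10, 1);
    """)
    return conn


def find_students(conn: sqlite3.Connection) -> list:
    """Return usernames of enrolled students without any course user right."""
    return [row[0] for row in conn.execute(STUDENT_QUERY)]


if __name__ == "__main__":
    print(find_students(build_toy_db()))
```

The `NOT EXISTS` subquery is used instead of a `LEFT JOIN ... IS NULL` so that a user with several course rights is still excluded exactly once.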
Hi!
Our most recent sprint was our final sprint dedicated to the "Mobile on MVC
Release"; this meant we focused on improving visibility into mobile
initiatives (the mobile site, support for mobile applications, and
Wikipedia Zero) and on fixing some segfaults in the filter component of
webstatscollector.
We have some mobile feature cards that will carry over to the next sprint, but
the sprint ending April 3rd was the last sprint that was part of the Mobile
on MVC Release. The coming sprint is the first sprint that is part of the
new release titled "Self-Serve Observational Analytics". This means that we
will schedule more feature cards that will enable data analysts, product
managers and other data consumers to work with the analytic toolset
themselves.
Apologies for cross-posting; ideally you should receive this on the
Analytics mailing list so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
## Defects & Features finished during Sprint ending 2013-04-03 ##
#52 I - Puppetize Limn (N/E)
#61 F - Mobile Site Pageviews by Device Class (N/E)
#378 F - Update Reportcard for April Metrics Meeting (N/E)
#500 D - Webstatscollector filter segfaults when url is too long (N/E)
## Planned for Showcase on 2013-04-10 ##
#60 F - Mobile pageview requests reporting in wikistats (N/E)
#95 F - Edit metadata on chart (8)
#244 F - Track user adoption of Wikipedia Zero (N/E)
#426 F - Authenticate users of Metrics API Admin UI (5)
#460 I - Setup advanced udp2log monitoring (13)
## Current Sprint (ending 2013-04-10) ##
Stories in progress from last sprint:
#60 D - Mobile pageview requests reporting in Wikistats (N/E) Testing
#148 I - Network ACL (N/E) Coding
#244 F - Track user adoption of Wikipedia Zero (N/E) Coding
#460 I - Setup advanced udp2log monitoring (13) Coding
Stories started but blocked (an email will be sent to the affected
stakeholders):
#92 F - Page View Metrics Report for Official Wikipedia Mobile Apps (5)
Blocked
#240 F - Session Analysis of mobile site visits by mode
(alpha/beta/standard) (8) Blocked
New stories
#95 F - Edit metadata on chart (8)
#426 F - Authenticate users of Metrics API Admin UI (5)
Queued for Dev
#94 F - Create chart through GUI (13)
#353 D - Wikistats - mobile country report et al. (N/E)
#454 I - Enable git sub-modules in puppet merge (2)
(Number in parentheses) = estimate of complexity
N/E = not estimated;
F = Feature
D = Defect
I = Infrastructure Task
Any Mingle card can be accessed using the base URL
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
Best,
Diederik
Hey folks,
Two days ago I was running a script from stat1 to update the revert tables on
db1047, and it had some bad disk access patterns. (FYI, don't use Python
shelve as an on-disk cache of a dict().) As soon as I saw the load come
up, I killed the script. For any difficulty that occurred in the meantime,
I'm very sorry. I've since re-written things to behave much better.
I currently have two processes running on the machine:
- sessions.py - Updating session table on db1047. Useful for measuring
editor labor hours.
- reverts.py - Updating revert tables on db1047. Fixed to not need a
disk cache.
Both of these processes are nice'd, so they should wait in line for CPU
access behind any non-nice'd processes you have running. If the processes
cause any trouble, please feel free to kill them or let me know and I'll
kill them.
For Science,
-Aaron
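For readers unfamiliar with "nice'd" processes as mentioned above: a long-running Python script can lower its own CPU priority at startup. This is a minimal sketch, assuming a Unix host; `run_politely` is a hypothetical name and is not taken from sessions.py or reverts.py.

```python
import os


def run_politely(increment: int = 10) -> int:
    """Raise this process's niceness by `increment` and return the new value.

    A higher niceness means a lower CPU scheduling priority. os.nice is
    available on Unix only, and an unprivileged process cannot lower its
    niceness again afterwards.
    """
    return os.nice(increment)


if __name__ == "__main__":
    print("niceness is now", run_politely())
```

Note that niceness only affects CPU scheduling; it does not by itself limit disk I/O, which was the source of the load described in the message.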
In the coming weeks, Wikimedia Foundation will organize its analytics
capabilities into a single department to better support the entire
organization in data-driven decision-making. We're looking for a
Director of Analytics to lead this effort. Details here:
http://hire.jobvite.com/Jobvite/Job.aspx?j=oJriXfw9&c=qSa9VfwQ
Please pass this on to people with the right qualifications in your
respective networks.
Thanks,
Erik
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate
Hi!
Here's the summary of Wednesday's Analytics sprint planning and demo.
Apologies for cross-posting; ideally you should receive this on the
Analytics mailing list so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
# TL;DR #
Our most recent sprint continued our focus on improving visibility into
mobile initiatives, including the mobile site, support for mobile
applications, and Wikipedia Zero, as well as fine-tuning the cluster and
improving monitoring of our datastreams.
## Defects & Features taken during Sprint ending 2013-03-27 ##
#68 F - Visualize Commons Mobile App (Android & iOS) metrics in Limn
dashboard (N/E) Done requested by Mobile
#78 F - Document pageview business logic for analysts (N/E) Done requested
by Analytics
#361 D - HTTPS generates two hits in server log, count only one of those
Done requested by Analytics / Community
#155 F - Improve accuracy of packetloss monitoring (N/E) Done requested by
Analytics
#147 I - iptables for NameNode (N/E) Done requested by Ops
#154 F - Provide unsampled blog webtraffic as datastream (N/E) Shipping
requested by Communications
#272 F - Dump stats: tally wikis by activity level (# active users) (N/E)
Ready for Showcase requested by Erik & Sue
#461 F - Configure FairScheduler on Kraken (N/E) Ready for Showcase
requested by Analytics
## Planned for Showcase on 2013-04-03 ##
Mingle:#52 - Puppetize Limn (N/E) Coding
Mingle:#60 (F) - Mobile pageview requests reporting in wikistats
Mingle:#61 (F) - Mobile Site Pageviews by Device Class
Mingle:#92 - Page View Metrics Report for Official Wikipedia Mobile Apps
(5) Testing
Mingle:#244 (F) - Track user adoption of Wikipedia Zero
Mingle:#426 F - Authenticate users of Metrics API Admin UI (5)
## Current Sprint (ending 2013-04-03) ##
The current sprint's theme is still focused on Mobile.
Stories in progress from last sprint:
#52 - Puppetize Limn (N/E) Coding
#61 F - Mobile site pageviews by device class (N/E) Testing
#92 - Page View Metrics Report for Official Wikipedia Mobile Apps (5)
Testing
#60 D - Mobile pageview requests reporting in Wikistats (N/E) Testing
Stories started but blocked:
#240 - Session Analysis of mobile site visits by mode
(alpha/beta/standard) (8) Coding
New stories
#94 F - Create chart through GUI (13)
#353 D - Wikistats cannot run from origin/master (N/E)
#378 F - Update Reportcard for April Metrics Meeting (N/E)
#426 F - Authenticate users of Metrics API Admin UI (5)
#460 I - udp2log server maintenance (13)
(Number in parentheses) = estimate of complexity
N/E = not estimated;
F = Feature
D = Defect
I = Infrastructure Task
Best,
Diederik