Hi!
Yesterday's sprint demo concluded the second sprint of the "Self-Serve
Observational Analytics" Release. The goal of this release is to schedule
features that will empower end-users to interact independently with the
Analytics toolset.
Apologies for cross-posting; ideally you should receive this on the
Analytics Mailinglist so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-04-17 ##
#510 I - Upgrade Emery to Precise (8) Done requested by Analytics & Ops
#513 I - Migrate UserMetrics API to Gerrit (N/E) Done requested by Analytics
#559 I - Debianization of python-jsonschema package (N/E) Done requested by
E3/Ori
#560 I - Debianization of python-voluptuous package (N/E) Done requested by
Platform/Hashar
#561 I - Debianization of python-statsd package (N/E) Done requested by
Platform/Hashar
#572 D - Sudden drop in pageviews (N/E) Done requested by Community
#60 D - Mobile pageview requests reporting in Wikistats (N/E) Ready for
Showcase requested by Mobile/Tomasz
#408 F - Import of 1:1000 All-traffic Stream into HDFS (N/E) Ready for
Showcase requested by Analytics
#539 D - Error Saving an Ad-Hoc Datasource (N/E) Ready for Showcase
requested by Community/Dan
#61 F - Mobile Site Pageviews by Device Class (N/E) Ready for Showcase
requested by Mobile/Tomasz
#95 F - Create chart through GUI (13) Ready for Showcase requested by
Grants&Programs/Jessie
#497 I - Setup Kraken puppetmaster in Labs (3) Ready for Showcase requested
by Ops & Analytics
#569 F - Pageviews metrics for Hebrew and Ukrainian wikivoyage (N/E) Ready
for Showcase requested by Community
## Planned for Showcase on 2013-04-24 ##
#92 F - Pageview metrics report for Official WMF Mobile Apps (5) requested
by Mobile/Tomasz
#131 I - Puppetize Kafka 0.7 (8) Coding requested by Analytics & Ops
#240 F - Session Analysis of Mobile Site Visits by Mode (8) requested by
Mobile/Maryana
#244 F - Track user adoption of Wikipedia Zero (N/E) requested by Wikipedia
Zero/Amit
#518 I - Setup SSL for User Metrics (3)
## Current Sprint (ending 2013-04-24) ##
Stories in progress from last sprint:
#92 F - Pageview metrics report for Official WMF Mobile Apps (5) Queued
for Dev (bumped back from Coding) requested by Mobile/Tomasz
#148 I - Network ACL (N/E) BLOCKED requested by Ops/Mark
#131 I - Puppetize Kafka 0.7 (8) Coding requested by Analytics & Ops
#240 F - Session Analysis of Mobile Site Visits by Mode (8) Coding
requested by Mobile/Maryana
#244 F - Track user adoption of Wikipedia Zero (N/E) Testing requested by
Wikipedia Zero/Amit
New stories
#134 I - Puppetize Hadoop CDH4 (13)
#388 F - Admin defines new static cohort by uploading CSV (5)
#518 I - Setup SSL for User Metrics (3)
#570 I - Local dev env for User Metrics (8)
Queued for Dev
#96 F - Save-As existing chart (1)
#353 S - Wikistats - mobile country report et al. (N/E)
#540 S - Look into potential implementations of job status in User Metrics
API
#541 S - Options to productionize Snuggle
(Number in parentheses) = estimate of complexity
N/E = not estimated
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Any Mingle card can be accessed using the base URL
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Best,
Diederik
Howdy Tomasz & co!
The backfill job for mobile web pageviews by device class + device
OS[1] finished this morning, and data is now available for all of March;
April is up to date, excepting yesterday[2]. Check it out:
http://stats.wikimedia.org/kraken-public/webrequest/mobile/device/props/201…
The job runs daily at midnight, producing a rollup for the last 24 hours
...such as this one from my birthday:
http://stats.wikimedia.org/kraken-public/webrequest/mobile/device/props/201…
The same job then recalculates the rollup for the current month, keeping
the aggregate fresh. Here's March (which won't change any further):
http://stats.wikimedia.org/kraken-public/webrequest/mobile/device/props/201…
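As a rough illustration of the daily-then-monthly rollup pattern described above, here is a minimal Python sketch. It is not the actual Kraken job: the record layout, the `device_class` values, and the function names `daily_rollup` and `monthly_rollup` are all assumptions made up for the example.

```python
from collections import Counter
from datetime import date


def daily_rollup(records, day):
    """Count requests per device class for a single day."""
    return Counter(r["device_class"] for r in records if r["day"] == day)


def monthly_rollup(daily_rollups, year, month):
    """Recompute the monthly aggregate by summing that month's daily rollups."""
    total = Counter()
    for day, counts in daily_rollups.items():
        if day.year == year and day.month == month:
            total.update(counts)
    return total


# Hypothetical sample records; real input would be the imported request logs.
records = [
    {"day": date(2013, 4, 14), "device_class": "phone"},
    {"day": date(2013, 4, 14), "device_class": "tablet"},
    {"day": date(2013, 4, 15), "device_class": "phone"},
    {"day": date(2013, 3, 31), "device_class": "phone"},
]

# One rollup per day, then the month-to-date aggregate built from them.
days = sorted({r["day"] for r in records})
daily = {d: daily_rollup(records, d) for d in days}
april = monthly_rollup(daily, 2013, 4)
```

Rebuilding the monthly figure from the daily rollups each night means a rerun of a single bad day is picked up by the next monthly recalculation.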
The numbers for March should be complete and correct, though there's certainly
room for improvement in both classification and data hygiene[3]. Tablets
are especially a pain-point for dClass, and we know that's important to
y'all. That said, all feedback and suggestions are very welcome,
especially if you see anything fishy. Chat up Diederik or the list and
we'll totally get all Mingley with your ideas.
Cheers!
Team Analytics
--
David Schoonover
dsc(a)wikimedia.org
[1] Feature card:
https://mingle.corp.wikimedia.org/projects/analytics/cards/61
[2] A race when restarting Hadoop's ResourceManager and slave NodeManagers
after a config upgrade caused a silent failure, impacting imports for
approximately five hours. (Specifically, imports wadded up into an ugly,
sticky, 158GB mess of duplicated records.) I've cleansed the data and
restored the import boundaries, but I believe the Device Props job
triggered before I was finished. Once all the data is in for today, I'll
rerun both 4/15 and 4/16.
[3] Card tracking the hygiene issues:
https://mingle.corp.wikimedia.org/projects/analytics/cards/591
Hi all,
I have a very non-technical question about the report card.
I am looking at this chart:
http://reportcard.wmflabs.org/graphs/unique_visitors
If I want to measure the unique visitors to Wikimedia projects, should I sum up all the categories in this graph or just take the 'world' information?
I'm asking this because 'world' seems to suggest a summation of the other labels. However, a sum of all regions does not equal the number under 'world'.
Cheers,
Maarten
Hey,
As part of ongoing cleanup and debugging of varnishncsa, part of the Varnish toolset for udp logging of access requests, I'm removing our custom code for escaping spaces in headers. Since we've now switched to using tabs as field separators, this shouldn't be necessary anymore.
Please let me know if there are any objections - if not I'll deploy this change in the next couple of days.
--
Mark Bergsma <mark(a)wikimedia.org>
Lead Operations Architect
Wikimedia Foundation
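To illustrate why escaping spaces should no longer be needed with tab separators, here is a minimal Python sketch. The log line and its five-field layout are hypothetical and do not reflect the actual varnishncsa/udp2log format; `parse_log_line` is a made-up helper name.

```python
def parse_log_line(line):
    """Split a tab-separated log line into fields, leaving spaces untouched."""
    return line.rstrip("\n").split("\t")


# Hypothetical line: hostname, timestamp, method, URL, User-Agent header.
line = (
    "cp1001\t2013-04-17T12:00:00\tGET\t"
    "http://en.wikipedia.org/wiki/Main_Page\t"
    "Mozilla/5.0 (X11; Linux x86_64)\n"
)

# With tabs as the separator, the User-Agent stays one field despite its spaces.
fields = parse_log_line(line)

# Splitting on any whitespace instead breaks the unescaped header apart.
whitespace_fields = line.split()
```

A header value that itself contains a literal tab would still break this parsing, so the sketch only covers the spaces case discussed here.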
As Ori put the EventLogging service into production he created an
eventlogging-alerts list to keep developers and users of EventLogging
data informed about any service or data quality issues.
The user metrics API is quickly becoming relevant to multiple teams in
the org who will likely also need to stay informed about similar kinds
of issues, whether through automatic notifications or manual emails.
Not all of them will want to be on analytics@, and not all folks on
this list will care about the nitty-gritty.
Would it make sense to set up umapi-alerts to serve the same purpose as
eventlogging-alerts, but for the user metrics API?
Cheers,
Erik
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate
Jeff Heer just announced the new release of Vega. I was not aware of it; it looks particularly relevant in the context of Limn graph metadata.
github.com/trifacta/vega
Vega is a visualization grammar, a declarative format for creating and saving visualization designs. With Vega you can describe data visualizations in a JSON format, and generate interactive views using either HTML5 Canvas or SVG.
Hi!
Yesterday's sprint demo concluded the first sprint of the "Self-Serve
Observational Analytics" Release. The goal of this release is to schedule
features that will empower end-users to interact independently with the
Analytics toolset.
Apologies for cross-posting; ideally you should receive this on the
Analytics Mailinglist so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-04-10 ##
#60 D - Mobile pageview requests reporting in Wikistats (N/E) requested by
Mobile/Tomasz
#426 F - Authenticate users of Metrics API Admin UI (5) requested by
E3/Dario
#454 I - Enable git sub-modules in puppet merge (2) requested by
Analytics/Andrew
#460 I - Setup advanced udp2log monitoring (13) requested by Analytics
#551 F - Mobile pageviews - Documentation (N/E) requested by Analytics
## Planned for Showcase on 2013-04-17 ##
#61 F - Mobile Site Pageviews by Device Class (N/E) requested by
Mobile/Tomasz
#92 F - Pageview metrics report for Official WMF Mobile Apps (5) requested
by Mobile/Tomasz
#240 F - Session Analysis of Mobile Site Visits by Mode (8) requested by
Mobile/Maryana
#244 F - Track user adoption of Wikipedia Zero (N/E) requested by Wikipedia
Zero/Amit
## Current Sprint (ending 2013-04-17) ##
Stories in progress from last sprint:
#60 D - Mobile pageview requests reporting in Wikistats (N/E) Testing
#61 F - Mobile Site Pageviews by Device Class (N/E) requested by
Mobile/Tomasz
#92 F - Pageview metrics report for Official WMF Mobile Apps (5) requested
by Mobile/Tomasz
#95 F - Edit metadata on chart (8) requested by Grantmaking/Jessie
#148 I - Network ACL (N/E) requested by Ops/Mark
#240 F - Session Analysis of Mobile Site Visits by Mode (8) requested by
Mobile/Maryana
#244 F - Track user adoption of Wikipedia Zero (N/E) requested by Wikipedia
Zero/Amit
#497 I - Setup Kraken puppetmaster in Labs requested by Analytics
#551 F - Mobile pageviews - Documentation requested by Analytics
New stories
#131 I - Puppetize Kafka (8) requested by Analytics
#510 I - Upgrade Emery to Precise (8) requested by Ops
#430 F - Expose parameters in metrics view via Metrics API (5) requested by
E3
#559 I - python-jsonschema debianization requested by Platform
#560 I - python-voluptuous debianization requested by Platform
#561 I - python-statsd debianization requested by Platform
Queued for Dev
#94 F - Create chart through GUI (13)
#96 F - Save-As existing chart (1)
#353 D - Wikistats - mobile country report et al. (N/E)
#540 S - Look into potential implementations of job status in User Metrics
API
#541 S - Options to productionize Snuggle
(Number in parentheses) = estimate of complexity
N/E = not estimated
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Any Mingle card can be accessed using the base URL
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Best,
Diederik
Hi yalls!
We're working on scheduling an Ubuntu Precise upgrade for two of the udp2log machines: emery and oxygen. emery has been around for quite a while, and has some filters that may not be used anymore. Here's a list of the ones that we'd like to disable if there are no interested parties.
# India
pipe 10 /usr/bin/udp-filter -F '\t' -c IN -b country -g >> /var/log/squid/india.tab.log
# GLAM NARA / National Archives - RT 2212
pipe 10 /usr/bin/udp-filter -F '\t' -p _NARA_ -g -m /var/log/squid/filters/GeoIPLibs/GeoIP.dat -b country >> /var/log/squid/glam_nara.tab.log
# Location geocoding filter for Erik Zachte.
pipe 1000 /usr/bin/udp-filter -F '\t' -g -b everything >> /var/log/squid/location-1000.tab.log
# specific country filters - 2012-01-24 through 2012-02-20 then ask Nimish or Amit if we still need them
pipe 10 /usr/bin/udp-filter -F '\t' -c CD,CF,CI,GQ -g -m /var/log/squid/filters/GeoIPLibs/GeoIP.dat -b country >> /var/log/squid/countries-1.tab.log
pipe 10 /usr/bin/udp-filter -F '\t' -c KH,BW,CM,MG,ML,MU,NE,VU -g -m /var/log/squid/filters/GeoIPLibs/GeoIP.dat -b country >> /var/log/squid/countries-10.tab.log
pipe 100 /usr/bin/udp-filter -F '\t' -c BD,BH,IQ,JO,KE,KW,LK,NG,QA,SN,TN,UG,ZA -g -m /var/log/squid/filters/GeoIPLibs/GeoIP.dat -b country >> /var/log/squid/countries-100.tab.log
# Temporary filter to estimate view of 2012 fundraiser video
pipe 10 /usr/bin/udp-filter -F '\t' -d wikimediafoundation.org -p Thank_You_Main >> /var/log/squid/wmf.org-Thank_You_Main.tab.log
Let us know if you'd like to keep these running.
Thanks!
-AndrewO + Diederik
Hi folks!
I'm trying to get a good query to generate a list of all the users
enrolled in courses with the Education Program extension (on en.wiki,
for the immediate purposes).
Basically, I want a query for all the usernames of any user who is
enrolled in any course for a set of terms (Spring 2013, 2013 Q1, and
2013-Q1), excluding users who have any of the four 'course' userrights
(Course coordinator, Course instructor, Course online volunteer,
Course campus volunteer).
The query that I have right now (thanks to help from Oliver and Jeroen) is this:
SELECT DISTINCT ep_users_per_course.upc_user_id
FROM ep_users_per_course INNER JOIN ep_courses
WHERE ep_courses.course_term IN ('Spring 2013','2013 Q1','2013-Q1')
AND ep_users_per_course.upc_role = 0;
But that isn't adequately isolating the actual students (as opposed to
instructors and others who are enrolled in classes as students), and
returns user IDs rather than usernames.
Any help figuring out the right query would be much appreciated!
Cheers,
Sage
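One possible direction for the query above, sketched in Python against an in-memory SQLite database so it can be run standalone. Everything beyond the names in the original query is an assumption: the join column `upc_course_id`, the key `course_id`, the `user` and `user_groups` tables, and the four group names (`epcoordinator`, `epinstructor`, `eponline`, `epcampus`) would all need checking against the real schema. The sketch adds an explicit join condition, resolves user IDs to usernames, and excludes anyone holding one of the course rights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal hypothetical schema and sample rows; the real tables have more columns.
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT);
CREATE TABLE ep_courses (course_id INTEGER PRIMARY KEY, course_term TEXT);
CREATE TABLE ep_users_per_course (
    upc_user_id INTEGER, upc_course_id INTEGER, upc_role INTEGER
);

INSERT INTO user VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
INSERT INTO user_groups VALUES (2, 'epinstructor');
INSERT INTO ep_courses VALUES (10, 'Spring 2013'), (11, '2013-Q1'), (12, 'Fall 2012');
INSERT INTO ep_users_per_course VALUES
    (1, 10, 0),  -- Alice: student in a Spring 2013 course
    (2, 10, 0),  -- Bob: enrolled as a student but holds an instructor right
    (3, 12, 0),  -- Carol: student, but in a term outside the set
    (4, 10, 0),  -- Dave: student in two matching courses
    (4, 11, 0);
""")

QUERY = """
SELECT DISTINCT u.user_name
FROM ep_users_per_course AS upc
INNER JOIN ep_courses AS c ON c.course_id = upc.upc_course_id
INNER JOIN user AS u ON u.user_id = upc.upc_user_id
WHERE c.course_term IN ('Spring 2013', '2013 Q1', '2013-Q1')
  AND upc.upc_role = 0
  AND NOT EXISTS (
      SELECT 1
      FROM user_groups AS ug
      WHERE ug.ug_user = u.user_id
        AND ug.ug_group IN ('epcoordinator', 'epinstructor',
                            'eponline', 'epcampus')
  )
ORDER BY u.user_name;
"""

students = [row[0] for row in conn.execute(QUERY)]
```

Without an ON clause the original INNER JOIN pairs every enrollment with every course, so the term filter would not restrict which users come back; the explicit join condition is the main change in this sketch.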