Erik,
Thanks a lot for the appreciation.
As Sajjad mentioned, we have already obtained an edit-per-location
dataset from Evan (Rosen) with the following column structure:
*language,country,city,start,end,fraction,ts*
*start* and *end* denote the beginning and ending dates of the
edit-counting window, and *ts* is the timestamp.
The *fraction* column, however, gives a national ratio of edit activity:
the total edits from a city for a given language Wikipedia project
divided by the total edits from that city's country for the same
project. Hence, it cannot be used to understand global edit
contributions to a Wikipedia project (for a time period).
It seems that the original data (from which this dataset was extracted)
should also support computing global fractions -- total edits from a city
divided by total edits worldwide, per project, per time period.
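To make the two ratios concrete, here is a minimal sketch assuming a hypothetical table of raw per-city edit counts (the column names and numbers are made up; the real dump-derived data may look different):

```python
from collections import defaultdict

rows = [
    # (language, country, city, edits) -- illustrative numbers only
    ("bn", "IN", "Kolkata", 120),
    ("bn", "IN", "Delhi", 30),
    ("bn", "BD", "Dhaka", 350),
]

country_totals = defaultdict(int)  # (language, country) -> total edits
world_totals = defaultdict(int)    # language -> total edits worldwide
for lang, country, city, edits in rows:
    country_totals[(lang, country)] += edits
    world_totals[lang] += edits

for lang, country, city, edits in rows:
    national = edits / country_totals[(lang, country)]  # the 'fraction' column
    global_ = edits / world_totals[lang]                # what is missing
    print(f"{city}: national={national:.3f} global={global_:.3f}")
```

The point is simply that the national denominator cannot be recovered into a global one without the raw counts.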
Would you know if the global fractions can also be derived from the XML
dumps? Or, even better, is the relevant raw data available in CSV form
somewhere else?
Bests,
sumandro
-------------
sumandro
ajantriks.net
On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org
wrote:
> Date: Tue, 14 May 2013 19:40:00 +0200
> From: "Erik Zachte" <ezachte(a)wikimedia.org>
> To: "'A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics.'"
> <analytics(a)lists.wikimedia.org>
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
> Message-ID: <016f01ce50ca$0fe736b0$2fb5a410$(a)wikimedia.org>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Awesome work! I like the flexibility of the charts, easy to switch metrics
> and presentation mode.
>
>
>
> 1. WMF has never captured ip->geo data at the city level, but afaik this is
> going to change with Kraken.
>
>
>
> 2. Total edits per article per year can be derived from the xml dumps. I may
> have some csv data that come in handy.
>
> For edit wars you need to track reverts on a per-article basis, right? That
> can also be derived from the dumps.
>
> For long history you need the full archive dumps and need to calculate a
> checksum per revision text. (stub dumps have checksums, but only for the
> last year or two)
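For illustration, the checksum-based revert detection mentioned above can be sketched as follows. The function name and the sample history are made up; a real run would stream revision texts for each page from the full-history XML dump:

```python
import hashlib

def find_reverts(revision_texts):
    """Indices of revisions whose text exactly matches an earlier revision,
    i.e. identity reverts that restore a previous state of the page."""
    seen = set()
    reverts = []
    for i, text in enumerate(revision_texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:
            reverts.append(i)  # restores an earlier revision's exact text
        seen.add(digest)
    return reverts

history = ["stable text", "vandalized text", "stable text", "improved text"]
print(find_reverts(history))  # revision 2 restores revision 0
```

Hashing instead of storing full texts keeps memory bounded when scanning long histories.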
>
>
>
> Erik Zachte
>
>
>
Hi everyone!
Has anyone tried to observe how the different Wikipedias use
templates: how often, what's the average depth of template calls, etc.?
-----
Yury Katkov, WikiVote
Henrik updated the top view charts, and a few days ago foundationwiki was
added to webstatscollector. http://stats.grok.se/www.f/top shows the
most viewed articles in 201304:
Rank Article Page views
1 Trang chủ 912
2 Portada galega 324
3 Home 182
4 Local chapters 172
etc.
This seems highly unlikely; is the problem known?
Nemo
Cross-posting.
---------- Forwarded message ----------
From: Ori Livneh <ori(a)wikimedia.org>
Date: Thu, Jun 27, 2013 at 10:31 PM
Subject: Cluster reverted to wmf7
There was a massive, cluster-wide outage that lasted about half an hour
(totally unrelated to our code). It was ultimately resolved by reverting
the entire cluster to wmf7. Logging jobs are probably affected. Just FYI.
Incident report will eventually be posted here:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20130628-Site
---
Ori Livneh
ori(a)wikimedia.org
--
Steven Walling
https://wikimediafoundation.org/
Hi!
This sprint we delivered 49 points; for the coming sprint we have
scheduled 43 points.
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-06-26 ##
#775 F - X-CS pageview counts for June 2013 Amit Kapoor - Wikipedia Zero
Done (1)
#765 F - Implement Authentication with OAuth Frank Schulenburg - Grantmaking
Showcasing (3)
#764 F - Wikipedia Zero pageview count May 2013 Amit Kapoor - Wikipedia Zero
Done (2)
#758 F - Counts for new/repeat uploaders on Commons Howie Fung - Product
(General) Showcasing (5)
#746 F - Run Arabic cohort analysis Frank Schulenburg - Grantmaking
Showcasing (1)
#738 D - Cronjob fails new mobile pageview report Diederik van Liere -
Analytics Shipping (3)
#727 F - Breakdown of WLM uploads by country Community - Analytics Done (8)
#710 F - Restrict people to only see the cohorts they have created Jessie
Wild - Learning & Evaluation Showcasing (3)
#698 F - Metrics User interaction Jessie Wild - Learning & Evaluation
Showcasing (2)
#547 F - Replace varnishncsa with varnishkafka Analytics Team - Analytics
Done (8)
#244 F - Track user adoption of Wikipedia Zero Amit Kapoor - Wikipedia Zero
Showcasing (5)
#131 I - Puppetize + Debianize Kafka 0.8 Diederik van Liere - Analytics
Showcasing (8)
## Current Sprint (ending 2013-07-10) ##
Stories in progress from last sprint:
#716 F - Debianization of dClass-dev and dClass-dev-jni (5) requested by Ops
#719 I - Cleanup duplicate loglines (5) requested by Diederik (Analytics)
#738 F - Cronjob fails new mobile pageview report (N/E) requested by
Diederik (Analytics)
#469 F - Comparison of different pageview definitions (8) requested by
Mobile (Tomasz)
New stories:
#700 F - Port static cohort user interaction (3) requested by
E3/Grantmaking
#705 F - Port over of Bytes added metric (5) requested by Product
#760 I - Debianize Librdkafka requested by (3) Analytics/Ops
#777 F - Varnishkafka output format (N/E) requested by Analytics/Ops
#781 F - Monthly Reportcard May 2013 (1) requested by Erik Moeller
#696 F - Porting over of the request logic (5) requested by E3/Grantmaking
#766 F - Zero Dashboard for Pageview Metrics (5) requested by Wikipedia Zero
(Number in parentheses) = estimate of complexity
N/E = not estimated;
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Apologies for cross-posting; ideally you should receive this on the
Analytics Mailinglist so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
Any mingle card can be accessed using the base url
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Best,
Diederik
Two especially relevant bits in here, regarding EventLogging and
CoreEvents.
---------- Forwarded message ----------
From: Steven Walling <swalling(a)wikimedia.org>
Date: Fri, Jun 21, 2013 at 5:31 PM
Subject: Editor engagement experiments updates
To: WMF Editor Engagement Team <ee(a)lists.wikimedia.org>
Hi everyone, and happy Friday!
This is a quick list of highlights from what the Editor Engagement
Experiments team deployed to the wikis yesterday:
*GettingStarted*: the interface change of note is that the toolbar
(presented on articles if you accept a GettingStarted task) is now much
more responsive on smaller screen sizes and print. Since we opted to hide
the toolbar on screen sizes where it would otherwise be broken (i.e. below
about 850px of width), we started explicitly logging whether users saw the
toolbar or not.
*GuidedTours*: logging for guided tours was recently broken by our last
release. This wasn't a big deal since we weren't running any active
controlled tests, but we deployed and verified fixes for logging, along
with cleaning up other parts of the architecture.
*CoreEvents*: CoreEvents is a new extension, deployed just last week, to
house logging of certain events in MediaWiki core, like preference updates.
Yesterday we added logging of whether an edit was made via the API or
mobile. Learn more at https://www.mediawiki.org/wiki/Extension:CoreEvents
*EventLogging*: the big change here is that we added an API module for
retrieving the JSON of a schema. Like index.php?action=raw on a wikitext
page, the module returns the raw JSON content of the schema and lets you
refer to a specific revision. Here's an example:
https://meta.wikimedia.org/w/api.php?action=jsonschema&revid=5588433&format…
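As a sketch, a request URL for this module might be built like so. The helper name is mine; only the `action=jsonschema` and `revid` parameters are confirmed by the example above, and the `format` parameter is the standard api.php one:

```python
from urllib.parse import urlencode

def jsonschema_url(revid, api="https://meta.wikimedia.org/w/api.php"):
    """Build a request URL for the raw JSON of a schema revision."""
    return api + "?" + urlencode({
        "action": "jsonschema",  # module added by EventLogging
        "revid": revid,          # pin a specific schema revision
        "format": "json",
    })

print(jsonschema_url(5588433))
```

Pinning `revid` matters because, as with `action=raw` on wikitext pages, the schema page can change while logged events keep referring to the revision they validated against.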
--
Steven Walling
https://wikimediafoundation.org/
Hi all,
I should have posted this here earlier.
Wikimedia France, Wikimedia UK, Wikimedia NL and Wikimedia CH funded a project to create better tools for GLAMs to upload their collections, including metadata, to Wikimedia Commons. Simultaneously, the project researched what GLAMs need in terms of analytics on these collections to be able to justify sharing them on Wikipedia. The research can be condensed into a single-paragraph requirement:
'A GLAM analytics system is an open dashboard that shows all contributing institutions with monthly overviews of objects, their plays and their usage in Wiki projects. Ideally, the data is created by Kraken and presentation is handled through Limn with well-designed graphs providing exportable datasets.'
This is something that the Analytics team has adopted and, according to Diederik, will be working on in the coming period. They have started working on importing Commons data: https://mingle.corp.wikimedia.org/projects/analytics/cards/723 -- something that is probably important for other people on this list as well, like Magnus.
That research has now been published here and includes high-level and low-level requirements:
http://pro.europeana.eu/web/guest/pro-blog/-/blogs/europeana-glamtools-publ…
Cheers,
Maarten
--
Kennisland | www.kennisland.nl | t +31205756720 | m +31643053919 | @mzeinstra
FYI :)
---------- Forwarded message ----------
From: Erik Moeller <erik(a)wikimedia.org>
Date: Mon, Jun 17, 2013 at 9:58 AM
Subject: Toby Negrin joins Wikimedia Foundation as Director of Analytics
To: wikimediaannounce-l(a)lists.wikimedia.org
Hello all,
it’s my great pleasure to announce that as of today, Toby Negrin is
joining the Wikimedia Foundation as the new Director of Analytics.
Toby will be responsible for leading the analytics team, which is
responsible for enabling data-driven decisions in the Wikimedia
Foundation and the broader Wikimedia community.
As of today, the team consists of: Diederik van Liere, Dario
Taraborelli, Andrew Otto, Evan Rosen, Dan Andreescu, Stefan Petrea
(contractor), Erik Zachte (part-time), and Aaron Halfaker
(contractor). Newly integrated into the team are Dario, Evan and
Aaron.
Toby joins us from DeNA (formerly ngmoco), a $2B Japanese mobile
gaming company where he was Director of Analytics in the US from 2011
to 2013. He enabled data-informed decision making throughout the
company, established an Insights team and scaled the Analytics team to
21 members. He managed a 300+ node Hadoop platform, multiple data
driven applications and led the effort to open source Mobilize, a
script deployment and dataviz framework developed in-house at DeNA.
Prior to DeNA, Toby was Director of Product Management for Cloud
Platforms and Hadoop at Yahoo! from 2008-2011. Leading 10 PMs at peak,
through this group Toby was responsible for interfacing between the
hundreds of internal users of analytics, storage and other cloud
services and the developers/maintainers of said infrastructure. There
aren’t many jobs that could prepare you for Wikimedia’s complex
network of analytics stakeholders, but this surely is one of them.
Toby has worked as a software engineer for many years and holds a BS
Equivalent in Computer Science from California State University, an
MBA from NIMBAS Graduate School of Management in Utrecht, and a BA in
Visual Culture and History from University of California, Santa Cruz.
After growing up in the Bay Area, Toby’s lived recently in Stockholm
and Amsterdam so he’s using his spare time to explore California’s
wilderness on two feet and two wheels with his family. His two
daughters keep him pretty busy!
Toby is looking forward to making the shift to a mission-driven
non-profit organization. When I asked him what he’d be doing if not
Wikimedia, he expressed an interest in urban planning, green cities,
and public transport. In an alternative universe, San Francisco is
becoming a greener city with a more reliable public transport system.
In this one, we get awesome Wikimedia analytics instead. There are
always tradeoffs. ;-)
Toby’s incredibly excited about working with the team to tackle
Wikimedia’s analytics challenges and the increasing hunger for data
across the organization and the movement. Please join me in welcoming
him on board. :-)
All best,
Erik
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
Hi all,
as you might know, I have a few GLAM-related tools on the toolserver. Some
are updated once a month, some can be used live, but all are in high demand
by GLAM institutions.
Now, the monthly updated stats have always been slow to run, but recently
they almost ground to a halt. The on-demand tools have stalled completely.
All these tools get their data from stats.grok.se, which works well but is
not really high-speed; my on-demand tools have apparently been shut out
recently because too many people were using them, DDoSing the server :-(
I know you are working on page view numbers, and from what I gather it's
up and running internally already. My requirements are simple: I have a
list of pages on many Wikimedia projects; I need view counts for these
pages for a specific month, per page.
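A minimal sketch of that per-page monthly lookup, assuming the stats.grok.se JSON endpoint takes the form /json/<project>/<YYYYMM>/<title> and returns a daily_views map -- an assumption; the interface to the internal data may differ entirely:

```python
from urllib.parse import quote

def monthly_views_url(project, yyyymm, title):
    """Assumed stats.grok.se JSON endpoint for one page and one month."""
    return "http://stats.grok.se/json/%s/%s/%s" % (project, yyyymm, quote(title))

def monthly_total(response_json):
    # Sum the per-day counts in a (hypothetical) JSON response body.
    return sum(response_json.get("daily_views", {}).values())

sample = {"title": "Rembrandt", "month": "201304",
          "daily_views": {"2013-04-01": 120, "2013-04-02": 95}}
print(monthly_views_url("nl", "201304", "Rembrandt"))
print(monthly_total(sample))  # 215
```

Batching one such request per (page, month) pair is exactly the workload that is slow today, which is why direct access to the monthly aggregates would help.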
Now, I know that there is no public API yet, but is there any way I can get
to the data, at least for the monthly stats?
Cheers,
Magnus
Hi!
This sprint we delivered 43 points; for the coming sprint we have
scheduled 63 points.
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-06-12 ##
#506 I - Puppetize Hadoop ecosystem clients for Hadoop nodes (2) Done
requested by Ops/Analytics
#533 I - Puppetize Zookeeper (3) Showcasing requested by Ops/Analytics
#545 I - Librdkafka supports Kafka 0.8 (13) Done requested by Ops/Analytics
#677 F - Metrics Meeting May (1) Done requested by Erik Moeller
#693 F - Implement new UMAPI DB design (8) Done requested by E3/Grantmaking
#695 F - Copy over the UI components UMAPI (2) Done E3/Grantmaking
#715 F - Monitoring and alerts on webrequest loss in Kraken (8) Done
requested by Analytics
#744 F - LiAnnas request for UMAPI data (3) Done requested by Grantmaking
#755 D - Fixing public datasets rsync job on stat1001 (1) Done requested by
E3
#759 D - CSV Upload does not handle commas, single quotes, or double quotes
(2) Done requested by Mobile
## Current Sprint (ending 2013-06-26) ##
Stories in progress from last sprint:
#131 I - Puppetize + Debianize Kafka 0.8 (8) requested by Analytics / Ops
#244 F - Track user adoption of Wikipedia Zero (5) requested by Amit
(Wikipedia Zero)
#385 I - Migration of stat1 to stat1002 (3) requested by Ops/Analytics
#547 F - varnishkafka (13) requested by Analytics/Ops
#716 F - Debianization of dClass-dev and dClass-dev-jni (5) requested by Ops
#719 I - Cleanup duplicate loglines (5) requested by Diederik (Analytics)
#727 F - Breakdown of WLM uploads by country (N/E)
#731 I - Reinstall Hadoop Nodes (8) requested by Ops
#738 F - Cronjob fails new mobile pageview report (N/E) requested by
Diederik (Analytics)
New stories:
#469 F - Comparison of different pageview definitions (8) requested by
Mobile (Tomasz)
#698 F - Metrics User interaction (2) requested by E3/Grantmaking
#700 F - Port static cohort user interaction (3) requested by E3/Grantmaking
#710 F - Restrict people to only see the cohorts they have created (3)
requested by E3/Grantmaking
#746 F - Run Arabic cohort analysis (1) requested by Grantmaking
#758 F - Counts for new/repeat uploaders for Commons (N/E) requested by
Howie (Product)
(Number in parentheses) = estimate of complexity
N/E = not estimated;
F = Feature
D = Defect
I = Infrastructure Task
S = Spike
Any mingle card can be accessed using the base url
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Best,
Diederik