Erik,
Thanks a lot for the appreciation.
As Sajjad mentioned, we have already obtained an edit-per-location
dataset from Evan (Rosen) that has the following column structure:
*language,country,city,start,end,fraction,ts*
*start* and *end* denote the beginning and ending dates for counting the
number of edits, and *ts* is the timestamp.
The *fraction*, however, gives a national ratio of edit activity; that
is, it gives 'total edits from that city for that language Wikipedia
project' divided by 'total edits from that country for that language
Wikipedia project'. Hence, it cannot be used to understand global edit
contributions to a Wikipedia project (for a time period).
It seems that the original data (from which this dataset was extracted)
should also have the global fractions -- total edits from a city divided
by total edits from the whole world, for a project, for a time period.
Would you know if the global fractions can also be derived from the XML
dumps? Or, even better, is the relevant raw data available in CSV form
somewhere else?
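To make the difference between the two ratios concrete, here is a minimal sketch. It assumes that per-country edit totals for the same project and time period are available. That input (called country_totals below) is hypothetical and is not part of the dataset described above, and the sample numbers in the usage note are made up.

```python
def global_fractions(rows, country_totals):
    """Convert national fractions (city / country) to global ones (city / world).

    rows: list of dicts with keys language, country, city, fraction.
    country_totals: {(language, country): total edits} for the same period.
                    This is a hypothetical extra input, not in the dataset.
    """
    # World total per language project = sum of its country totals.
    world_totals = {}
    for (language, _country), total in country_totals.items():
        world_totals[language] = world_totals.get(language, 0) + total

    result = []
    for row in rows:
        key = (row["language"], row["country"])
        # Recover the city's absolute edit count from the national ratio.
        city_edits = row["fraction"] * country_totals[key]
        enriched = dict(row)
        enriched["global_fraction"] = city_edits / world_totals[row["language"]]
        result.append(enriched)
    return result
```

For example, a city with a national fraction of 0.5 in a country with 200 edits, for a project with 800 edits worldwide, gets a global fraction of 100 / 800 = 0.125. Without the country totals the conversion is not possible, which is why the national fraction alone is not enough.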
Bests,
sumandro
-------------
sumandro
ajantriks.net
On Wednesday 15 May 2013 12:32 AM, analytics-request(a)lists.wikimedia.org
wrote:
> Date: Tue, 14 May 2013 19:40:00 +0200
> From: "Erik Zachte" <ezachte(a)wikimedia.org>
> To: "'A mailing list for the Analytics Team at WMF and everybody who
> has an interest in Wikipedia and analytics.'"
> <analytics(a)lists.wikimedia.org>
> Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
> Message-ID: <016f01ce50ca$0fe736b0$2fb5a410$(a)wikimedia.org>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Awesome work! I like the flexibility of the charts, easy to switch metrics
> and presentation mode.
>
>
>
> 1. WMF has never captured ip->geo data on city level, but afaik this is
> going to change with Kraken.
>
>
>
> 2. Total edits per article per year can be derived from the xml dumps. I may
> have some csv data that come in handy.
>
> For edit wars you need to track reverts on a per-article basis, right? That
> can also be derived from dumps.
>
> For long history you need full archive dumps and need to calc checksum per
> revision text. (stub dumps have checksum but only for last year or two)
>
>
>
> Erik Zachte
>
>
>
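The checksum-based approach to reverts mentioned in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: it treats a revert as a revision whose text is byte-identical to an earlier, non-adjacent revision of the same article, it uses SHA-1 as the checksum, and the function name and input format (a chronological list of revision texts) are hypothetical. Reading the revisions out of a dump is not shown.

```python
import hashlib


def find_identity_reverts(revision_texts):
    """Return (reverting_index, restored_index) pairs for one article.

    revision_texts: revision texts in chronological order (hypothetical
    input format; parsing the dump itself is out of scope here).
    """
    last_seen = {}  # checksum -> index of the latest revision with that text
    reverts = []
    for index, text in enumerate(revision_texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        previous = last_seen.get(digest)
        # An identical text in the directly preceding revision is a null
        # edit, not a revert, so only non-adjacent matches are counted.
        if previous is not None and previous != index - 1:
            reverts.append((index, previous))
        last_seen[digest] = index
    return reverts
```

Counting the pairs returned per article gives a crude revert count; partial reverts and manual undo edits that change the text slightly are not detected by this kind of exact matching.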
Hi everyone!
Has anyone tried to observe how different Wikipedias use
templates: how often, what's the average depth of template calls, etc.?
-----
Yury Katkov, WikiVote
Heya,
We spoke a little bit more about getting a queryable public interface for
pageview data up and running and we decided the following:
1) Start importing webstatscollector pageview daily data for 2013 into
mysql running on labs (not yet scheduled in a sprint)
2) Make simple datawarehouse schema for mysql db (based on the current
webstatscollector datafiles)
Page Table
==========
page_id
page title
Fact Table
========
fact_id
page_id (FK -> Page Table)
pageview date
pageview count
bytes served
3) Collect more datapoints to determine how high a priority mobile site
article pageview counts are, so we can decide whether to add them to
webstatscollector or not.
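As a rough illustration of the two-table schema in step 2, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for the MySQL instance on labs; the table names, column names, column types, and the helper functions are assumptions made for the example, not a decided design.

```python
import sqlite3

# Column names follow the lists in step 2, with spaces replaced by
# underscores; the types are assumptions.
SCHEMA = """
CREATE TABLE page (
    page_id    INTEGER PRIMARY KEY,
    page_title TEXT NOT NULL
);
CREATE TABLE pageview_fact (
    fact_id        INTEGER PRIMARY KEY,
    page_id        INTEGER NOT NULL REFERENCES page(page_id),
    pageview_date  TEXT NOT NULL,      -- ISO date, e.g. 2013-01-01
    pageview_count INTEGER NOT NULL,
    bytes_served   INTEGER NOT NULL
);
"""


def build_warehouse():
    """Create an empty in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def daily_views(conn, title):
    """Return (date, count) rows for one page title, ordered by date."""
    cursor = conn.execute(
        "SELECT f.pageview_date, f.pageview_count "
        "FROM pageview_fact AS f JOIN page AS p ON p.page_id = f.page_id "
        "WHERE p.page_title = ? ORDER BY f.pageview_date",
        (title,),
    )
    return cursor.fetchall()
```

A query such as daily_views(conn, "Home") is the kind of lookup a public interface would serve; keeping titles in the page table means the fact table only stores the numeric page_id per daily row.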
Best,
Diederik
Henrik updated the top view charts, and a few days ago foundationwiki was
added to webstatscollector. http://stats.grok.se/www.f/top shows
Most viewed articles in 201304
Rank Article Page views
1 Trang chủ 912
2 Portada galega 324
3 Home 182
4 Local chapters 172
etc.
This seems highly unlikely; is the problem known?
Nemo
How would folks feel about a public log of WikiMetrics usage in this form:
2013-09-18 20:01:47 [some username] Edits - Oregon newbies
2013-09-18 19:23:18 [some username] Bytes Added - Oregon2013
We're tracking this info right now without sharing it. I don't feel
it's particularly sensitive, and sharing it would give us a good shared
understanding of who's using the tool & how. It does expose cohort
names, but I'm not sure why that would be an issue. In a way, it seems
only fair that if we're putting our users under the microscope, we
should also be comfortable with publicly logging what we're doing.
Thanks,
Erik
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
Hi!
I am sorry, I still have a bit of a backlog with sending the Sprint
Showcase summaries. Here is the update for the sprint that ended on October
16th.
Slidedeck is available at
https://docs.google.com/a/wikimedia.org/presentation/d/1LrNc0oFFkLJOeptGAtV…
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-10-16 ##
# Name Customer Status Estimate
1189 [Bug 54779] Webrequest, mobile, and zero request stream lacking mf-m=a
and mf-m=b markers Analytics Done
1128 Orange Madagascar Wikipedia Zero Dashboard Dan Foy - Wikipedia Zero Done
2
1180 Metrics Meeting October Erik Moeller - Executive Office Done 1
1184 Fix broken cronjob for mobile reports Erik Moeller - Executive Office
Done
1203 [Bug 55527] the Wikidata stats are not available Community Done 1
1204 [Bug 55528] Please collect statistics for the Vietnamese Wikivoyage
Community Done 1
1163 [Bug 54358] Make "Select Cohort" button actually select cohort for
metric selection Jessie Wild - Learning & Evaluation Done 1
699 Measure milestones achieved by an editor (threshold) Dario Taraborelli
- Analytics Showcasing 5
## Current Sprint (ending 2013-10-30) ##
Number Name Customer Estimate
699 Measure milestones achieved by an editor (threshold) Dario Taraborelli
- Analytics 5
701 Measure survival of an editor Dario Taraborelli - Analytics 13
818 Asynchronous cohort validation Dario Taraborelli - Analytics 8
1074 Pipe kafka mobile data to udp2log stream Diederik van Liere - Analytics
5
1124 Kafka multiple DC setup Diederik van Liere - Analytics 5
1152 Kafka Monitoring Diederik van Liere - Analytics 5
1168 Restrict geowiki access to WMF employees Jessie Wild - Grantmaking
Learning & Evaluation 8
1177 Banglalink Wikipedia Zero Dashboard Dan Foy - Wikipedia Zero 2
1179 Umniah Jordan Wikipedia Zero Dashboard Dan Foy - Wikipedia Zero 2
1195 Prototype public queryable interface for pageview data Erik Moeller -
Executive Office 21
1202 Tcell Tajikistan Wikipedia Zero Dashboard Dan Foy - Wikipedia Zero 2
Any mingle card can be accessed using the base url
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Apologies for cross-posting; ideally you should receive this on the
Analytics Mailinglist so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
Best,
Diederik
Hi!
I am sorry, I have a bit of a backlog with sending the Sprint Showcase
summaries. Here is the update for the sprint that ended on October 2nd.
Slidedeck is available at
https://docs.google.com/a/wikimedia.org/presentation/d/1XsSRJkJKacoWg5sbPSk…
## Defects & Features completed (Ready for Showcase/Shipping/Done) during
Sprint ending 2013-10-02 ##
# Name Type Customer Estimate
1137 [Bug 53806] Clarify zero markers in squid logs for IPs that should not
get one Defect Dan Foy - Wikipedia Zero 5
646 Bingle Feature Diederik van Liere - Analytics 5
1133 Hive Partitioning w Camus and JSON SerDe Infrastructure Task Diederik
van Liere - Analytics
1155 Total Active Editors Feature Erik Moeller - Executive Office 1
1088 [Bug 53155] Search datasource is not working Defect Jessie Wild -
Learning & Evaluation 1
1091 [Bug 53177] Create graph feature not working Defect Jessie Wild -
Learning & Evaluation 2
1183 Confidential Work Feature Jessie Wild - Learning & Evaluation 5
1087 [Bug 53118] Large drop in historical total active editors numbers
Defect Tilman Bayer - Communications
701 Measure survival of an editor Feature Dario Taraborelli - Analytics 13
1067 [Bug 51566] Limn: expose tab names in URL Defect Dario Taraborelli -
Analytics 3
1109 [Bug 53417] Deduplication of usernames should happen on combination of
username,project Defect Diederik van Liere - Analytics 1
1166 Time-series wide format for csv output Feature Jessie Wild - Learning
& Evaluation 2
1186 Datetime picker in UI Feature Jessie Wild - Learning & Evaluation 2
## Current Sprint (ending 2013-10-16) ##
(Number, Name, Customer, Status, Estimate)
1189 [Bug 54779] Webrequest, mobile, and zero request stream lacking mf-m=a
and mf-m=b markers Analytics Done
1128 Orange Madagascar Wikipedia Zero Dashboard Dan Foy - Wikipedia Zero Done
2
1180 Metrics Meeting October Erik Moeller - Executive Office Done 1
1184 Fix broken cronjob for mobile reports Erik Moeller - Executive Office
Done
1203 [Bug 55527] the Wikidata stats are not available Community Done 1
1204 [Bug 55528] Please collect statistics for the Vietnamese Wikivoyage
Community Done 1
1163 [Bug 54358] Make "Select Cohort" button actually select cohort for
metric selection Jessie Wild - Learning & Evaluation Done 1
699 Measure milestones achieved by an editor (threshold) Dario Taraborelli
- Analytics Showcasing 5
Any mingle card can be accessed using the base url
https://mingle.corp.wikimedia.org/projects/analytics/cards/XYZ where XYZ is
the Mingle card id.
If you have any questions, comments or feedback: please let us know!
Apologies for cross-posting; ideally you should receive this on the
Analytics Mailinglist so we can have one focal point for conversation. If
you are not on the Analytics list then please subscribe at
https://lists.wikimedia.org/mailman/listinfo/analytics
Best,
Diederik
Hey all,
You may have seen that for research into VisualEditor, Echo, and, more
recently, onboarding work, we've used a metric developed by Aaron Halfaker.
Aaron needs to take a look at this and give it a vetting, but I've taken a
basic stab at general documentation on Meta:
https://meta.wikimedia.org/wiki/Research:Productive_editor
--
Steven Walling,
Product Manager
https://wikimediafoundation.org/
Hey all,
I think Growth and Mobile are the only ones using Extension:Campaigns
functionality right now to track account creations, but just in case, I
wanted to share that for about a week data was not coming in for most
users on desktop. See
http://ee-dashboard.wmflabs.org/graphs/enwiki_campaigns for a look at this.
This was due to bug 55765, which has been fixed and deployed. If you see
any oddities in campaign tracking in the recent past, this bug is likely
the culprit.
Thanks,
--
Steven Walling,
Product Manager
https://wikimediafoundation.org/
It shows African countries sized by Wikipedia articles/mentions.
http://geography.oii.ox.ac.uk/2013/10/information-imbalance-africa-on-wikip…
The visualization uses the dumps and wikistats for data. I like the ability
to compare the various country attributes against the core metrics.
Conclusions are in the text below the maps.
-Toby