Hi sumandro, I've worked with this data, generated by ERosen, while looking for ptwiki stats, and I think I can help you.
For a given period you can get the total edits for a country and the total edits across all countries in that period. With these you can compute the "country fraction" (country edits divided by all edits). If you then multiply a city's fraction by its country fraction, you get the city's "global fraction".
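The recipe works because (city edits / country edits) * (country edits / all edits) = city edits / all edits. Below is a minimal Python sketch. It assumes you already have the country and worldwide edit totals for the same period as the dataset row. The function name and the numbers are made up for illustration.

```python
def global_fraction(city_fraction: float, country_edits: int, total_edits: int) -> float:
    """Turn a city's national fraction into its share of all edits worldwide.

    city_fraction -- city edits / country edits (the dataset's 'fraction' column)
    country_edits -- edits from the city's country in the period
    total_edits   -- edits from all countries in the same period
    """
    country_fraction = country_edits / total_edits
    # (city / country) * (country / world) = city / world
    return city_fraction * country_fraction


# Made-up numbers: a city with 25% of its country's edits, in a country
# contributing 2,000 of the 10,000 edits made worldwide in the period.
print(global_fraction(0.25, 2_000, 10_000))  # 0.05
```

Both totals must cover the same language project and the same start/end window as the row's fraction. Otherwise the product is not a meaningful share.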
Best,
Henrique Andrade
On Tue, May 14, 2013 at 5:43 PM, sumandro mail@ajantriks.net wrote:
Erik,
Thanks a lot for the appreciation.
As Sajjad mentioned, we have already obtained an edit-per-location dataset from Evan (Rosen) that has the following column structure:
*language,country,city,start,end,fraction,ts*
*start* and *end* denote the beginning and end dates of the period over which edits are counted, and *ts* is the timestamp.
The *fraction*, however, gives a national ratio of edit activity: it is 'total edits from that city for that language Wikipedia project' divided by 'total edits from that country for that language Wikipedia project'. Hence, it cannot be used to understand global edit contributions to a Wikipedia project (for a time period).
It seems that the original data (from which this dataset was extracted) should also contain the global fractions: total edits from a city divided by total edits from the whole world, for a project, for a time period.
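Because *fraction* is a national ratio, the city fractions for one language, country and period should add up to at most about 1. Here is a minimal Python sketch that loads rows in the column order above and checks this. It assumes the file has no header row. The helper names and the sample rows are hypothetical, and the real date and timestamp formats may differ.

```python
import csv
import io
from collections import defaultdict

# Column order as described in the email; assumes the file has no header row.
FIELDS = ["language", "country", "city", "start", "end", "fraction", "ts"]


def load_rows(text: str) -> list[dict]:
    """Parse the edits-per-location CSV into dicts, converting 'fraction' to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        row["fraction"] = float(row["fraction"])
        rows.append(row)
    return rows


def fraction_totals(rows: list[dict]) -> dict:
    """Sum city fractions per (language, country, start, end).

    Since each fraction is city edits / country edits, each sum should be
    at most about 1 (less if some edits were not attributed to a city).
    """
    totals = defaultdict(float)
    for r in rows:
        totals[(r["language"], r["country"], r["start"], r["end"])] += r["fraction"]
    return dict(totals)


# Invented sample rows, for illustration only.
sample = """hi,India,Delhi,2013-01-01,2013-01-31,0.6,2013-02-01
hi,India,Mumbai,2013-01-01,2013-01-31,0.4,2013-02-01
"""
print(fraction_totals(load_rows(sample)))
```

A sum well below 1 for a country would suggest that many of its edits were not attributed to any listed city.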
Would you know if the global fractions can also be derived from the XML dumps? Or, even better, is the relevant raw data available in CSV form somewhere else?
Bests,
sumandro
sumandro ajantriks.net
On Wednesday 15 May 2013 12:32 AM, analytics-request@lists.wikimedia.org wrote:
------------------------------------------------------------
Date: Tue, 14 May 2013 19:40:00 +0200
From: "Erik Zachte" ezachte@wikimedia.org
To: "'A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.'" <analytics@lists.wikimedia.org>
Subject: Re: [Analytics] Visualizing Indic Wikipedia projects.
Message-ID: 016f01ce50ca$0fe736b0$2fb5a410$@wikimedia.org
Content-Type: text/plain; charset="iso-8859-1"
Awesome work! I like the flexibility of the charts; it's easy to switch metrics and presentation modes.
- WMF has never captured IP-to-geo data at the city level, but afaik this is going to change with Kraken.
- Total edits per article per year can be derived from the XML dumps. I may have some CSV data that could come in handy.
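Here is one way to derive edits per article per year from a dump. This is a minimal sketch using Python's standard `xml.etree.ElementTree.iterparse`. It streams through a MediaWiki XML export, for example a stub history dump, which lists every revision without its text. It assumes the usual `<page>/<title>` and `<revision>/<timestamp>` layout and ignores the schema's namespace version. The function names are hypothetical, and this is not Erik's own tooling.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that the export schema adds to every tag."""
    return tag.rsplit("}", 1)[-1]


def edits_per_article_year(source) -> Counter:
    """Count revisions per (page title, year) while streaming through a dump.

    `source` is a filename or binary file object holding a MediaWiki XML export.
    """
    counts = Counter()
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            for child in elem:
                if local(child.tag) == "timestamp":
                    # Timestamps look like 2013-05-14T17:40:00Z; the year is the first 4 chars.
                    counts[(title, child.text[:4])] += 1
                    break
            elem.clear()  # drop the revision's children once counted
        elif name == "page":
            elem.clear()  # drop the finished page's contents
    return counts


sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example</title>
    <revision><timestamp>2012-03-01T10:00:00Z</timestamp></revision>
    <revision><timestamp>2013-01-05T09:30:00Z</timestamp></revision>
    <revision><timestamp>2013-04-20T18:15:00Z</timestamp></revision>
  </page>
</mediawiki>"""
print(edits_per_article_year(io.BytesIO(sample)))
```

Clearing elements as they finish keeps memory roughly flat on large dumps. A real run would also want to filter by the page's `<ns>` if only articles should be counted.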
For edit wars you need to track reverts on a per-article basis, right? That can also be derived from the dumps.
For long histories you need the full archive dumps and must calculate a checksum of each revision's text (the stub dumps include checksums, but only for the last year or two).
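The checksum approach amounts to identity-revert detection: hash each revision's text and flag any revision whose hash matches an earlier one. Here is a minimal Python sketch. It assumes you have already pulled one article's revision texts, in chronological order, out of a full archive dump. The function name is hypothetical, and SHA-1 is our choice of hash. For revisions where the dump already provides a checksum, that value could be compared instead of recomputing it.

```python
import hashlib


def find_identity_reverts(texts: list[str]) -> list[tuple[int, int]]:
    """Return (revert_index, restored_index) pairs for one article's history.

    `texts` holds the full text of each revision in chronological order.
    A revision counts as an identity revert here if its SHA-1 matches an
    earlier, non-adjacent revision, i.e. it restores an older version exactly.
    """
    last_seen = {}  # checksum -> index of the most recent revision with that text
    reverts = []
    for i, text in enumerate(texts):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # j == i - 1 would be a null edit (text unchanged), not a revert.
        if j is not None and j < i - 1:
            reverts.append((i, j))
        last_seen[digest] = i
    return reverts


history = ["stable text", "vandalised text", "stable text", "stable text, expanded"]
print(find_identity_reverts(history))  # [(2, 0)]
```

This only catches exact restorations. Partial reverts, where someone undoes part of an edit by hand, produce new text and need a diff-based approach instead.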
Erik Zachte