FWIW: depending on the threshold chosen in step 2 of the anonymization scheme suggested by Yuvi, some of the countries/languages will have no data. This data will solve the problem for some of the partners, but not all of them.

On Monday, August 25, 2014, Jessie Wild <jwild@wikimedia.org> wrote:

For grantmaking, this is the exact type of dataset we want to have publicly available. A lot of the initiatives we fund are at a country-based level, and our partners have a really hard time understanding the effects of the work they are doing on the aggregate language-wiki level. In addition to these edits per country, it would be even more important for us to get the total number of editors / active editors by country as well. Kevin - it would be great to get an update from you on the timeline for this (in Q4 2014-15, it was punted to Q1 2014-15, but I haven't heard anything about it yet ...)

Thanks for starting this work, Yuvi!

On Mon, Aug 25, 2014 at 9:43 AM, Yuvi Panda <yuvipanda@gmail.com> wrote:
On Mon, Aug 25, 2014 at 5:41 PM, Kevin Leduc <kevin@wikimedia.org> wrote:
> Hey Yuvi,
> this sounds like very interesting data to look at.  Here are my thoughts:


> - the Anonymization scheme sounds reasonable, and I'd like to hear from
> someone else @ wikimedia who has similar experience anonymizing data sets

Glad to hear that!

> - you were probably already thinking about it, but we need documentation
> too: a wikipage with the name of the table, data dictionary, etc... and even
> a blog post to announce the newly available data.

Oh yeah, definitely. Will come once the code, etc. is done :)

Yuvi Panda T

Analytics mailing list

Jessie Wild Sneller
Grantmaking Learning & Evaluation 
Wikimedia Foundation

Imagine a world in which every single human being can freely share in
the sum of all knowledge.  Help us make it a reality!