Great work!

One way to analyze this kind of geolinguistic aggregate further is to normalize the data (data normalization, or geographic normalization), as demonstrated in my previous work: http://www.opensym.org/os2014-files/proceedings/p611.pdf

Anyone is welcome to normalize the data using the geolinguistic size indicators here: https://github.com/hanteng/pyGeolinguisticSize/blob/master/size_geolinguistic.tsv

Currently, it contains estimates of population (LP), Internet users (IPop), economy size (PPPGDP), etc. The estimates assume an "even distribution": each country's total is split across languages according to each language's percentage share of that country's population, as given in the Unicode CLDR 25 Territory-Language Information.
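To make the "even distribution" assumption concrete, here is a minimal sketch in Python. The territory codes, language codes, and all numbers are made up for illustration; they are not taken from CLDR or from size_geolinguistic.tsv, and the function names are hypothetical rather than those used in pyGeolinguisticSize.

```python
# Hypothetical inputs: made-up territories ("AA", "BB"), languages ("xx", "yy")
# and figures, used only to illustrate the "even distribution" assumption.
territory_totals = {
    "AA": {"population": 1000.0, "internet_users": 400.0},
    "BB": {"population": 500.0, "internet_users": 100.0},
}

# Share of each territory's population that uses each language (0..1).
language_shares = {
    ("AA", "xx"): 0.8,
    ("AA", "yy"): 0.2,
    ("BB", "xx"): 0.1,
    ("BB", "yy"): 0.9,
}


def even_distribution(territory_totals, language_shares):
    """Split each territory-level indicator across languages by population share."""
    cells = {}
    for (territory, language), share in language_shares.items():
        totals = territory_totals[territory]
        cells[(territory, language)] = {
            name: value * share for name, value in totals.items()
        }
    return cells


def totals_by_language(cells):
    """Sum the territory-language cells into one row per language."""
    sums = {}
    for (_territory, language), indicators in cells.items():
        bucket = sums.setdefault(language, {})
        for name, value in indicators.items():
            bucket[name] = bucket.get(name, 0.0) + value
    return sums


cells = even_distribution(territory_totals, language_shares)
by_language = totals_by_language(cells)
```

The assumption shows up in the single multiplication `value * share`: Internet users and economy size are allotted in the same proportion as population, which ignores any real difference in connectivity or income between language groups within a country.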

A simple linear regression can reveal, say, which geolinguistic, geographic, or linguistic categories receive a smaller or larger proportion of viewing traffic than expected, with the expected values generated from the sizes of the population, the Internet population, and the economy.
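As a rough sketch of the regression idea, the Python below fits an ordinary least-squares line of page views on one size indicator and reports the residuals (observed minus expected). The category labels and all numbers are hypothetical, a single predictor is an assumption made for brevity, and this is not necessarily the procedure used in the paper linked above.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(labels, xs, ys):
    """Observed minus expected value for each category."""
    slope, intercept = fit_line(xs, ys)
    return {
        label: y - (slope * x + intercept)
        for label, x, y in zip(labels, xs, ys)
    }


# Hypothetical data: four made-up categories, arbitrary units.
labels = ["aa", "bb", "cc", "dd"]
internet_users = [1.0, 2.0, 3.0, 4.0]
page_views = [2.0, 4.0, 9.0, 8.0]

slope, intercept = fit_line(internet_users, page_views)
gaps = residuals(labels, internet_users, page_views)

# Positive residual: more traffic than expected; negative: less than expected.
for label, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{label}: {gap:+.2f}")
```

With real data, size indicators usually span several orders of magnitude, so fitting on log-transformed values may be a more sensible choice than the raw values used here.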

I hope this great work by Nemo can be extended to cover:

(1) time-series reports and data releases

(2) aggregates of edits

Together, the tools and datasets will be a major milestone for monitoring language and project development across Wikimedia projects. Congrats!

Best,

han-teng liao

2015-02-26 8:31 GMT+01:00 Federico Leva (Nemo) <nemowiki@gmail.com>:

Erik Zachte, 25/02/2015 23:34:

Compare https://ironholds.shinyapps.io/WhereInTheWorldIsWikipedia/ and
http://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm

Ironholds' looks more vulnerable to bots; it's easier to see in small wikis (though, kudos! many more small wikis are included than in wikistats). For instance, 20 more percentage points for USA on Breton and Bavarian Wikipedias, 30 on Welsh, 40 on Alemannic, almost 50 on Kurdish. For Chinese bots they look similar, though in some cases I'm not sure what's going on: for instance, als.wiki also sees CH and RO emerge.

Will the new pageviews definition use the same bot filtering method?

Nemo

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l