Great work!
Currently, it includes estimates of Population (LP), Internet users (IPop), Economy Size (PPPGDP), etc., computed by distributing each country's totals "evenly" across its languages according to their percentage share of the country's population, as given in the Unicode CLDR 25 Territory-Language Information.
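A rough sketch of what such an "even distribution" allocation might look like. This is only an illustration under assumptions: the function name `estimate_language_totals`, the country and language codes, and all figures are hypothetical and made up, not taken from CLDR or from Nemo's tool.

```python
from collections import defaultdict

def estimate_language_totals(country_totals, language_shares):
    """Allocate each country's total (population, Internet users, PPP GDP, ...)
    to its languages in proportion to their percentage share, assuming the
    total is spread evenly across the country's speakers (hypothetical helper)."""
    totals = defaultdict(float)
    for country, total in country_totals.items():
        for language, pct in language_shares.get(country, {}).items():
            totals[language] += total * pct / 100.0
    return dict(totals)

# Made-up figures for two hypothetical countries (not real CLDR data).
population = {"AA": 1_000_000, "BB": 500_000}
shares = {"AA": {"xx": 80.0, "yy": 20.0}, "BB": {"yy": 100.0}}
print(estimate_language_totals(population, shares))
# {'xx': 800000.0, 'yy': 700000.0}
```

The same function would be applied separately to each indicator (population, Internet users, economy size).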
A simple linear regression could reveal, for example, which geo-linguistic, geographic, or linguistic categories receive a less-than-expected or more-than-expected proportion of viewing traffic, with the expected values derived from the sizes of the population, Internet population, and economy.
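One possible reading of this idea, sketched under assumptions: fit an ordinary least-squares line of each category's observed share of page views against its expected share (here, a hypothetical estimated Internet-user share), then read the residuals. A positive residual means more traffic than the fit predicts; a negative one means less. The helper names, language codes, and numbers are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residuals(expected, observed):
    """Observed minus fitted value for each category (hypothetical helper)."""
    a, b = fit_line(expected, observed)
    return [o - (a + b * e) for e, o in zip(expected, observed)]

# Made-up shares (fractions of the total) for four hypothetical language groups.
labels = ["xx", "yy", "zz", "ww"]
internet_share = [0.40, 0.30, 0.20, 0.10]  # expected, e.g. estimated IPop share
view_share = [0.50, 0.25, 0.15, 0.10]      # observed share of page views
for label, r in zip(labels, residuals(internet_share, view_share)):
    # Positive: more traffic than expected; negative: less than expected.
    print(label, round(r, 4))
```

With several indicators (population, Internet users, economy size), the same idea extends to a multiple regression, though that is beyond this sketch.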
I hope this great work by Nemo can be extended to cover:
(1) time-series reports and data releases
(2) aggregates of edits
Altogether, these tools and datasets will be a major milestone for monitoring language/project development across Wikimedia projects. Congrats!
Best,
han-teng liao