Thanks for this, Erik. This can be helpful for a variety of projects, including https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Robustness_across_languages and the next steps for this project.

L

On Wednesday, July 11, 2018, Erik Zachte <ezachte@wikimedia.org> wrote:
Today I released two new json files [2][4].
Both complement visualization 'Wikipedia Views Visualized' [1] (aka
WiViVi), but both can be useful in other contexts as well.
1) File 'demographics_from_world_bank_for_wikimedia.json' [2] resulted from
harvesting World Bank API files.
It contains yearly figures for four metrics (more could be added rather
easily):
- population counts,
- percentage internet users,
- percentage mobile subscriptions,
- GDP per capita.
The following static demographics charts on meta are also based on these
metrics: [3]
2) File 'datamaps-data.json' [4] contains the equivalent of 3 rather
complex (*) csv files which feed WiViVi. This brings together demographics
data and pageviews (by country, by region, and by language), and also adds
additional meta info. This json file is meant for external use, as it's
much easier to parse than the 3 csv files WiViVi uses itself [5].
(*) complex, as the csv files use a hierarchy based on nested delimiters
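
To illustrate why nested delimiters make a csv file harder to parse than
json, here is a minimal Python sketch. The row layout and the delimiters
used here (',' between fields, '|' between list items, ':' between key and
value) are hypothetical and chosen only for illustration; the real syntax
of the WiViVi csv files is documented at [5].

```python
import json


def parse_row(row):
    """Parse one row of a HYPOTHETICAL nested-delimiter csv format.

    Assumed layout (illustration only, not the real WiViVi syntax):
        country,lang:share|lang:share|...,region
    """
    country, languages, region = row.split(",")
    shares = {}
    for item in languages.split("|"):
        lang, pct = item.split(":")
        shares[lang] = float(pct)
    return {"country": country, "region": region, "language_shares": shares}


# Made-up example row; the values carry no meaning.
record = parse_row("NL,nl:70.5|en:25.0,EU")

# The same information as json needs no custom splitting logic to read back.
print(json.dumps(record, indent=2))
```

Each extra level of hierarchy needs its own delimiter and its own split
step, whereas a json consumer can call a standard parser once.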
--
Details:
World Bank files have different formats (some csv, some json) and use a
variety of indexes (some use ISO 3166-1 alpha-2 codes, others ISO 3166-1
alpha-3 codes).
Script 1) first normalizes the data, then aggregates, filters, and indexes
them.
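
The normalize / filter / index steps described above could look roughly
like the following Python sketch. This is an assumption-laden illustration,
not the actual wikistats script [2]: the function names, the metric names,
the tuple layout of the input rows, and the tiny code mapping are all
hypothetical, and the numbers are placeholders rather than World Bank data.

```python
# Tiny illustrative subset of an ISO 3166-1 alpha-3 -> alpha-2 mapping.
# A real script would need the complete table.
ALPHA3_TO_ALPHA2 = {"NLD": "NL", "DEU": "DE", "USA": "US"}


def normalize_code(code):
    """Return the alpha-2 code for an alpha-2 or alpha-3 input, else None."""
    code = code.strip().upper()
    if len(code) == 2:
        return code
    return ALPHA3_TO_ALPHA2.get(code)


def build_index(rows):
    """Index rows as index[country][metric][year] = value.

    rows: iterable of (country_code, metric, year, value) tuples
    (a hypothetical intermediate format). Rows with an unknown
    country code or a missing value are filtered out.
    """
    index = {}
    for code, metric, year, value in rows:
        iso2 = normalize_code(code)
        if iso2 is None or value is None:
            continue
        index.setdefault(iso2, {}).setdefault(metric, {})[year] = value
    return index


# Placeholder rows mixing alpha-2 and alpha-3 codes.
rows = [
    ("NLD", "population", 2016, 1000),
    ("nl", "internet_pct", 2016, 50.0),
    ("XXX", "population", 2016, 1),     # unknown code: dropped
    ("DE", "population", 2016, None),   # missing value: dropped
]
index = build_index(rows)
print(index)
```

Keying everything on one code system up front means the later aggregation
steps never have to know which source file a figure came from.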
Json file 1) replaces two csv files which up to now were filled from
Wikipedia pages [6][7].
Also, although Wikipedia lists nowadays use World Bank data as well, this
is not done consistently; see [8][9].
[1] Viz:
https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html
[2] Json:
https://stats.wikimedia.org/wikimedia/animations/wivivi/world-bank-demographics.json
    Script:
https://github.com/wikimedia/analytics-wikistats/tree/master/worldbank
[3] Charts: https://meta.wikimedia.org/wiki/World_Bank_demographics
[4] Json:
https://stats.wikimedia.org/wikimedia/animations/wivivi/datamaps-data.json
    Script:
https://github.com/wikimedia/analytics-wikistats/tree/master/traffic
[5] Syntax:
https://stats.wikimedia.org/wikimedia/animations/wivivi/data.html
[6] Article:
https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
[7] Article:
https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users
[8] Talk page: https://bit.ly/2L5Z2P4 section 'Wikipedia vs Worldbank
population counts'
[9] Talk page: https://bit.ly/2NJUoIu section 'Wikipedia vs Worldbank
internet percentages'
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


--
Leila Zia
Senior Research Scientist
Wikimedia Foundation