Today I released two new json files [2][4]. Both complement the visualization 'Wikipedia Views Visualized' [1] (aka WiViVi), but both can be useful in other contexts as well.

1) File 'demographics_from_world_bank_for_wikimedia.json' [2] resulted from harvesting World Bank API files. It contains yearly figures for four metrics (more could be added rather easily):
- population counts,
- percentage internet users,
- percentage mobile subscriptions,
- GDP per capita.
The following static demographics charts on meta are also based on these metrics: [3]

2) File 'datamaps-data.json' [4] contains the equivalent of 3 rather complex (*) csv files which feed WiViVi. It brings together demographics data and pageviews (by country, by region, and by language), and also adds meta info. This json file is meant for external use, as it's much easier to parse than the 3 csv files WiViVi uses itself [5].

(*) complex, as the csv files use a hierarchy based on nested delimiters

--

Details:
World Bank files have different formats (some csv, some json) and use a variety of indexes (some use ISO 3166-1 alpha-2 codes, others ISO 3166-1 alpha-3). The script for file 1) first normalizes the data, which are then aggregated, filtered, and indexed.
Json file 1) replaces two csv files which up to now were populated from Wikipedia pages [6][7]. Also, although Wikipedia lists nowadays also use World Bank data, this is not done consistently, see [8][9].

[1] Viz: https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html
[2] Json: https://stats.wikimedia.org/wikimedia/animations/wivivi/world-bank-demograph...
    Script: https://github.com/wikimedia/analytics-wikistats/tree/master/worldbank
[3] Charts: https://meta.wikimedia.org/wiki/World_Bank_demographics
[4] Json: https://stats.wikimedia.org/wikimedia/animations/wivivi/datamaps-data.json
    Script: https://github.com/wikimedia/analytics-wikistats/tree/master/traffic
[5] Syntax: https://stats.wikimedia.org/wikimedia/animations/wivivi/data.html
[6] Article: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_populati...
[7] Article: https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users
[8] Talk page: https://bit.ly/2L5Z2P4 section 'Wikipedia vs Worldbank population counts'
[9] Talk page: https://bit.ly/2NJUoIu section 'Wikipedia vs Worldbank internet percentages'
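The normalization step described under 'Details' (World Bank sources keyed partly by ISO 3166-1 alpha-2 and partly by alpha-3 codes) can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the code in the worldbank script [2]: the record layout `(country_code, metric, year, value)`, the metric names, the helper names `normalize_code` and `build_index`, and the three-entry `ALPHA3_TO_ALPHA2` table are all assumptions made for illustration. "Aggregated" is read here simply as gathering every country's yearly values from all source files under one alpha-2 key.

```python
import json
from collections import defaultdict

# Hypothetical, deliberately tiny ISO 3166-1 alpha-3 -> alpha-2 table;
# a real script would need the full list of codes.
ALPHA3_TO_ALPHA2 = {"NLD": "NL", "DEU": "DE", "IND": "IN"}


def normalize_code(code):
    """Map a country code to ISO 3166-1 alpha-2.

    Two-letter codes are passed through (upper-cased, not validated);
    other codes are looked up in ALPHA3_TO_ALPHA2; unknown codes yield None.
    """
    code = code.strip().upper()
    if len(code) == 2:
        return code
    return ALPHA3_TO_ALPHA2.get(code)


def build_index(records, metrics):
    """Normalize, filter and index (code, metric, year, value) records.

    Returns {alpha2: {metric: {year: value}}}. Records with an unknown
    country code, an unrequested metric, or a missing value are dropped.
    """
    index = defaultdict(lambda: defaultdict(dict))
    for code, metric, year, value in records:
        alpha2 = normalize_code(code)
        if alpha2 is None or metric not in metrics or value is None:
            continue
        # Records from differently indexed source files land under the same key.
        index[alpha2][metric][int(year)] = float(value)
    # Convert the nested defaultdicts to plain dicts for comparison/serialization.
    return {country: dict(per_metric) for country, per_metric in index.items()}


if __name__ == "__main__":
    # Made-up placeholder values, not actual World Bank figures.
    records = [
        ("NL", "population", "2017", 1000),     # source indexed by alpha-2
        ("NLD", "internet_pct", 2017, 50.0),    # source indexed by alpha-3
        ("XYZ", "population", 2017, 1),         # unknown code: dropped
        ("DEU", "gdp_per_capita", 2017, None),  # missing value: dropped
    ]
    index = build_index(records, {"population", "internet_pct", "gdp_per_capita"})
    print(json.dumps(index, indent=2, sort_keys=True))
```

Keeping a single alpha-2 key per country means metrics harvested from csv and json sources can be merged without a per-source lookup later on. Note that JSON object keys are always strings, so the integer years are written out as "2017" and so on when the index is serialized.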
Thanks for this, Erik. This can be helpful for a variety of projects including https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Beh... and the next steps for this project.
L
On Wednesday, July 11, 2018, Erik Zachte ezachte@wikimedia.org wrote:
Today I released two new json files [2][4]. Both complement the visualization 'Wikipedia Views Visualized' [1] (aka WiViVi), but both can be useful in other contexts as well.
1) File 'demographics_from_world_bank_for_wikimedia.json' [2] resulted from harvesting World Bank API files. It contains yearly figures for four metrics (more could be added rather easily):
- population counts,
- percentage internet users,
- percentage mobile subscriptions,
- GDP per capita.
The following static demographics charts on meta are also based on these metrics: [3]

2) File 'datamaps-data.json' [4] contains the equivalent of 3 rather complex (*) csv files which feed WiViVi. It brings together demographics data and pageviews (by country, by region, and by language), and also adds meta info. This json file is meant for external use, as it's much easier to parse than the 3 csv files WiViVi uses itself [5].

(*) complex, as the csv files use a hierarchy based on nested delimiters

--

Details:
World Bank files have different formats (some csv, some json) and use a variety of indexes (some use ISO 3166-1 alpha-2 codes, others ISO 3166-1 alpha-3). The script for file 1) first normalizes the data, which are then aggregated, filtered, and indexed.
Json file 1) replaces two csv files which up to now were populated from Wikipedia pages [6][7]. Also, although Wikipedia lists nowadays also use World Bank data, this is not done consistently, see [8][9].

[1] Viz: https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html
[2] Json: https://stats.wikimedia.org/wikimedia/animations/wivivi/world-bank-demographics.json
    Script: https://github.com/wikimedia/analytics-wikistats/tree/master/worldbank
[3] Charts: https://meta.wikimedia.org/wiki/World_Bank_demographics
[4] Json: https://stats.wikimedia.org/wikimedia/animations/wivivi/datamaps-data.json
    Script: https://github.com/wikimedia/analytics-wikistats/tree/master/traffic
[5] Syntax: https://stats.wikimedia.org/wikimedia/animations/wivivi/data.html
[6] Article: https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population
[7] Article: https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users
[8] Talk page: https://bit.ly/2L5Z2P4 section 'Wikipedia vs Worldbank population counts'
[9] Talk page: https://bit.ly/2NJUoIu section 'Wikipedia vs Worldbank internet percentages'
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l