Hello Noah,

Thank you for reaching out to us :)

We have not backfilled the "top pageview per country" data because, to protect the privacy of our users, we use a filtering mechanism that removes pages seen by fewer than 1000 actors a day, and the data that allows us to do so is kept for only 90 days. I have just created a task on our Phabricator board to investigate other filtering methods that could allow us to release historical data, even if less detailed (https://phabricator.wikimedia.org/T299627).

Sorry for not being able to help, and best of luck with your studies :)
Joseph for the Data Engineering (ex-Analytics) team
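For readers following the thread, the threshold filter described in the reply above can be sketched roughly as below. This is only an illustration, not the Data Engineering team's actual pipeline: the function name `filter_top_pages`, the `(country, page, actor_id)` record shape, and the choice to count distinct actors per country and page are all assumptions made for the example. Only the 1000-actors-a-day threshold comes from the message.

```python
from collections import defaultdict

# Threshold quoted in the reply: pages seen by fewer than 1000 actors
# in a day are removed before publication.
MIN_ACTORS_PER_DAY = 1000


def filter_top_pages(views, min_actors=MIN_ACTORS_PER_DAY):
    """Hypothetical sketch of a per-day privacy filter.

    views: iterable of (country, page, actor_id) records for a single day.
    Returns {country: [(page, view_count), ...]}, most viewed first,
    keeping only pages seen by at least `min_actors` distinct actors.
    Counting actors per (country, page) pair is an assumption.
    """
    actors = defaultdict(set)   # (country, page) -> distinct actor ids
    counts = defaultdict(int)   # (country, page) -> number of views

    for country, page, actor_id in views:
        key = (country, page)
        actors[key].add(actor_id)
        counts[key] += 1

    result = defaultdict(list)
    for (country, page), view_count in counts.items():
        if len(actors[(country, page)]) >= min_actors:
            result[country].append((page, view_count))

    # Sort by view count (descending), then by page title for stable output.
    for pages in result.values():
        pages.sort(key=lambda item: (-item[1], item[0]))
    return dict(result)
```

Note that the sketch needs the per-actor records to count distinct actors, so a filter of this kind can only be applied while that underlying data is still retained.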
On Tue, Jan 18, 2022 at 1:33 AM Noah Brunken Syrkis nobr@itu.dk wrote:
Hello,
I noticed that the public API for daily top viewed pages per country[1] only goes back to Jan 1st, 2021. Could this be backfilled from other datasets to 2015 without too much effort on your part? The research team encouraged me to ask here when I spoke with them about my need for the data. I'm a data science student at the IT University of Copenhagen doing a thesis on predicting country-level human value survey responses[2] based on the top read Wikipedia pages in the given country.
Thanks! Noah
[1] https://wikimedia.org/api/rest_v1/#/Pageviews%20data/get_metrics_pageviews_t...
[2] http://www.europeansocialsurvey.org/downloadwizard/