Hi folks,
I am conducting a project on the value of using celebrity names in newspaper headlines. I am seeking your help to see if it is possible to get historical page view data for biographies for the years 2013 to 2015. I’ve been experimenting with this legacy API that looked promising: <https://wikimedia.org/api/rest_v1/#/Pageviews%20data/get_metrics_pageviews_per_article__project___access___agent___article___granularity___start___end_> However, I receive the error “the date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet” whenever I search for a biography for dates preceding summer 2015. I get the same message when searching for “Brad Pitt” as for “Isaac Newton.” Does anyone know of any way to get this page view information? I would be very grateful, and do let me know if you need more information from me.
Thank you, Marianne (Cornell PhD student in Information Science)
Hi Marianne,
On 19/3/20 17:37, Marianne Aubin Le Quere wrote:
> I am seeking your help to see if it is possible to get historical page view data for biographies for the years 2013 to 2015. [...] I receive the error “the date(s) you used are valid, but we either do not have data for those date(s), or the project you asked for is not loaded yet” whenever I search for a biography for dates preceding summer 2015.
Data from before October 2015 are not available from that API, but you can use the dataset that I have made available here:
under "Wikipedia's pagecounts".
This dataset is derived from the pagecounts-raw dataset[1a]. The raw data are available on dumps.wikimedia.org[1b].
The main difference is that the original pagecounts-raw data are available in files sorted by date (hourly files), while my dataset is sorted by filename.
There is also the pagecounts-ez dataset[2a], which is a more compact representation[2b] of the same data. Raw data from 2011 onwards are available on dumps.wikimedia.org[2c] as well. If you need data from before 2011, you can find them on cricca.disi.unitn.it[2d].
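In case it is useful, here is a minimal Python sketch of how one might total the views for a few biographies from a pagecounts-raw hourly file. It assumes each line has four space-separated fields (project code, page title, request count, bytes transferred). The function names are my own and purely illustrative, and the sketch does not normalize titles, which in the raw files can appear percent-encoded or in several variants.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, Tuple


def parse_pagecounts(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (project, title, count) from pagecounts-raw style lines.

    Assumed line format: "<project> <title> <count> <bytes>".
    Lines that do not match this shape are skipped.
    """
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) != 4:
            continue  # malformed line
        project, title, count, _size = parts
        if not count.isdigit():
            continue  # count field is not a plain integer
        yield project, title, int(count)


def count_views(lines: Iterable[str], project: str, titles: Iterable[str]) -> Counter:
    """Sum the request counts for the given titles within one project."""
    wanted = set(titles)
    totals: Counter = Counter()
    for proj, title, count in parse_pagecounts(lines):
        if proj == project and title in wanted:
            totals[title] += count
    return totals


def count_views_in_file(path: str, project: str, titles: Iterable[str]) -> Counter:
    """Same as count_views, reading from a gzip-compressed hourly file."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return count_views(fh, project, titles)
```

For example, count_views_in_file("pagecounts-20130101-000000.gz", "en", ["Brad_Pitt", "Isaac_Newton"]) would return the counts for that single hour (the path is only an example of a locally downloaded file). Summing the results over all hourly files in a date range gives totals for that period.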
Hope this helps.
[1a]: https://wikitech.wikimedia.org/wiki/Analytics/Archive/Data/Pagecounts-raw
[1b]: https://dumps.wikimedia.org/other/pagecounts-raw/
[2a]: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pagecounts-e...
[2b]: https://dumps.wikimedia.org/other/pagecounts-ez/
[2c]: https://dumps.wikimedia.org/other/pagecounts-ez/merged/
[2d]: http://cricca.disi.unitn.it/datasets/pagecounts-ez/