Hi Srdjan,
The data pipeline behind the API can't handle arbitrary skip or limit parameters, but there's a better way to answer the kind of question you have. We publish all the pageviews at https://dumps.wikimedia.org/other/pagecounts-ez/; look at the "Hourly page views per article" section. I would imagine one month of data is enough for your use case, and this way you can get the top N articles for every wiki, where N is anything you want. The files are compressed, so when you process and expand them you'll see why we can't do this dynamically: it's a huge amount of data and our cluster is limited.
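As a rough illustration of the offline approach, here is a minimal sketch that streams one downloaded, bz2-compressed month file and keeps the N most-viewed pages. It assumes each data line is whitespace-separated as "<project> <page_title> <monthly_total> <encoded hourly counts>" and that lines starting with "#" are comments; both are assumptions, so check the README on the dumps page for the exact layout and project-code convention before relying on it. The function names are hypothetical.

```python
import bz2
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

# (project, page_title, monthly_total)
Row = Tuple[str, str, int]


def parse_lines(lines: Iterable[str], project: Optional[str] = None) -> Iterator[Row]:
    """Yield (project, title, monthly_total) tuples from pagecounts-ez-style lines.

    ASSUMPTION: each data line looks like
    "<project> <page_title> <monthly_total> <encoded hourly counts>".
    Blank lines, '#' comment lines and malformed lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 3:
            continue  # not enough columns to contain a monthly total
        proj, title, total = parts[0], parts[1], parts[2]
        if project is not None and proj != project:
            continue  # caller asked for a single wiki only
        try:
            count = int(total)
        except ValueError:
            continue  # total column was not an integer; skip the line
        yield proj, title, count


def top_n(lines: Iterable[str], n: int, project: Optional[str] = None) -> List[Row]:
    # heapq.nlargest keeps at most n rows in its heap while consuming the
    # generator, so the decompressed data never has to fit in memory at once.
    return heapq.nlargest(n, parse_lines(lines, project), key=lambda row: row[2])


def top_n_from_file(path: str, n: int, project: Optional[str] = None) -> List[Row]:
    # bz2.open in text mode decompresses the file as a stream, line by line.
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return top_n(fh, n, project)
```

For example, `top_n_from_file("path/to/month-file.bz2", 100_000, project=...)` would return up to 100k rows sorted by monthly views; the path and project code are placeholders to fill in from the dumps page.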
On Sun, Apr 1, 2018 at 11:51 AM, Marko Obrovac <mobrovac@wikimedia.org> wrote:
(+Analytics-l)
Hello Srdjan,
The 1k limit is a hard one: only the top 1000 articles for a given day are loaded into the database. I've added the folks from the Analytics team to this thread; they may be able to help you, as they generate and expose the data in question.
Cheers,
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
On 30 March 2018 at 16:59, Srdjan Grubor <srdjan@endlessm.com> wrote:
Heya, I asked this on IRC but didn't get any replies, so I'm following up this way. I have a question about the newer REST v1 metrics API: is there a way to specify how many top articles to pull from https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day, or is 1k hardcoded? The old metrics data included the most-viewed pages, but that disappeared with the change to the new API.
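For reference, a minimal sketch of calling that top-articles endpoint. It assumes the URL pattern /metrics/pageviews/top/{project}/{access}/{year}/{month}/{day} (matching the operation named in the docs link) and a JSON response of the form {"items": [{"articles": [{"article", "views", "rank"}, ...]}]}; the response shape is an assumption to verify against the docs. The function names and User-Agent string are hypothetical. Per the reply above, a single call returns at most the top 1000 articles for that day.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

# Top-articles endpoint of the Wikimedia REST v1 API (path assumed from the docs).
API_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"
# Wikimedia asks API clients for a descriptive User-Agent; this one is a placeholder.
USER_AGENT = "example-offline-encyclopedia/0.1 (contact@example.org)"


def top_url(project: str, access: str, year: int, month: int, day: int) -> str:
    """Build the request URL for one day's top articles."""
    return f"{API_BASE}/{project}/{access}/{year:04d}/{month:02d}/{day:02d}"


def parse_top(payload: Dict[str, Any]) -> List[Tuple[int, str, int]]:
    """Return (rank, article, views) tuples sorted by rank.

    ASSUMPTION: payload looks like
    {"items": [{"articles": [{"article": ..., "views": ..., "rank": ...}]}]}.
    """
    rows = []
    for item in payload.get("items", []):
        for art in item.get("articles", []):
            rows.append((int(art["rank"]), art["article"], int(art["views"])))
    return sorted(rows)  # tuples sort by rank first


def fetch_top(project: str, access: str, year: int, month: int, day: int) -> List[Tuple[int, str, int]]:
    req = urllib.request.Request(
        top_url(project, access, year, month, day),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return parse_top(json.load(resp))  # json.load accepts the bytes stream
```

For example, `fetch_top("en.wikipedia", "all-access", 2018, 3, 29)` would return at most 1000 ranked rows, which is exactly the ceiling the question is about.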
The reason I ask is that we (https://endlessos.com) are trying to rebuild our stale encyclopedia apps for offline usage. We are space-limited, so we'd like to include only the pages most likely to be read that fit within a size envelope, which varies with the device in question (probably up to a 100k-article limit). The new API doesn't give us the tools to work out the rankings cleanly, short of rate-limiting on our side and checking every single article's metrics endpoint for counts.
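Once a ranking is available, fitting it into a per-device envelope could look like the following minimal sketch. It uses a simple greedy pass by view count (not an optimal knapsack solution): it walks articles from most to least viewed and keeps each one whose size still fits the byte budget, stopping at an article cap. The function name, the per-article size map and the budget values are all hypothetical inputs, not anything the API provides.

```python
from typing import Dict, List, Tuple


def select_articles(
    ranked: List[Tuple[str, int]],  # (title, views); hypothetical input
    sizes: Dict[str, int],          # title -> packaged size in bytes; hypothetical
    budget_bytes: int,
    max_articles: int = 100_000,
) -> List[str]:
    """Greedily pick the most-viewed articles that fit the size budget."""
    chosen: List[str] = []
    used = 0
    # Most-viewed first; sorted() is stable, so ties keep their input order.
    for title, _views in sorted(ranked, key=lambda t: t[1], reverse=True):
        if len(chosen) >= max_articles:
            break
        size = sizes.get(title)
        if size is None:
            continue  # no size known for this article; skip it
        if used + size > budget_bytes:
            continue  # too big, but a smaller, less-viewed article may still fit
        chosen.append(title)
        used += size
    return chosen
```

Skipping (rather than stopping at) an article that doesn't fit lets smaller, slightly less popular pages fill the remaining space.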
So the main question is: is there a way to get this data out with the current API? If not, could the `metrics/pageviews/top` API be augmented with `skip` and/or `limit` parameters, like other similar services that offer this type of filtering?
Thanks,
Srdjan Grubor | +1.314.540.8328 | Endless http://endlessm.com/
Services mailing list Services@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/services
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics