Hi Srdjan,

The data pipeline behind the API can't handle arbitrary skip or limit parameters, but there's a better way to answer the kind of question you have. We publish all the pageviews at https://dumps.wikimedia.org/other/pagecounts-ez/; see the "Hourly page views per article" section. I would imagine one month of data is enough for your use case, and you can get the top N articles for every wiki this way, for any N you want. The files are compressed, so once you expand and process the data you'll see why we can't do this dynamically: the data is huge and our cluster is limited.
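
To make that concrete, here is a rough Python sketch of the top-N step. It assumes each data line is space-separated as "project title monthly_total ..." and that comment lines start with "#"; please check that against the README next to the dumps before relying on it. The function names, the "en.z" project code and the file path are placeholders for illustration, not something the dumps or the API define.

```python
import bz2
import heapq
from collections import Counter


def top_articles(lines, project, n):
    """Return the n most viewed (title, views) pairs for one project.

    Assumed line layout (verify against the dump's README):
        <project> <title> <view_count> [anything else]
    """
    totals = Counter()
    for line in lines:
        if line.startswith("#"):
            continue  # assumed comment/header marker
        fields = line.split()
        if len(fields) < 3 or fields[0] != project:
            continue  # malformed line or a different wiki
        try:
            totals[fields[1]] += int(fields[2])
        except ValueError:
            continue  # count column was not an integer
    # nlargest avoids sorting the full table when n is small
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


def top_articles_from_dump(path, project, n):
    """Stream a bz2-compressed dump file without expanding it on disk."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return top_articles(handle, project, n)
```

You would call it along the lines of top_articles_from_dump("pagecounts-month.bz2", "en.z", 100000), with the path and project code swapped for the real ones. The counts for a single wiki are kept in memory, which should be manageable for one month of one project but is worth watching on a small machine.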

On Sun, Apr 1, 2018 at 11:51 AM, Marko Obrovac <mobrovac@wikimedia.org> wrote:
(+Analytics-l)

Hello Srdjan,

The 1k limit is a hard one: only the top 1000 articles for a given day get loaded into the database. I have added the folks from the Analytics team to this thread. They may be able to help you, as they generate and expose the data in question.


Cheers,
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation


On 30 March 2018 at 16:59, Srdjan Grubor <srdjan@endlessm.com> wrote:
Heya,
I asked this on IRC but didn't get any replies so I'm following it up this way.
I have a question about the newer metrics REST v1 API: is there a way to specify how many top articles to pull from https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day or is 1k hardcoded? The old metrics data included the most viewed pages, but that disappeared with the change to the new API.

The reason I ask is that we (https://endlessos.com) are trying to rebuild our stale encyclopedia apps for offline usage. We are space-limited, so we would like to include only the pages most likely to be looked at that fit within a size envelope, which varies with the device in question (probably up to a 100k-article limit). The new API doesn't give us the tools to work out the rankings cleanly; the only option is to rate-limit on our side and check every single article's metric endpoint for counts.

So the main question is: do we have a way to get this data out with the current API? If the data is not available, could the "metrics/pageviews/top" API be augmented to accept `skip` and/or `limit` params, like other similar services that offer this type of filtering?

Thanks,

..........................................................................


Srdjan Grubor  |  +1.314.540.8328  |  Endless


_______________________________________________
Services mailing list
Services@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/services



_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics