Hi Dan / Marco,
Let me take a look and see if this will be good enough, but it looks
promising! If you don't hear from me again, all is well :)
Thanks!
..........................................................................
Srdjan Grubor | +1.314.540.8328 | Endless <http://endlessm.com/>
On Mon, Apr 2, 2018 at 9:47 AM, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
Hi Srdjan,
The data pipeline behind the API can't handle arbitrary skip or limit
parameters, but there's a better way for the kind of question you have. We
publish all the pageviews at https://dumps.wikimedia.org/other/pagecounts-ez/;
look at the "Hourly page views per article"
section. I would imagine for your use case one month of data is enough,
and you can get the top N articles for all wikis this way, where N is
anything you want. These files are compressed, so when you process and
expand the data you'll see the reason we can't do this dynamically: it's
huge data and our cluster is limited.
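The dump-processing approach Dan describes can be sketched as a streaming top-N pass. This is a hypothetical helper, not official tooling: the line layout (project code, page title, total count, then encoded hourly data) is assumed from the dump documentation and should be verified against a real file, and the filename in the comment is illustrative.

```python
import bz2
import heapq

def top_n_articles(lines, n):
    """Return the n highest-viewed (count, project, title) tuples.

    Assumes each line looks like 'project page_title total ...', as in
    the pagecounts-ez "hourly page views per article" files (layout
    assumed; check a real dump before relying on it).
    """
    heap = []
    for line in lines:
        parts = line.split(" ")
        if len(parts) < 3:
            continue  # skip malformed or comment lines
        project, title, total = parts[0], parts[1], parts[2]
        try:
            count = int(total)
        except ValueError:
            continue
        # Keep only the current top n in memory: the expanded files
        # are huge, so never materialize the full ranking.
        item = (count, project, title)
        if len(heap) < n:
            heapq.heappush(heap, item)
        else:
            heapq.heappushpop(heap, item)
    return sorted(heap, reverse=True)

# For a real run you would stream a compressed dump, e.g.:
#   with bz2.open("pagecounts-2018-03-views-ge-5.bz2", "rt") as f:
#       top = top_n_articles(f, 100000)
sample = [
    "en.z Main_Page 9999 A1B2",
    "en.z Cat 120 A3",
    "de.z Hund 450 B7",
]
print(top_n_articles(sample, 2))
```

Because only the heap of size n is kept, memory use stays bounded even though the decompressed data is very large, which is the constraint Dan mentions.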
On Sun, Apr 1, 2018 at 11:51 AM, Marko Obrovac <mobrovac(a)wikimedia.org>
wrote:
(+Analytics-l)
Hello Srdjan,
The 1k limit is a hard one: only the top 1000 articles for a given day
get loaded into the database. I added the folks from the Analytics team to
this thread; they may be able to help you, as they generate and expose the
data in question.
Cheers,
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
On 30 March 2018 at 16:59, Srdjan Grubor <srdjan(a)endlessm.com> wrote:
Heya,
I asked this on IRC but didn't get any replies, so I'm following up this
way.
I have a question about the newer metrics REST v1 API: is there a way to
specify how many top articles to pull from
https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day
or is 1k hardcoded? Old
metrics data was available that had the most viewed pages but that
disappeared with the change to the new API.
The reason I ask is that we (https://endlessos.com) are trying to rebuild
our stale encyclopedia apps for offline usage. We are space-limited, so we
would like to include only the pages most likely to be viewed, within a
size envelope that varies with the device in question (probably up to a
100k-article limit). The new API doesn't give us a clean way to figure out
the rankings, other than rate-limiting on our side and checking every
single article's metrics endpoint for counts.
So the main question is: is there a way to get this data out with the
current API? If not, could the "metrics/pageviews/top" API be augmented
with `skip` and/or `limit` params, like other similar services that offer
this type of filtering?
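For context, the endpoint in question can be called as below. This is a minimal sketch, not official client code: the helper name is made up, and the 1000-article cap on the response is what Marko's reply states.

```python
def top_articles_url(project, access, year, month, day):
    """Build the REST v1 'top articles' URL for one day.

    The response contains at most the top 1000 articles; there is no
    skip/limit parameter to page past that.
    """
    return (
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/top/"
        f"{project}/{access}/{year:04d}/{month:02d}/{day:02d}"
    )

url = top_articles_url("en.wikipedia", "all-access", 2018, 3, 30)
print(url)
# Fetching it requires network access, e.g.:
#   import json, urllib.request
#   data = json.load(urllib.request.urlopen(url))
#   articles = data["items"][0]["articles"]  # at most 1000 entries
```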
Thanks,
..........................................................................
Srdjan Grubor | +1.314.540.8328 | Endless <http://endlessm.com/>
_______________________________________________
Services mailing list
Services(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/services
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics