We are trying to rebuild our stale encyclopedia apps for offline usage, but we are space-limited and would only like to include the most likely pages to be looked at that can fit within a size envelope that varies with the device in question (up to a 100k article limit, probably).

For this use case I would be careful about treating page ranks as true popularity, since the top of the data is regularly affected by bot spikes (a known issue that we intend to fix). After you have your list of most popular pages, please take a second look: some, but not all, of the pages that are artificially high due to bot traffic are fairly obvious (many are special pages).
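The "second look" above can be partly automated. Below is a minimal sketch of pruning obviously non-article entries (special pages and other non-article namespaces) from a popularity-ranked title list; the prefix list and the `prune_non_articles` helper are assumptions for illustration, not an official Wikimedia list, and would need extending per wiki and language.

```python
# Prefixes that mark non-article namespaces on English-language wikis.
# This is an illustrative subset, not an exhaustive or official list.
NON_ARTICLE_PREFIXES = (
    "Special:", "Talk:", "User:", "User_talk:", "Wikipedia:",
    "File:", "Template:", "Category:", "Help:", "Portal:",
)

def prune_non_articles(ranked_titles):
    """Drop titles that are clearly not encyclopedia articles."""
    return [
        t for t in ranked_titles
        if not t.startswith(NON_ARTICLE_PREFIXES)
        and t not in ("Main_Page", "-")
    ]

ranked = ["Main_Page", "Special:Search", "Earth", "Talk:Earth", "Python"]
print(prune_non_articles(ranked))  # ['Earth', 'Python']
```

This only catches the obvious cases; as noted above, not all bot-inflated pages are recognizable by namespace alone, so a manual review of the top of the list is still worthwhile.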
On Mon, Apr 2, 2018 at 8:54 AM, Leila Zia leila@wikimedia.org wrote:
On Mon, Apr 2, 2018 at 7:47 AM, Dan Andreescu dandreescu@wikimedia.org wrote:
Hi Srdjan,
The data pipeline behind the API can't handle arbitrary skip or limit parameters, but there's a better way to answer the kind of question you have. We publish all the pageviews at https://dumps.wikimedia.org/other/pagecounts-ez/; look at the "Hourly page views per article" section. I would imagine one month of data is enough for your use case, and you can get the top N articles for all wikis this way, where N is anything you want.
One suggestion here: if you want to find articles that are consistently high-page-view (and not part of spike/trend views), increase the time window to 6 months or longer.
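Putting the two suggestions together, a sketch of aggregating per-article totals across several monthly dump files and keeping the top N might look like the following. The space-separated layout assumed here (wiki code, page title, monthly total, then an encoded hourly string we ignore) is an assumption about the pagecounts-ez format; check the dump's own README before relying on it.

```python
import heapq
from collections import Counter

def top_n_articles(dump_lines, n=100_000, wiki="en.z"):
    """Sum per-article view counts across one or more monthly files
    and return the n highest-viewed (title, total) pairs.

    Assumes each line looks like:  <wiki> <title> <monthly_total> <hourly...>
    (an assumption for illustration, not a verified format spec).
    """
    totals = Counter()
    for line in dump_lines:
        parts = line.split(" ")
        if len(parts) < 3 or parts[0] != wiki:
            continue  # skip other wikis and malformed lines
        title, count = parts[1], parts[2]
        if count.isdigit():
            totals[title] += int(count)  # accumulate across months
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])

sample = [
    "en.z Earth 5000 ...",
    "en.z Moon 1200 ...",
    "de.z Erde 900 ...",
    "en.z Earth 300 ...",  # same title from a second month's file
]
print(top_n_articles(sample, n=2))  # [('Earth', 5300), ('Moon', 1200)]
```

Summing over six or more monthly files, as suggested above, naturally dampens one-off spikes: an article must sustain traffic across the whole window to stay near the top.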
Best, Leila
--
Leila Zia
Senior Research Scientist, Lead
Wikimedia Foundation
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics