Earlier access to Pageviews hourly raw data files - Analytics

13 May 2022

Dear Sir or Madam,

Writing to you with a question about Pageviews hourly raw data files
<https://dumps.wikimedia.org/other/pageviews/readme.html>. First of all,
let me know if I chose the right person for a question. If not, could you
please advise to whom I should direct the question? The question is below.

I am working on a project where we would like to use Pageviews hourly data
<https://dumps.wikimedia.org/other/pageviews/readme.html>. For us, it is
crucial to get data as soon as possible. As I can see on the web page,
hourly data is available in the Wikimedia's file system approximately 45min
after the hour ends. But for an end-user, it is available several hours
later after that (this is shown on the screenshot).

Could you help us by answering the following questions:

   1. Is there any way to get data as soon as it is available on the
   Wikimedia filesystem (~45 min after the hour ends)?
   2. Are there any other faster ways to get hourly data? For instance,
   faster access to raw data files or access to *wmf.pageview_hourly

<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly>*
or
   to *wmf.pageviews_actor

<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_actor>*.
Unfortunately,
   API does not provide the opportunity to get data on an hourly level.

Best regards,

Maxim Aparovich

[image: wiki-email.png]