Hi,

If all is well, new datasets should appear every hour and lag behind the current time by no more than 3 hours. However, this data is generated on an analytics cluster that is shared with other production jobs and with researchers, so when the cluster is busy or there have been hiccups for one reason or another, the jobs that generate this data can take longer to catch up.
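
Since the lag can vary, polling the directory listing is probably more robust than assuming a fixed upload time. Here is a minimal sketch in Python of how that could look. It assumes the listing is a plain HTML index page whose links point at the pagecounts*.gz files; the function names and the `already_loaded` set are hypothetical, not something we provide.

```python
import re
import urllib.request

# Assumption: the directory index is HTML with links like href="pagecounts....gz".
PAGECOUNTS_LINK = re.compile(r'href="(pagecounts[^"/]*\.gz)"')


def fetch_listing(listing_url):
    """Download a directory index page and return it as text."""
    with urllib.request.urlopen(listing_url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def new_pagecounts_files(index_html, already_loaded):
    """Return pagecounts*.gz names in the listing that are not loaded yet, sorted."""
    found = set(PAGECOUNTS_LINK.findall(index_html))
    return sorted(found - set(already_loaded))
```

An hourly cron job could call `fetch_listing` on whichever directory holds the current files, pass the result to `new_pagecounts_files` along with the names your ETL has already loaded, and download whatever comes back. If the jobs on our side fall behind, the next run simply picks up several files at once.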

-Andrew Otto



On Jun 1, 2015, at 11:06, Vadim Bichutskiy <vadim@echeloninsights.com> wrote:

Hello,

I'd like to set up an ETL process to get your pagecounts*.gz files and load them into our system. I understand you provide hourly data. Is there a specific schedule for when new files are uploaded to the site (http://dumps.wikimedia.org/other/pagecounts-raw/2015/)? I'd like to get new data as soon as possible.

Thanks,
Vadim

--
Vadim Y. Bichutskiy
@vybstat
Lead Data Scientist
Echelon Insights
vadim@echeloninsights.com
(408) 439-5932
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics