Hi,
If all is well, new datasets should appear every hour, and lag behind the current time by
no more than 3 hours. However, this data is generated on an analytics cluster that is
used by other production jobs and by researchers, so when things are busy or there have
been hiccups for one reason or another, it could take longer for the jobs that generate
this data to catch up.
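
For an ETL job, one way to cope with that variable lag is to look back over a window of recent hours on each run and fetch whatever has not been loaded yet, rather than expecting a file at a fixed time. The following is only a rough sketch: the base URL comes from the original message, but the year/month directory layout and the pagecounts-YYYYMMDD-HH0000.gz file name pattern are assumptions that should be checked against the actual directory listing, and the function names candidate_urls and missing are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Base URL taken from the original message.
BASE_URL = "http://dumps.wikimedia.org/other/pagecounts-raw"


def candidate_urls(now, lookback_hours=6):
    """Build URLs for the last `lookback_hours` completed hours, oldest first.

    ASSUMPTION: files live under <year>/<year>-<month>/ and are named
    pagecounts-YYYYMMDD-HH0000.gz. Verify this against the real listing.
    """
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    urls = []
    for hours_back in range(lookback_hours, 0, -1):
        ts = top_of_hour - timedelta(hours=hours_back)
        urls.append(
            f"{BASE_URL}/{ts:%Y}/{ts:%Y-%m}/pagecounts-{ts:%Y%m%d-%H}0000.gz"
        )
    return urls


def missing(urls, already_loaded):
    """Return the candidate URLs that have not been loaded yet, in order."""
    return [url for url in urls if url not in already_loaded]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    loaded = set()  # in a real job this would come from the ETL system's state
    for url in missing(candidate_urls(now), loaded):
        print(url)
```

Because a file for a given hour may not exist yet when the cluster is busy, each candidate would still need to be requested and retried on a later run if it is absent. A lookback window wider than 3 hours leaves room for the slower catch-up cases.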
-Andrew Otto
On Jun 1, 2015, at 11:06, Vadim Bichutskiy
<vadim(a)echeloninsights.com> wrote:
Hello,
I'd like to set up an ETL process to get your pagecounts*.gz files and load them into
our system. I understand you provide hourly data. Is there a specific schedule on which new
files are uploaded to the site
(http://dumps.wikimedia.org/other/pagecounts-raw/2015/)? I'd like to get new
data as soon as possible.
Thanks,
Vadim
--
Vadim Y. Bichutskiy
@vybstat
Lead Data Scientist
Echelon Insights
vadim(a)echeloninsights.com
(408) 439-5932
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics