Hello,
I'd like to set up an ETL process to fetch your pagecounts*.gz files and load them into our system. I understand you provide hourly data. Is there a specific schedule on which new files are uploaded to the site (http://dumps.wikimedia.org/other/pagecounts-raw/2015/)? I'd like to get new data as soon as possible.
Thanks, Vadim
--
Vadim Y. Bichutskiy
@vybstat
Lead Data Scientist, Echelon Insights
vadim@echeloninsights.com
(408) 439-5932
Hi,
If all is well, new datasets should appear every hour, and lag behind the current time by no more than 3 hours. However, this data is generated on an analytics cluster that is used by other production jobs and by researchers, so when things are busy or there have been hiccups for one reason or another, it could take longer for the jobs that generate this data to catch up.
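Given that cadence, a poller only needs to look back over the last few completed hours and try each candidate file, newest first. Below is a minimal sketch of that logic; the pagecounts-YYYYMMDD-HH0000.gz naming and the year/year-month directory layout are assumptions based on the 2015 listing linked above, so verify them against the live directory before relying on this.

```python
from datetime import datetime, timedelta

# Assumed base URL and layout, per the directory listing linked above.
BASE = "http://dumps.wikimedia.org/other/pagecounts-raw"

def candidate_urls(now, max_lag_hours=3):
    """Return URLs of hourly pagecounts files that may exist, newest
    first, allowing for the stated publication lag of up to 3 hours.

    Assumes files are named pagecounts-YYYYMMDD-HH0000.gz and live
    under BASE/<year>/<year>-<month>/ -- check the live listing.
    """
    urls = []
    for lag in range(1, max_lag_hours + 1):
        ts = (now - timedelta(hours=lag)).replace(
            minute=0, second=0, microsecond=0)
        name = ts.strftime("pagecounts-%Y%m%d-%H0000.gz")
        urls.append(f"{BASE}/{ts.year}/{ts.strftime('%Y-%m')}/{name}")
    return urls
```

An ETL job could run this every hour, attempt each URL in order, and fall through to older hours when the newest file has not yet been published (e.g. during the cluster backlogs mentioned above).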
-Andrew Otto
On Jun 1, 2015, at 11:06, Vadim Bichutskiy vadim@echeloninsights.com wrote: