Hi Wikimedia Analytics,
I'm a student who has been working with the page count files from Wikimedia.
Over the last few days, it looks like the latest page counts are being published more slowly than before.
Usually, when I go to the following link:
http://dumps.wikimedia.org/other/pagecounts-all-sites/2015/2015-09/
I can see the counts for a given hour within an hour or so after it ends.
Will this still be true going forward? It does not seem to be the case for 9/16 and 9/17.
Also, pagecounts-20150916-090000.gz (http://dumps.wikimedia.org/other/pagecounts-all-sites/2015/2015-09/pagecounts-20150916-090000.gz) does not seem to be the correct size.
Thanks,
Tony Ho
Hi Tony,
Indeed! We noticed this on Friday too. A couple of changes[1] were recently made to improve a few things with our webrequest processing, and it seems things have greatly slowed since then. We’ll talk about this problem for the first time today. I’m not sure when things will get better, but I believe they eventually will.
[1] https://github.com/wikimedia/analytics-refinery-source/commit/fd59a13c4a6efbad333f83d7f0c47759faf8c488
Sorry for the delay!
-Andrew
On Sep 18, 2015, at 04:23, Tony Ho tonyho1992@gmail.com wrote:
Hi WikiMedia Analytics,
I'm a student who has been doing work with the page count files from wikimedia.
During the last few days, it looks like the latest page count is being published slower than before.
Usually, when I go to the following link:
http://dumps.wikimedia.org/other/pagecounts-all-sites/2015/2015-09/.
I could see what happened an hour ago sometime within an hour or so after that.
Is this property still going to be true? This seems to not be the case for 9/16 and 9/17.
Also, pagecounts-20150916-090000.gz http://dumps.wikimedia.org/other/pagecounts-all-sites/2015/2015-09/pagecounts-20150916-090000.gz does not seem to be of correct size.
Thanks,
Tony Ho
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks, Andrew!
Out of curiosity, do you also try to backfill when data is missing? I just saw that 9/27 is missing quite a few hourly files: 0, 7, 10, 17, 21, 22, and 23.
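For anyone wanting to repeat this kind of check, here is a minimal sketch of how a day's gaps might be found. It assumes every hourly file follows the pattern of the one name quoted in this thread (pagecounts-YYYYMMDD-HH0000.gz); the function names are hypothetical, and the set of names actually present is supplied by the caller rather than fetched from the dumps server.

```python
from datetime import date


def expected_pagecount_files(day: date) -> list[str]:
    """Return the 24 hourly filenames expected for one day.

    Assumption: every file follows the pattern of the single name quoted
    in this thread, pagecounts-YYYYMMDD-HH0000.gz.
    """
    return [f"pagecounts-{day:%Y%m%d}-{hour:02d}0000.gz" for hour in range(24)]


def missing_hours(day: date, present: set[str]) -> list[int]:
    """Return the hours (0-23) whose expected file is not in `present`."""
    return [
        hour
        for hour, name in enumerate(expected_pagecount_files(day))
        if name not in present
    ]


if __name__ == "__main__":
    day = date(2015, 9, 27)
    # Hypothetical listing: pretend every file except hours 7 and 21 exists.
    listing = {
        name
        for hour, name in enumerate(expected_pagecount_files(day))
        if hour not in (7, 21)
    }
    print(missing_hours(day, listing))
```

In practice the `present` set would come from whatever directory listing you have for that day; the sketch deliberately leaves that step out, since how the listing is obtained is not described in this thread.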
On Mon, Sep 21, 2015 at 6:25 AM, Andrew Otto aotto@wikimedia.org wrote:
Hi Tony,
Indeed! We noticed this on Friday too. A couple of changes[1] were recently made to improve a few things with our webrequest processing, and it seems things have greatly slowed since then. We’ll talk about this problem for the first time today. I’m not sure when things will get better, but I believe they eventually will.
[1] https://github.com/wikimedia/analytics-refinery-source/commit/fd59a13c4a6efb...
Sorry for the delay! -Andrew
Yup! The hourly jobs that generate these are individually tracked. We usually notice if there are problems during a working day. Pinging us here on this list is a good way to bump us into faster action too.
I just looked now, and those files do exist.
On Sep 28, 2015, at 00:22, Tony Ho tonyho1992@gmail.com wrote:
thanks andrew!
Out of curiosity, do you guys try to backfill when data is missing as well? I just saw that 9/27 is missing quite a few files: 0,7,10,17,21,22,23.
It looks like 7 and 21 are still missing?
On Mon, Sep 28, 2015 at 11:40 AM, Andrew Otto aotto@wikimedia.org wrote:
Yup! The hourly jobs that generate these are individually tracked. We usually notice if there are problems during a working day. Pinging us here on this list is a good way to bump us into faster action too.
I just looked now, and those files do exist.