There is no cron job that runs at those times. The puppet run on the dataset1001 host does run exactly at those times, so presumably something in one of the puppet jobs affects your downloads. I see nginx worker processes that have been running since yesterday, so it's definitely not a restart, graceful or otherwise. The puppet logs likewise do not indicate a restart or refresh of any services; in fact, most of the time they indicate no changes whatsoever.
I'll look into this and track it here: https://phabricator.wikimedia.org/T142367 Please add your names to the ticket if you want to see updates as they are posted.
Ariel
On Sun, Aug 7, 2016 at 5:45 AM, gnosygnu gnosygnu@gmail.com wrote:
Wow! Thanks Gerhard! That's brilliant! I've been staring at it for a while, but didn't even notice that pattern. Kudos!
I confirm the same on my side as well. I checked the XOWA logs, and they all fail at around the 17 or 47 minute mark. I excerpt below.
I've also been trying wget today, and the failures are also at the same minute mark. I also excerpt below.
I'm hoping this behavior is accidental, as I can't imagine that hard interrupts would be intentional. Hopefully, Ariel or someone else will shed more light.
20160805_124702.853 download failed: src=https://dumps.wikimedia.org/commonswiki/latest/commonswiki-latest-pages-articles.xml.bz2 err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly
20160805_141654.131 download failed: src=https://dumps.wikimedia.org/commonswiki/latest/commonswiki-latest-image.sql.gz err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly
20160805_141654.131 download failed: src=https://dumps.wikimedia.org/commonswiki/latest/commonswiki-latest-image.sql.gz err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly
20160805_234711.244 download failed: src=https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly
20160806_051747.080 download failed: src=https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly
20160806_154730.251 download failed: src=https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly
20160806_154730.251 download failed: src=https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly
51% [=====================================================================> ] 9,376,136,876 1.91MB/s in 84m 43s
2016-08-06 14:47:19 (1.76 MB/s) - Connection closed at byte 9376136876. Retrying.
--2016-08-06 14:47:20-- (try: 2) https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 18204308503 (17G), 8828171627 (8.2G) remaining [application/octet-stream]
Saving to: ‘commonswiki-20160801-image.sql.gz’
70% [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++========================> ] 12,776,996,137 1.96MB/s in 29m 37s
2016-08-06 15:16:57 (1.83 MB/s) - Connection closed at byte 12776996137. Retrying.
--2016-08-06 15:16:59-- (try: 3) https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 18204308503 (17G), 5427312366 (5.1G) remaining [application/octet-stream]
Saving to: ‘commonswiki-20160801-image.sql.gz’
89% [++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++=========================> ] 16,265,214,885 1.75MB/s in 29m 55s
2016-08-06 15:46:55 (1.85 MB/s) - Connection closed at byte 16265214885. Retrying.
--2016-08-06 15:46:58-- (try: 4) https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 18204308503 (17G), 1939093618 (1.8G) remaining [application/octet-stream]
Saving to: ‘commonswiki-20160801-image.sql.gz’
100%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++==============>] 18,204,308,503 1.42MB/s in 17m 6s
Saving to: ‘enwiki-20160801-pages-articles.xml.bz2’
45% [============================================================> ] 5,961,055,916 1.96MB/s in 49m 42s
2016-08-06 17:47:08 (1.91 MB/s) - Connection closed at byte 5961055916. Retrying.
--2016-08-06 17:47:09-- (try: 2) https://dumps.wikimedia.org/enwiki/20160801/enwiki-20160801-pages-articles.xml.bz2
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 13142511189 (12G), 7181455273 (6.7G) remaining [application/octet-stream]
Saving to: ‘enwiki-20160801-pages-articles.xml.bz2’
71% [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++===================================> ] 9,419,685,161 1.90MB/s in 29m 38s
2016-08-06 18:16:47 (1.86 MB/s) - Connection closed at byte 9419685161. Retrying.
--2016-08-06 18:16:49-- (try: 3) https://dumps.wikimedia.org/enwiki/20160801/enwiki-20160801-pages-articles.xml.bz2
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 13142511189 (12G), 3722826028 (3.5G) remaining [application/octet-stream]
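For anyone scripting their own retries instead of relying on wget: the "206 Partial Content" responses above are what a server returns when the client asks for the rest of a file with an HTTP Range header. Below is a minimal sketch of that bookkeeping in Python. The function name resume_request is hypothetical (made up for this illustration), and it only builds the header and the expected remaining size; it makes no network request.

```python
def resume_request(total_length, bytes_received):
    """Return (headers, remaining) for resuming a partial download.

    total_length   -- full size of the file in bytes (the "Length:" value)
    bytes_received -- bytes already on disk when the connection closed
    """
    if not 0 <= bytes_received <= total_length:
        raise ValueError("bytes_received must be between 0 and total_length")
    # An open-ended byte range asks for everything from this offset onward.
    headers = {"Range": f"bytes={bytes_received}-"}
    remaining = total_length - bytes_received
    return headers, remaining


if __name__ == "__main__":
    # Values taken from the commonswiki image.sql.gz retry quoted above.
    headers, remaining = resume_request(18204308503, 9376136876)
    print(headers["Range"], remaining)
```

With the numbers from the first commonswiki retry (closed at byte 9376136876 of 18204308503), the remaining size works out to 8828171627 bytes, which matches the "remaining" figure wget printed.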
On Sat, Aug 6, 2016 at 12:32 PM, Gerhard Gonter ggonter@gmail.com wrote:
I tried two files which took about 60 and 150 minutes to download. wget had to retry with partial data several times and completed each download after a few retries. A pattern emerges when you look at the timestamps in the typescript of the wget jobs: apparently, a cron job which seems to run at minutes 17 and 47 of every hour is restarting the web server.
GG
<pre>
$ fgrep 2016-08-06 typescript
--2016-08-06 09:37:05-- https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-pages-articles.xml.bz2
2016-08-06 09:47:19 (1.82 MB/s) - Connection closed at byte 1170013869. Retrying.
--2016-08-06 09:47:20-- (try: 2) https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-pages-articles.xml.bz2
2016-08-06 10:17:29 (1.79 MB/s) - Connection closed at byte 4569103660. Retrying.
--2016-08-06 10:17:31-- (try: 3) https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-pages-articles.xml.bz2
2016-08-06 10:27:36 (1.88 MB/s) - ‘commonswiki-20160801-pages-articles.xml.bz2’ saved [5761778656/5761778656]
--2016-08-06 14:30:35-- https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
2016-08-06 14:47:21 (1.92 MB/s) - Connection closed at byte 2025914028. Retrying.
--2016-08-06 14:47:22-- (try: 2) https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
2016-08-06 15:48:01 (1.85 MB/s) - Connection closed at byte 9102130472. Retrying.
--2016-08-06 15:48:03-- (try: 3) https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
2016-08-06 16:17:37 (1.90 MB/s) - Connection closed at byte 12637698981. Retrying.
--2016-08-06 16:17:40-- (try: 4) https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
2016-08-06 17:06:17 (1.82 MB/s) - ‘commonswiki-20160801-image.sql.gz’ saved [18204308503/18204308503]
</pre>
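To check the pattern without eyeballing the timestamps, something like the following Python sketch could be run over a wget typescript. It is only an illustration: the names SAMPLE, CLOSED and half_hour_offsets are made up here, and the regular expression assumes the exact "Connection closed at byte" line format shown in the typescript above. Because the drops are 30 minutes apart, the sketch reduces each timestamp to seconds past the most recent :00 or :30.

```python
import re

# The five "Connection closed" lines from the typescript above.
SAMPLE = """\
2016-08-06 09:47:19 (1.82 MB/s) - Connection closed at byte 1170013869. Retrying.
2016-08-06 10:17:29 (1.79 MB/s) - Connection closed at byte 4569103660. Retrying.
2016-08-06 14:47:21 (1.92 MB/s) - Connection closed at byte 2025914028. Retrying.
2016-08-06 15:48:01 (1.85 MB/s) - Connection closed at byte 9102130472. Retrying.
2016-08-06 16:17:37 (1.90 MB/s) - Connection closed at byte 12637698981. Retrying.
"""

# Capture minute and second of each line that reports a dropped connection.
CLOSED = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:(\d{2}):(\d{2}) \([^)]*\) - Connection closed"
)


def half_hour_offsets(typescript):
    """Seconds past the most recent :00 or :30 for each dropped connection."""
    return [
        (int(minute) * 60 + int(second)) % 1800
        for minute, second in CLOSED.findall(typescript)
    ]


if __name__ == "__main__":
    for offset in half_hour_offsets(SAMPLE):
        print(f"{offset // 60:02d}m{offset % 60:02d}s past the half hour")
```

On the five drops in this typescript, every offset falls between 17m19s and 18m01s past the half hour, which is consistent with something happening at minutes 17 and 47.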
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l