There is no cron job that runs at those times.  The puppet run on the dataset1001 host does run exactly at those times, so presumably something in one of the puppet jobs affects your downloads.  I see nginx worker processes running since yesterday, so it's definitely not a restart, graceful or otherwise.  The puppet logs likewise do not indicate a restart or refresh of any services; in fact, most of the time they indicate no changes whatsoever.
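To check a download log against that schedule, a small script can pull the timestamp from every "Connection closed" line that wget prints and measure how far each one falls from minutes :17 and :47. This is a sketch, not anything running on dataset1001. It assumes unwrapped wget output (a raw log or `script` typescript, not the line-wrapped copies quoted in mail), and the two run minutes and the 2-minute tolerance are assumptions taken from the timestamps reported in this thread.

```python
import re
from datetime import datetime

# wget prints a line like this each time the server drops the connection:
# "2016-08-06 14:47:19 (1.76 MB/s) - Connection closed at byte 9376136876. Retrying."
CLOSED_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \([^)]*\) - Connection closed"
)


def closed_times(lines):
    """Yield a datetime for each 'Connection closed' line in wget output."""
    for line in lines:
        m = CLOSED_RE.match(line.strip())
        if m:
            yield datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")


def minutes_from_schedule(ts, run_minutes=(17, 47)):
    """Distance in minutes from ts to the nearest scheduled minute-of-hour.

    run_minutes is an assumption based on the :17/:47 pattern in this thread.
    """
    pos = ts.minute + ts.second / 60
    # Wrap around the hour so that e.g. :59 is 1 minute away from :00.
    return min(min(abs(pos - r), 60 - abs(pos - r)) for r in run_minutes)


def summarize(lines, run_minutes=(17, 47), tolerance=2.0):
    """Return (dropped connections, drops within `tolerance` minutes of a run)."""
    times = list(closed_times(lines))
    near = sum(
        1 for t in times if minutes_from_schedule(t, run_minutes) <= tolerance
    )
    return len(times), near
```

For example, `summarize(open("typescript"))` returns a pair of counts; if nearly every drop lands within the tolerance, the scheduled job is the likely trigger.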

I'll look into this and track it here: https://phabricator.wikimedia.org/T142367
Please add your names to the ticket if you want to see updates as they are posted.

Ariel

On Sun, Aug 7, 2016 at 5:45 AM, gnosygnu <gnosygnu@gmail.com> wrote:
Wow! Thanks, Gerhard! That's brilliant! I've been staring at it for a
while, but didn't even notice that pattern. Kudos!

I can confirm the same on my side. I checked the XOWA logs, and
the downloads all fail at around the 17- or 47-minute mark. Excerpts are below.

I've also been trying wget today, and its failures fall at the
same minute marks. Excerpts of those are below as well.

I'm hoping this behavior is accidental, as I can't imagine that hard
interrupts would be intentional. Hopefully, Ariel or someone else will
shed more light.

----

20160805_124702.853 download failed:
src=https://dumps.wikimedia.org/commonswiki/latest/commonswiki-latest-pages-articles.xml.bz2
err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly

20160805_141654.131 download failed:
src=https://dumps.wikimedia.org/commonswiki/latest/commonswiki-latest-image.sql.gz
err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly

20160805_234711.244 download failed:
src=https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly

20160806_051747.080 download failed:
src=https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly

20160806_154730.251 download failed:
src=https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pagelinks.sql.gz
err=[err 0] <javax.net.ssl.SSLException> SSL peer shut down incorrectly

----

51% [=====================================================================>                                                                 ] 9,376,136,876 1.91MB/s   in 84m 43s

2016-08-06 14:47:19 (1.76 MB/s) - Connection closed at byte 9376136876. Retrying.

--2016-08-06 14:47:20--  (try: 2)  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 18204308503 (17G), 8828171627 (8.2G) remaining [application/octet-stream]
Saving to: ‘commonswiki-20160801-image.sql.gz’

70% [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++========================>                                        ] 12,776,996,137 1.96MB/s   in 29m 37s

2016-08-06 15:16:57 (1.83 MB/s) - Connection closed at byte 12776996137. Retrying.

--2016-08-06 15:16:59--  (try: 3)  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 18204308503 (17G), 5427312366 (5.1G) remaining [application/octet-stream]
Saving to: ‘commonswiki-20160801-image.sql.gz’

89% [++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++=========================>              ] 16,265,214,885 1.75MB/s   in 29m 55s

2016-08-06 15:46:55 (1.85 MB/s) - Connection closed at byte 16265214885. Retrying.

--2016-08-06 15:46:58--  (try: 4)  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 18204308503 (17G), 1939093618 (1.8G) remaining [application/octet-stream]
Saving to: ‘commonswiki-20160801-image.sql.gz’

100%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++==============>] 18,204,308,503 1.42MB/s   in 17m 6s


Saving to: ‘enwiki-20160801-pages-articles.xml.bz2’

45% [============================================================>                                                                          ] 5,961,055,916 1.96MB/s   in 49m 42s

2016-08-06 17:47:08 (1.91 MB/s) - Connection closed at byte 5961055916. Retrying.

--2016-08-06 17:47:09--  (try: 2)  https://dumps.wikimedia.org/enwiki/20160801/enwiki-20160801-pages-articles.xml.bz2
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 13142511189 (12G), 7181455273 (6.7G) remaining [application/octet-stream]
Saving to: ‘enwiki-20160801-pages-articles.xml.bz2’

71% [+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++===================================>                                      ] 9,419,685,161 1.90MB/s   in 29m 38s

2016-08-06 18:16:47 (1.86 MB/s) - Connection closed at byte 9419685161. Retrying.

--2016-08-06 18:16:49--  (try: 3)  https://dumps.wikimedia.org/enwiki/20160801/enwiki-20160801-pages-articles.xml.bz2
Connecting to dumps.wikimedia.org (dumps.wikimedia.org)|208.80.154.11|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 13142511189 (12G), 3722826028 (3.5G) remaining [application/octet-stream]
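Each retry in the wget output resumes rather than restarts: wget requests only the bytes it does not yet have, the server answers 206 Partial Content, and the "remaining" figure is the total length minus the bytes already on disk (18204308503 - 9376136876 = 8828171627 for the first retry). Below is a minimal standard-library sketch of that single resume step, assuming a server that honours HTTP `Range` requests. `resume_request`, `remaining_bytes` and `resume_download` are hypothetical helper names, and the sketch does not reproduce wget's retry loop or backoff.

```python
import os
import urllib.request


def resume_request(url, local_path):
    """Build a request for the bytes not yet on disk (hypothetical helper)."""
    have = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    # Ask only for the tail we are missing; no Range header on a fresh start.
    headers = {"Range": f"bytes={have}-"} if have else {}
    return have, urllib.request.Request(url, headers=headers)


def remaining_bytes(total_length, have):
    """Same arithmetic as the 'remaining' figure on wget's Length line."""
    return total_length - have


def resume_download(url, local_path, chunk_size=1 << 20):
    """Append the missing tail of url to local_path; return bytes written.

    If the file is already complete, the server answers 416, which
    urlopen raises as urllib.error.HTTPError.
    """
    have, req = resume_request(url, local_path)
    written = 0
    with urllib.request.urlopen(req) as resp:
        # A server that ignores Range replies 200 with the whole file;
        # appending that to a partial file would corrupt it.
        if have and resp.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {resp.status}")
        with open(local_path, "ab") as out:
            while True:
                block = resp.read(chunk_size)
                if not block:
                    break
                out.write(block)
                written += len(block)
    return written
```

Calling `resume_download(url, path)` again after each dropped connection mimics the retries above; the status check is the important design choice, since it refuses to append a full-file response onto a partial download.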

On Sat, Aug 6, 2016 at 12:32 PM, Gerhard Gonter <ggonter@gmail.com> wrote:
> I tried two files which took about 60 and 150 minutes to download.
> wget had to retry with partial data several times and completed the
> downloads after a few retries.  A pattern emerges when you look
> at the timestamps in the typescript of the wget jobs: apparently, a
> cron job which seems to run at minutes 17 and 47 of every hour is
> restarting the web server.
>
> GG
>
> <pre>
> $ fgrep 2016-08-06 typescript
> --2016-08-06 09:37:05--  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-pages-articles.xml.bz2
> 2016-08-06 09:47:19 (1.82 MB/s) - Connection closed at byte 1170013869. Retrying.
> --2016-08-06 09:47:20--  (try: 2)  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-pages-articles.xml.bz2
> 2016-08-06 10:17:29 (1.79 MB/s) - Connection closed at byte 4569103660. Retrying.
> --2016-08-06 10:17:31--  (try: 3)  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-pages-articles.xml.bz2
> 2016-08-06 10:27:36 (1.88 MB/s) - ‘commonswiki-20160801-pages-articles.xml.bz2’ saved [5761778656/5761778656]
> --2016-08-06 14:30:35--  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
> 2016-08-06 14:47:21 (1.92 MB/s) - Connection closed at byte 2025914028. Retrying.
> --2016-08-06 14:47:22--  (try: 2)  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
> 2016-08-06 15:48:01 (1.85 MB/s) - Connection closed at byte 9102130472. Retrying.
> --2016-08-06 15:48:03--  (try: 3)  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
> 2016-08-06 16:17:37 (1.90 MB/s) - Connection closed at byte 12637698981. Retrying.
> --2016-08-06 16:17:40--  (try: 4)  https://dumps.wikimedia.org/commonswiki/20160801/commonswiki-20160801-image.sql.gz
> 2016-08-06 17:06:17 (1.82 MB/s) - ‘commonswiki-20160801-image.sql.gz’ saved [18204308503/18204308503]
> </pre>

_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l