On Tue, 26 Feb 2013 05:19:32 -0600 legoktm <legoktm.wikipedia@gmail.com> wrote:
On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer <marlen.caemmerer@wikimedia.de> wrote:
On Tue, 26 Feb 2013, Johannes Kroll wrote:
While trying to load http://toolserver.org/~render/stools/tlg, we first got 500 errors and then "connection reset". SSH to nightshade took about 2 minutes to connect. Now web and ssh seem to be working again.
Around what time did you try?
On IRC, jem- and I reported having issues at around 10:10am UTC, and it recovered around 10:14am UTC. As of 11:08am UTC I cannot ssh in, and phe was getting 404s. tsbot and tsnag also left the channel at 11:05am UTC after timing out.
Curious: now 'ps aux' on nightshade hangs after displaying some 30 or so processes.
Some lines from strace when it was hanging:
connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.24.1.18")}, 16) = 0
poll([{fd=6, events=POLLOUT}], 1, 0) = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\3353\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\23I\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN}], 1, 4999) = 0 (Timeout)
Port 53 is DNS? So it looks like some DNS query timed out?
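One way to check that guess is to decode the bytes strace printed for the first sendto() call. The sketch below is only an illustration: it assumes the payload is a standard DNS message (12-byte header followed by length-prefixed name labels), and the helper name parse_dns_prefix is made up for this example. strace truncated the 41-byte payload, so only the visible prefix is decoded and the rest of the name is left unknown.

```python
import struct

# Visible prefix of the first sendto() payload, transcribed from strace's
# octal escapes (\335 -> 0xdd, "\n" -> 0x0a). strace cut off the remaining bytes.
PREFIX = b"\xdd3\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x04ldap\x03esi\ntoolserver"


def parse_dns_prefix(data):
    """Decode the 12-byte DNS header and every complete label after it.

    Hypothetical helper for this thread; it does not handle name
    compression pointers, which plain queries normally do not use.
    """
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", data[:12])
    labels = []
    pos = 12
    while pos < len(data):
        length = data[pos]
        # Stop at the root label or when the label runs past the captured bytes.
        if length == 0 or pos + 1 + length > len(data):
            break
        labels.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    return {
        "id": ident,
        "is_query": (flags & 0x8000) == 0,        # QR bit clear means query
        "recursion_desired": bool(flags & 0x0100),  # RD bit
        "questions": qdcount,
        "labels": labels,
    }


info = parse_dns_prefix(PREFIX)
print(info)
```

If the assumption holds, this shows a recursive query with one question whose name starts with the labels ldap, esi and toolserver, which would fit a lookup for the LDAP server going unanswered until poll() timed out.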
Now it seems to be working again. I didn't log the whole strace run, but I saved the lines that I still had in the terminal buffer... I can send it if anybody needs it.