On Tue, 26 Feb 2013 12:54:27 +0100 Johannes Kroll jkroll@lavabit.com wrote:
On Tue, 26 Feb 2013 05:19:32 -0600 legoktm legoktm.wikipedia@gmail.com wrote:
On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer <marlen.caemmerer@wikimedia.de> wrote:
On Tue, 26 Feb 2013, Johannes Kroll wrote:
While trying to load http://toolserver.org/~render/stools/tlg, we first got 500 errors and then "connection reset". SSH to nightshade took 2 minutes or so to connect. Now web and ssh seem to be working again.
About what time did you try?
On IRC, jem- and I reported having issues at around 10:10am UTC, and it recovered around 10:14am UTC. As of 11:08am UTC I cannot ssh in, and phe was getting 404s. tsbot and tsnag also left the channel at 11:05am UTC after timing out.
Curious: now 'ps aux' on nightshade hangs after displaying some 30 or so processes.
Some lines from strace when it was hanging:
connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.24.1.18")}, 16) = 0
poll([{fd=6, events=POLLOUT}], 1, 0) = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\3353\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\23I\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN}], 1, 4999) = 0 (Timeout)
Port 53 is DNS? So it looks like some DNS query timed out?
If DNS drops out from time to time, could that explain the problems we see? Even rsync failed for me at one point, in addition to the web and ssh stuff.
Which machine has address 10.24.1.18? Why would it be down or unreachable?
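To check whether the resolver drops queries intermittently, something like the following could be run in a loop from nightshade. This is only a rough sketch: the function names build_query and probe are made up for illustration, the 5-second timeout mirrors the poll() timeout in the strace output, and the query type (A record) is an assumption, since the strace payload is truncated.

```python
import socket
import struct
import time


def build_query(name, txid=0x1234):
    """Build a minimal DNS query packet (assumed type: A, class: IN)."""
    # Header: transaction id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server, name, timeout=5.0):
    """Send one UDP query to server:53; return round-trip seconds, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(name), (server, 53))
            sock.recvfrom(512)
        except socket.timeout:
            return None
        return time.monotonic() - start
```

For example, probe("10.24.1.18", "example.org") (the hostname here is a placeholder) returns None when no reply arrives within the timeout, which is what the last poll() in the strace output shows. Logging the result once a second with a timestamp would show whether the timeouts line up with the web and ssh outages.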