Hi all,
I'm currently getting connection timeouts on HTTP (at the pages listed below, without the secure part), and 404s on HTTPS on TS pages such as: https://toolserver.org/~unblock/p/appeal.php https://toolserver.org/~acc/acc.php https://toolserver.org/~snottywong/index.html https://toolserver.org/~helloannyong/range/
On SSH to willow and nightshade, my key doesn't work. No errors, it just thinks for a long time after I input my username and then may or may not request a keyboard interactive password. If I don't get asked for a password, after about 1min30sec I get a connection timeout error.
DeltaQuad English Wikipedia Administrator and Checkuser
web and ssh work for me
On Mon, Feb 25, 2013 at 8:18 PM, DeltaQuad Wikipedia deltaquadwiki@gmail.com wrote:
Hi all,
I'm currently getting connection timeouts on HTTP (at the pages listed below, without the secure part), and 404s on HTTPS on TS pages such as: https://toolserver.org/~unblock/p/appeal.php https://toolserver.org/~acc/acc.php https://toolserver.org/~snottywong/index.html https://toolserver.org/~helloannyong/range/
On SSH to willow and nightshade, my key doesn't work. No errors, it just thinks for a long time after I input my username and then may or may not request a keyboard interactive password. If I don't get asked for a password, after about 1min30sec I get a connection timeout error.
DeltaQuad English Wikipedia Administrator and Checkuser
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
They *just* came back up. Sorry for the spam all.
DeltaQuad English Wikipedia Administrator and Checkuser
On Mon, Feb 25, 2013 at 8:25 PM, John phoenixoverride@gmail.com wrote:
web and ssh work for me
DeltaQuad wrote:
They *just* came back up. Sorry for the spam all.
It's about 9 p.m. on Monday evening right now for me. https://toolserver.org/~mzmcbride/watcher/ and other similar URLs were 404ing for me yesterday (Sunday) evening. And then they suddenly started working again without explanation. It seems to be an intermittent issue.
Maybe it's related to the start of a new UTC day and load? Or maybe it's just an intermittent issue. Probably needs to be investigated if it continues to happen, though.
MZMcBride
On Mon, 25 Feb 2013 20:58:19 -0500 MZMcBride z@mzmcbride.com wrote:
It's about 9 p.m. on Monday evening right now for me. https://toolserver.org/~mzmcbride/watcher/ and other similar URLs were 404ing for me yesterday (Sunday) evening. And then they suddenly started working again without explanation. It seems to be an intermittent issue.
Maybe it's related to the start of a new UTC day and load? Or maybe it's just an intermittent issue. Probably needs to be investigated if it continues to happen, though.
While trying to load http://toolserver.org/~render/stools/tlg, we first got 500 errors and then "connection reset". SSH to nightshade took about 2 minutes to connect. Now web and ssh seem to be working again.
From yesterday evening until early this morning, SQL queries were very slow. I didn't take measurements, but simple page queries that would normally execute instantly took minutes.
Don't know if the two things are related.
On Tue, 26 Feb 2013, Johannes Kroll wrote:
While trying to load http://toolserver.org/~render/stools/tlg, we got 500 errors first and then "connection reset". SSH to nightshade took 2 minutes or so to connect. Now web & ssh seems to be working again.
At about what time did you try?
Yesterday evening up till early in the morning today, SQL queries were very slow. I didn't take measurements but simple page queries that would normally execute instantly would take minutes.
Did you try throughout the night, or at a particular time? And which databases seemed to answer more slowly? The problem is that the head nodes also do SQL forwarding, so if the active one is fishy you might not get SQL connections at all. But the phenomenon should have occurred between about 0:30 and 1:30 am UTC (1:30 and 2:30 CET). If you tried outside this timeframe, it would be good to know whether you saw any other errors and what they looked like.
Cheers nosy
On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer < marlen.caemmerer@wikimedia.de> wrote:
On Tue, 26 Feb 2013, Johannes Kroll wrote:
While trying to load http://toolserver.org/~render/stools/tlg, we got 500 errors first and then "connection reset". SSH to nightshade took 2 minutes or so to connect. Now web & ssh seems to be working again.
At which time did you try about?
On IRC, jem- and I reported having issues at around 10:10am UTC, and things recovered around 10:14am UTC. As of 11:08am UTC I cannot ssh in, and phe was getting 404s. tsbot and tsnag also left the channel at 11:05am UTC after timing out.
--Legoktm
On Tue, 26 Feb 2013 05:19:32 -0600 legoktm legoktm.wikipedia@gmail.com wrote:
On IRC myself and jem- reported having issues at around 10:10am UTC and it recovered around 10:14am UTC. As of 11:08am UTC I cannot ssh in, and phe was getting 404s. tsbot and tsnag also left the channel at 11:05am UTC after timing out.
Curious: now 'ps aux' on nightshade hangs after displaying some 30 or so processes.
Some lines from strace when it was hanging:
connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.24.1.18")}, 16) = 0
poll([{fd=6, events=POLLOUT}], 1, 0) = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\3353\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\23I\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN}], 1, 4999) = 0 (Timeout)
Port 53 is DNS? So it looks like some DNS query timed out?
Now it seems to be working again. I didn't log the whole strace run, but I saved the lines that I still had in the terminal buffer... I can send it if anybody needs it.
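For anyone who wants to check the "port 53 is DNS" guess, here is a rough Python sketch that decodes the visible part of the first sendto() payload from the strace lines above as a DNS message. The function name decode_query_prefix is made up for illustration, and strace cut the payload off after "toolserver", so only the header and the first three labels can be read; the rest of the name is not recoverable from this output.

```python
import struct

# Visible prefix of the first sendto() payload, copied from the strace output.
# strace truncated the remaining bytes ("..."), so this is NOT the full packet.
CAPTURED = b"\3353\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"


def decode_query_prefix(data):
    """Decode the 12-byte DNS header and any complete name labels after it."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", data[:12]
    )
    labels = []
    pos = 12
    while pos < len(data):
        length = data[pos]
        if length == 0:  # a zero-length label would end the name
            break
        label = data[pos + 1 : pos + 1 + length]
        if len(label) < length:  # label cut off by the truncation
            break
        labels.append(label.decode("ascii"))
        pos += 1 + length
    return {
        "id": ident,
        "is_query": (flags & 0x8000) == 0,  # QR bit clear means a query
        "recursion_desired": bool(flags & 0x0100),  # RD bit
        "qdcount": qdcount,
        "ancount": ancount,
        "labels": labels,
    }


info = decode_query_prefix(CAPTURED)
print(info)
```

If this reading is right, the bytes are an ordinary recursive DNS query with one question whose name starts with ldap.esi.toolserver, which would fit the 5-second poll() timeouts being a resolver that did not answer.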
On Tue, 26 Feb 2013 12:54:27 +0100 Johannes Kroll jkroll@lavabit.com wrote:
Port 53 is DNS? So it looks like some DNS query timed out?
If DNS drops out from time to time, could that explain the problems we see? Even rsync failed for me at one point, in addition to the web and ssh stuff.
Which machine has address 10.24.1.18? Why would it be down or unreachable?
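To test the "DNS drops out from time to time" idea, something like the following Python sketch could be left running on a login host. It only times name lookups through the system resolver; the names time_lookup and probe, the 2-second "slow" threshold, and the pause between attempts are arbitrary choices for illustration, not anything taken from the Toolserver setup.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.getaddrinfo):
    """Run one lookup; return (seconds_elapsed, error_or_None)."""
    start = time.monotonic()
    try:
        resolve(hostname, None)
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.monotonic() - start, error


def probe(hostname, attempts=5, pause=1.0, slow_after=2.0,
          resolve=socket.getaddrinfo):
    """Repeat the lookup and label each attempt 'ok', 'slow' or 'failed'."""
    results = []
    for i in range(attempts):
        elapsed, error = time_lookup(hostname, resolve)
        if error is not None:
            status = "failed"
        elif elapsed >= slow_after:
            status = "slow"
        else:
            status = "ok"
        results.append((status, elapsed))
        if i + 1 < attempts:
            time.sleep(pause)
    return results
```

Calling probe("toolserver.org") from a shell on nightshade or willow and logging the result with a timestamp would show whether slow or failed lookups line up with the web and ssh outages. The resolve parameter exists only so the function can be exercised without a network.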