Hi,
For about an hour now, the web servers have appeared to be unresponsive:
* http://ortelius.toolserver.org/~cvn/index.html
* http://wolfsbane.toolserver.org/~cvn/index.html
* https://toolserver.org/~cvn/index.html
All of them fail with no response and a timeout.
I can still SSH into wolfsbane and ortelius from willow, though.
-- Krinkle
On Sun, Sep 23, 2012 at 06:17:37AM +0200, Krinkle wrote:
Since about an hour the web servers appear to be unresponsive:
- http://ortelius.toolserver.org/~cvn/index.html
- http://wolfsbane.toolserver.org/~cvn/index.html
- https://toolserver.org/~cvn/index.html
All error out on with no response and a time out.
toolserver (at least) is still not responsive.
BTW, I already noticed a lag of about 15 min between deploying to ~ayacop/public_html and the files becoming available yesterday (around 2012-Sep-23 09:30 GMT).
ralf
Hello,
At Sunday 23 September 2012 20:30:29 DaB. wrote:
Since about an hour the web servers appear to be unresponsive:
- http://ortelius.toolserver.org/~cvn/index.html
- http://wolfsbane.toolserver.org/~cvn/index.html
- https://toolserver.org/~cvn/index.html
All error out on with no response and a time out.
I can still SSH into wolfsbane and ortelius from willow, though.
I will investigate this now. So far, the only problem I have found is that hemlock is down.
Sincerely, DaB.
Hello, At Sunday 23 September 2012 21:01:27 DaB. wrote:
Hello,
I will now investigate this. Until now the only problem I found is that hemlock is down.
I have now restored web access. As far as I can see, hemlock lost its external array and ran out of memory around 2:30 UTC. I have no idea why this affects our webserver. I rebooted hemlock to free the memory and restarted the webserver on ortelius and wolfsbane; the web pages are back AFAIS. What is not working at the moment is the user-store and our backup, because both are on hemlock's external array. Munin, which is handled by hemlock, is also not working. I will try to fix all this, but I guess I need nosy for that (and in the worst case Mark in the colo).
Sincerely, DaB.
On 23/09/12 21:07, DaB. wrote:
Hello,
All error out on with no response and a time out.
I can still SSH into wolfsbane and ortelius from willow, though.
I will now investigate this. Until now the only problem I found is that hemlock is down.
I restored the web-access now. As far as I see hemlock lost its external array and became out of memory around 2:30 UTC. I have no idea why this influence our webserver. I rebooted hemlock to free the memory and restarted the webserver on ortelius and wolfsbane; the webpages are back AFAIS.
Maybe they got blocked trying to access something at /mnt/user-store? Following a symlink, perhaps. The last successful response from both webservers before the restart was at 23/Sep/2012:02:35.
This NFS failure was quite bad because there was no timeout when trying to access /mnt/user-store; anything that touched it simply blocked forever. I guess TS-1519 and TS-1520 may be closed now.
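To illustrate the "no timeout, blocked forever" problem, here is a minimal Python sketch of a path check that gives up after a deadline instead of hanging. It is not what the Toolserver webservers do; the function name `path_responds`, its parameters, and the default timeout are assumptions made for illustration. The blocking call runs in a daemon thread, so a hung mount stalls only that thread.

```python
import os
import threading


def path_responds(path, timeout=5.0, stat=os.stat):
    """Return True if stat(path) finishes within `timeout` seconds.

    Hypothetical helper: the stat call runs in a daemon thread, so a
    hung mount blocks only that thread and not the caller. A missing
    path still counts as a response, because the filesystem answered.
    `stat` is injectable so the timeout path can be exercised in tests.
    """
    finished = threading.Event()

    def probe():
        try:
            stat(path)
        except OSError:
            pass  # an error is still an answer from the filesystem
        finally:
            finished.set()

    threading.Thread(target=probe, daemon=True).start()
    # Event.wait returns False if the deadline passes before set().
    return finished.wait(timeout)


if __name__ == "__main__":
    # /mnt/user-store is the mount point named in this thread.
    print(path_responds("/mnt/user-store", timeout=2.0))
```

Note that the stuck thread is abandoned, not cancelled: if the mount never comes back, that thread stays blocked until the process exits.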
Thanks
I'm still getting 504 Gateway Timeout errors when trying to access the following pages:
http://toolserver.org/~hersfold/
http://toolserver.org/~unblock/
http://toolserver.org/~nakon/
http://toolserver.org/~acc/
Please note that two of these pages, ~unblock and ~acc, are used in the block appeals process on the English Wikipedia. The repeated downtime of these sites is causing significant disruption there.
---- User:Hersfold hersfoldwiki@gmail.com
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
Never mind, I'm able to get in now.
---- User:Hersfold hersfoldwiki@gmail.com
Hello again, At Sunday 23 September 2012 23:47:18 DaB. wrote:
I restored the web-access now. As far as I see hemlock lost its external array and became out of memory around 2:30 UTC. I have no idea why this influence our webserver. I rebooted hemlock to free the memory and restarted the webserver on ortelius and wolfsbane; the webpages are back AFAIS. What is not working at the moment is the user-store and our backup, because both are on the external array of hemlock. Also not working is munin, which is handled by hemlock. I will try to fix all this, but I guess I need nosy for that (and in the worst case Mark in the colo).
Just an update: Nosy came online and fixed the partitions so that hemlock recognized them again. Then we rebooted hemlock to check whether everything was working again, and that failed. So we fixed it again (which needed another reboot) and manually fixed the things that were not working. In short: everything should work again now (as long as hemlock is not rebooted). I will try to investigate in the next day why Zeus (our webserver program) was not working without hemlock. We will also have a maintenance window later this week (I will give details in another mail) to fix hemlock properly.
Sincerely, DaB.
I'm having issues while running scripts.
oursql.ProgrammingError: (1226, "User 'emijrp' has exceeded the 'max_user_connections' resource (current value: 15)", None)
I don't see any other script of mine in top. Perhaps zombie queries left over from the issues of the last few hours?
Hello, At Monday 24 September 2012 14:28:26 DaB. wrote:
oursql.ProgrammingError: (1226, "User 'emijrp' has exceeded the 'max_user_connections' resource (current value: 15)", None)
It is much more helpful if you tell me the cluster ;-).
AFAIS you have 15 long-running tasks on rosemary at the moment.
Sincerely, DaB.
Can you kill them?
I always connect to login.toolserver.org. How do I change to rosemary?
On 24/09/12 15:03, emijrp wrote:
Can you kill them?
I always connect to login.toolserver.org. How do I change to rosemary?
rosemary is an SQL server, not a login server.
You'd do something like this:
$ mysql -h rosemary
mysql> show full processlist;
+-----------+------------+-----------------------------+------+---------+------+-------+-----------------------+
| Id        | User       | Host                        | db   | Command | Time | State | Info                  |
+-----------+------------+-----------------------------+------+---------+------+-------+-----------------------+
| 142886093 | platonides | willow.toolserver.org:62886 | NULL | Query   |    0 | NULL  | show full processlist |
+-----------+------------+-----------------------------+------+---------+------+-------+-----------------------+
1 row in set (0.00 sec)
mysql> kill 142886093;
Of course, you would have some problems doing it when you can't even connect to the server :)
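If there are many connections to clean up, the same steps can be scripted. The following is only a rough Python sketch: `kill_statements` is a hypothetical helper, the sample rows are invented, and it only builds the KILL statements from rows in the column order shown above (Id, User, Host, db, Command, Time, State, Info). Fetching the rows and running the statements is left to whatever MySQL client library you already use.

```python
def kill_statements(rows, user, min_seconds=60):
    """Build KILL statements for `user`'s connections from processlist rows.

    Hypothetical helper. `rows` are tuples in SHOW FULL PROCESSLIST
    column order: (Id, User, Host, db, Command, Time, State, Info).
    Only connections that have been in their current state for at
    least `min_seconds` are selected, so the short query that lists
    the processes is left alone.
    """
    statements = []
    for conn_id, row_user, _host, _db, _command, seconds, _state, _info in rows:
        if row_user != user:
            continue  # never touch other users' connections
        if (seconds or 0) < min_seconds:
            continue  # too recent, probably still wanted
        statements.append("KILL %d;" % conn_id)
    return statements


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        (101, "exampleuser", "willow.toolserver.org:50001", "db1", "Query", 7200, None, "SELECT 1"),
        (102, "exampleuser", "willow.toolserver.org:50002", None, "Query", 0, None, "show full processlist"),
    ]
    for statement in kill_statements(sample, "exampleuser"):
        print(statement)
```

Filtering on Time rather than on Command is a deliberate choice here: idle (Sleep) connections also count towards max_user_connections, so they are candidates too.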
I had the same problem on Saturday with s2-rr: even after killing all php.fcgi instances on the webservers (the ones from which those 15 connections could have originated), the server wasn't allowing me to connect.
While we're blaming the SQL servers, I had a script fail last night with «Lost connection to MySQL server at 'reading initial communication packet', system error: 0 (sql-s3-rr.toolserver.org)», but it is working now.
Regards
Hello, At Monday 24 September 2012 17:06:54 DaB. wrote:
Can you kill them?
The query killer already did.
I always connect to login.toolserver.org. How do I change to rosemary?
You can't connect to rosemary with SSH, but you don't need to. You can either use mysql (see Platonides' mail) or mytop (which in my eyes works better for this).
Sincerely, DaB.