-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
hi,
the Toolserver is currently offline due to an issue with the NFS server. we are working on the issue and have a ticket open with Sun, but at this point there is no ETA.
if it looks like the issue will take some time to resolve, we will restore the last /home backup onto another server and use that. however, that means losing about 12 hours worth of changes, so we will try to avoid it if possible.
- river.
On 24 Aug 2009, at 17:31, River Tarnell wrote:
the Toolserver is currently offline due to an issue with the NFS server. we are working on the issue and have a ticket open with Sun, but at this point there is no ETA.
Sounds very bad. :-( Is the filesystem blocked for both reads and writes?
Can you please add a notice to the "404" pages, so that our tools' users know what's happening? The webserver can't serve pages so the "DaB's notice" can't be displayed...
if it looks like the issue will take some time to resolve, we will restore the last /home backup onto another server and use that. however, that means losing about 12 hours worth of changes, so we will try to avoid it if possible.
Thank you for your work, as always. Regards,
Pietrodn powerpdn@gmail.com
River Tarnell:
the Toolserver is currently offline due to an issue with the NFS server. we are working on the issue and have a ticket open with Sun, but at this point there is no ETA.
update: hyacinth is now online without /home. this means stable is up, but the rest of the Toolserver is still offline.
at this point we understand the issue but not the cause. i have two separate plans to restore service; preferably by recovering /home, but if that is not possible within the next few hours, i will rebuild the server and restore from the last backup, dated 2009-08-24 at 5:00AM UTC.
once service is restored i will provide a post-mortem report on the issue.
- river.
When will the Toolserver be back online?
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
after further investigation we have determined that the best course of action is to restore from a backup. i have imported the last /home backup (2009-08-24 05:00 UTC) to hemlock, and temporarily mounted this as /home, so the Toolserver is now up.
i will rebuild the NFS server now (which will take a couple of hours at least) and then start migrating data back to it. there will be some further downtime while this is done.
i will be making some changes to reduce issues like this in the future, which i will document later.
- river.
hi,
i've finished repairing hyacinth (the NFS server) and i'm about to start copying /home back. this will take an hour or two during which the Toolserver will be offline again. assuming there are no problems, this should be the last downtime.
- river.
hi,
the maintenance is now mostly finished and the Toolserver is back up. i'm now working on bringing stable back.
- river.
hi,
maintenance on stable is now finished and it should be up. the system was reinstalled, so if you notice anything odd (like missing configuration or software) please let me know.
- river.
Thanks River.
I think that this isn't the correct place for this request, but for some time now this tool [1] hasn't been finding commonswiki edits. Perhaps it is due to the recent move of the commonswiki database.
Regards
[1] http://toolserver.org/~vvv/sulutil.php?user=Emijrp
Hello, on Tuesday, 25 August 2009 at 22:52:47, emijrp wrote:
I think that this isn't the correct place for this request,
please contact the author of the tool (it's vvv). See [1].
Sincerely, DaB.
[1] https://wiki.toolserver.org/view/If_you_found_a_bug
the maintenance is now mostly finished and the Toolserver is back up. i'm now working on bringing stable back.
could you please take a look at cassini? I'm unable to log in, and [1] also seems to be unavailable. Both may be a problem with the /home server, I think.
Thank you!
Peter
Peter Körner:
could you please take a look at cassini? I'm unable to login and also [1] seems to be not available.
login should work now, but there was no footnote in your mail so i don't know what [1] refers to.
- river.
Peter Körner:
could you please take a look at cassini? I'm unable to login and also [1] seems to be not available.
login should work now, but there was no footnote in your mail so i don't know what [1] refers to.
Oh, yay, that's like forgetting an attachment on a mail :)
It should have been referring to http://cassini.toolserver.org/~mazder/ which is up again now, too.
Thank you! Peter
hi river,
i'm sorry to hear that you've had so much trouble with the servers, the replication and all those other things in the past few weeks. I can imagine how annoying this is. If I lived near you, I'd send you some beer/wine/cola/coffee or whatever you'd prefer, to keep you going.
But unfortunately it's a long way from Germany, so I'm only able to pat you on the back.
Thank you! Peter
for those who don't read journal.ts.o, a write-up of the outage is available at
https://confluence.toolserver.org/display/tech/Platform+outage+2009-08-24
- river.
Hullo River, Thanks for bringing the server back.
I could not access https://phpmyadmin.toolserver.org/index.php though. Would be great if you could fix it.
Shaurabh Bharti:
I could not access https://phpmyadmin.toolserver.org/index.php though. Would be great if you could fix it.
this is fixed now, but in future please use JIRA (https://jira.toolserver.org) to report issues.
- river.
River Tarnell wrote:
for those who don't read journal.ts.o, a write-up of the outage is available at
https://confluence.toolserver.org/display/tech/Platform+outage+2009-08-24
- river.
This issue also highlighted the problems of having the NFS server as a single point of failure. We already have a plan to make NFS redundant, but this will need to wait until next year until funds are available to implement it.
How are you planning to make NFS redundant?
Platonides:
How are you planning to make NFS redundant?
we plan to use a disk array for storage and two head nodes to provide NFS access. the array has two controllers which provide redundant service to the host (active/passive). each host is connected to both controllers (4 cables in total), redundantly to each other, and redundantly to the network.
each head will run Solaris with Sun Cluster HA-NFS software to provide active/passive NFS service to clients with transparent failover. this allows us to perform maintenance on the array and either NFS head without affecting service, and in case of hardware or software failure of the active head, the system will fail over to the passive head.
- river.
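For readers unfamiliar with the active/passive arrangement described above, here is a minimal Python sketch of the role swap. It is only a toy model under stated assumptions: the class names, the head names "head-a" and "head-b", and the `healthy` flag are all hypothetical, and it does not represent how Sun Cluster HA-NFS is configured or how it detects failures.

```python
from dataclasses import dataclass


@dataclass
class Head:
    """One NFS head node (name and health flag are hypothetical)."""
    name: str
    healthy: bool = True


class ActivePassivePair:
    """Toy model of two heads where only one serves NFS at a time."""

    def __init__(self, primary: Head, standby: Head) -> None:
        self.active = primary
        self.passive = standby

    def check(self) -> Head:
        """Return the head that should serve clients, swapping roles if needed."""
        if not self.active.healthy and self.passive.healthy:
            # The active head has failed and the standby is usable:
            # the standby takes over the service.
            self.active, self.passive = self.passive, self.active
        return self.active


pair = ActivePassivePair(Head("head-a"), Head("head-b"))
before = pair.check()          # head-a is healthy, so it keeps serving
pair.active.healthy = False    # simulate a hardware or software failure
serving = pair.check()         # roles swap and head-b serves clients
```

In this sketch clients only ever talk to whichever head `check()` returns, which is the sense in which failover is transparent to them. If both heads are unhealthy, no swap happens and the failed head stays nominally active.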
toolserver-l@lists.wikimedia.org