This raises an interesting question: what made the server go down in the first place? Surely the power to the server would be 1+1 (i.e., dual power supplies attached to separate power circuits, powered by separate UPS and generator grids respectively)?
This kind of redundancy is expected in data centers nowadays, and I assume that all the TS servers are in a data center. I'm just curious as to why this obviously isn't the case.
-Brett
-----Original Message-----
From: toolserver-l-bounces@lists.wikimedia.org [mailto:toolserver-l-bounces@lists.wikimedia.org] On Behalf Of River Tarnell
Sent: Wednesday, 30 June 2010 9:24 AM
To: Wikimedia Toolserver Announcements
Cc: Wikimedia Toolserver Discussion
Subject: [Toolserver-l] Power outage
Hi,
At about 22:30 UTC last night (Tuesday) one of our power circuits went down for about 15 minutes. This affected one node of the HA cluster which was hosting the following services:
    Sun Grid Engine master server
    tsbot IRC bot
    DNS recursor
    MySQL server for sql-toolserver
    MySQL replication support infrastructure
    LDAP server
All services failed over to the other node and were online again within 22 seconds. However, MySQL did not respond well to losing its replication connection and had to be restarted manually, causing about 30 minutes of replication lag.
- river.