-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brion Vibber wrote:
> A number of Apache, memcached, and external storage boxes went down a bit ago, presumably due to a breaker flip or other power failure. As a result we've temporarily put the site into read-only mode, since there are intermittent failures and massive slowness.
This is now mostly resolved:
* External Storage clusters 9 and 10 are back online
* down memcached boxes have been reassigned temporarily
* LVS was restarted to fix a load balancing breakage
Continuing problems:
* External Storage clusters 4 and 5 are still down
They seem to be coming up now since we contacted the colo, and Rob's going in now to do a final check on things.
The problem appears to have been a (bogus?) fire alarm, which shut down the air conditioners. This caused some machines to overheat, which tripped some breakers.
Air conditioning has been restored, and Rob is watching over things to make sure the affected machines are brought back up.
- -- brion vibber (brion @ wikimedia.org)