Brion Vibber wrote:
Another power failure at the colo today. I'm not too sure of details yet as this happened in the middle of the night for me.
Admin log: https://wikitech.leuksman.com/view/Server_admin_log#April_19
PowerMedium is very very very very very very sorry and blames the equipment manufacturer; the defective equipment is apparently being replaced.
Note that the server that carries the data dump files is currently offline. If we don't have it back real soon now, I'll restart the dumps on another server.
Happened in the middle of the night for me, too, but I happened to be editing at the time. (When I'm making template changes, I try to do them when load is lowest, so they don't interfere.)
From outside, the outage appeared to start on the hour (within seconds)! What scheduled task might have tripped the circuit?
Also, you seem to have lost all routing and BGP announcements (at least as seen from here), although I had no problem getting DNS (presumably it is replicated outside). That is a single point of failure; there should have been failover to another cluster.
While the main folks are fixing things (I assume they are very busy now), could somebody else point me at documentation about the setup?