[Labs-announce] Possible reboots and/or outages -- please read
Andrew Bogott
abogott at wikimedia.org
Fri May 20 15:10:45 UTC 2016
Note: Tools users can ignore this message.
We are seeing some unusual behavior on labvirt1003, which hosts a
large number of labs instances. The problem is not yet diagnosed, but
it is likely a hardware problem that will require reboots or downtime.
Here is a complete list of labs instances currently living on labvirt1003:
https://phabricator.wikimedia.org/P3159
If you have any instances on that box that cannot survive a reboot,
please either let me know or take steps to minimize the damage. I've
removed labvirt1003 from the scheduler, so new instances will not be
placed on it. If you build a new instance and migrate services to it,
you can be assured that it will be isolated from the coming chaos.
A simple reboot shouldn't produce more than 5-10 minutes of
downtime. If a major outage seems likely, I'll follow up with
an additional warning.
-Andrew