[Labs-announce] Possible reboots and/or outages -- please read

Andrew Bogott abogott at wikimedia.org
Fri May 20 15:10:45 UTC 2016


Note: Tools users can ignore this message.

     We are seeing some unusual behavior on labvirt1003, which hosts a
large number of labs instances.  The problem has not been diagnosed yet,
but it is likely a hardware fault that will require reboots or downtime.
Here is a complete list of the labs instances currently hosted on labvirt1003:

https://phabricator.wikimedia.org/P3159

     If you have any hosts on that box that cannot survive a reboot,
please either let me know or take steps to minimize the damage.  I've
removed labvirt1003 from the scheduler, so if you want to build a new
instance and migrate services to it, you can be assured that the new
instance will be isolated from the coming chaos.

     A simple reboot shouldn't produce more than 5-10 minutes of
downtime.  If a major outage seems likely, I'll follow up with an
additional warning.

-Andrew



