[Labs-announce] Possible reboots and/or outages -- please read

Andrew Bogott abogott at wikimedia.org
Mon May 23 17:34:00 UTC 2016

I wound up needing to reboot labvirt1003.  It's up now and seems happy; 
I'm currently in the process of restarting all associated instances.  
Everything should be up and running within the hour... let me know if 
you still see issues later in the day.


On 5/20/16 10:10 AM, Andrew Bogott wrote:
> Note:  Tools users can ignore this message.
>     We are seeing some unusual behavior on labvirt1003, which hosts a 
> large number of labs instances.  The problem is not yet diagnosed, but 
> it is likely a hardware problem that will require reboots or 
> downtime.  Here is a complete list of labs instances currently living 
> on labvirt1003:
> https://phabricator.wikimedia.org/P3159
>     If you have any hosts on that box that cannot survive a reboot, 
> please either let me know, or take steps to minimize the damage.  I've 
> removed labvirt1003 from the scheduler, so if you want to build a new 
> instance and migrate services to it you can be assured that the new 
> instance will be isolated from the coming chaos.
>     A simple reboot shouldn't produce more than 5-10 minutes of 
> downtime.  If a major outage seems likely, I'll follow up with 
> an additional warning.
> -Andrew
