[Labs-announce] Possible reboots and/or outages -- please read

Andrew Bogott abogott at wikimedia.org
Mon May 23 17:34:00 UTC 2016


I wound up needing to reboot labvirt1003.  It's up now and seems happy; 
I'm currently in the process of restarting all associated instances.  
Everything should be up and running within the hour... let me know if 
you still see issues later in the day.

-Andrew

On 5/20/16 10:10 AM, Andrew Bogott wrote:
> Note:  Tools users can ignore this message
>
>     We are seeing some unusual behavior on labvirt1003, which hosts a 
> large number of labs instances.  The problem is not yet diagnosed, but 
> it is likely a hardware problem that will require reboots or 
> downtime.  Here is a complete list of labs instances currently living 
> on labvirt1003:
>
> https://phabricator.wikimedia.org/P3159
>
>     If you have any instances on that host that cannot survive a 
> reboot, please either let me know or take steps to minimize the 
> damage.  I've removed labvirt1003 from the scheduler, so if you want 
> to build a new instance and migrate services to it, you can be assured 
> that the new instance will be isolated from the coming chaos.
>
>     A simple reboot shouldn't produce more than 5-10 minutes of 
> downtime.  If a major outage seems likely, I'll follow up with an 
> additional warning.
>
> -Andrew
>



