[Labs-announce] Possible reboots and/or outages -- please read
Andrew Bogott
abogott at wikimedia.org
Mon May 23 17:34:00 UTC 2016
I wound up needing to reboot labvirt1003. It's up now and seems happy;
I'm currently in the process of restarting all associated instances.
Everything should be up and running within the hour... let me know if
you still see issues later in the day.
-Andrew
On 5/20/16 10:10 AM, Andrew Bogott wrote:
> Note: Tools users can ignore this message
>
> We are seeing some unusual behavior on labvirt1003, which hosts a
> large number of labs instances. The problem is not yet diagnosed, but
> it is likely a hardware problem that will require reboots or
> downtime. Here is a complete list of labs instances currently living
> on labvirt1003:
>
> https://phabricator.wikimedia.org/P3159
>
> If you have any hosts on that box that cannot survive a reboot,
> please either let me know, or take steps to minimize the damage. I've
> removed labvirt1003 from the scheduler, so if you want to build a new
> instance and migrate services to it you can be assured that the new
> instance will be isolated from the coming chaos.
>
> A simple reboot shouldn't produce more than 5-10 minutes of
> downtime. If a major outage seems likely, I'll follow up with an
> additional warning.
>
> -Andrew
>