[Labs-l] IMPORTANT: Many instances slated for reboot and downtime this weekend
Andrew Bogott
abogott at wikimedia.org
Fri Sep 19 15:23:51 UTC 2014
Reminder -- I'll be doing this tomorrow, starting about 24 hours from now.
-Andrew
On 9/16/14 10:45 AM, Andrew Bogott wrote:
> -- Executive Summary:
>
> Many instances will be rebooted at some point this weekend or next
> week. The total list of instances subject to reboot is here:
>
> https://wikitech.wikimedia.org/wiki/Virt1006_rebuild
>
> Tools and Beta users can ignore this email.
>
>
> -- The full story:
>
> Sorry about sending two different IMPORTANT emails this week; we
> generally try to keep labs crises to a minimum. Indeed, this email is
> about avoiding a potential crisis.
>
> The labs server known as 'virt1006' has been misbehaving lately.
> Several times in the last month we've seen instances that live on
> virt1006 get into inconsistent states during reboot... they reboot and
> never come back up, or they stay in a perpetual 'rebooting' state.
>
> So far we've been able to rescue such instances, but the misbehavior
> of a Labs server is very disconcerting. Rather than wait for a full
> collapse (and resulting sudden death of 50+ VMs) we've decided to
> migrate all instances off of virt1006 and then either
> rebuild the system or discard the hardware. Moving an instance off of
> a server is fairly painless, but it does require a few minutes of
> downtime and a reboot.
>
> I've spoken to a few of you directly about the reboots; the affected
> Tools and Deployment-prep instances have already been handled. There
> are a lot more to go, though. If your instance is stable and has its
> init scripts set up properly and a reboot is no big deal, then,
> congratulations! Otherwise, please take whatever steps you need to
> take to batten down the hatches and get ready for a reboot.
>
> If you need the reboot to happen at a scheduled time while you are
> standing by, that's totally fine. In that case please schedule a
> reboot window on this page:
>
> https://wikitech.wikimedia.org/wiki/Virt1006_rebuild
>
> Thanks for your cooperation.
>
> -Andrew