[Labs-l] IMPORTANT: Many instances slated for reboot and downtime this weekend

Andrew Bogott abogott at wikimedia.org
Fri Sep 19 15:23:51 UTC 2014


Reminder -- I'll be doing this tomorrow, starting about 24 hours from now.

-Andrew


On 9/16/14 10:45 AM, Andrew Bogott wrote:
> -- Executive Summary:
>
> Many instances will be rebooted at some point this weekend or next 
> week.  The total list of instances subject to reboot is here:
>
> https://wikitech.wikimedia.org/wiki/Virt1006_rebuild
>
> Tools and Beta users can ignore this email.
>
>
> -- The full story:
>
> Sorry about sending two different IMPORTANT emails this week; we 
> generally try to keep labs crises to a minimum.  Indeed, this email is 
> about avoiding a potential crisis.
>
> The labs server known as 'virt1006' has been acting poorly lately. 
> Several times in the last month we've seen instances that live on 
> virt1006 get into inconsistent states during reboot... they reboot and 
> never come back up, or they stay in a perpetual 'rebooting' state.
>
> So far we've been able to rescue such instances, but the misbehavior 
> of a Labs server is very disconcerting.  Rather than wait for a full 
> collapse (and resulting sudden death of 50+ VMs) we've decided to 
> migrate all instances off of virt1006 and then either 
> rebuild the system or discard the hardware. Moving an instance off of 
> a server is fairly painless, but it does require a few minutes of 
> downtime and a reboot.
>
> I've spoken to a few of you directly about the reboots; the affected 
> Tools and Deployment-prep instances have already been handled. There 
> are a lot more to go, though.  If your instance is stable, has its 
> init scripts set up properly, and can tolerate a reboot, then 
> congratulations!  Otherwise, please take whatever steps you need to 
> batten down the hatches and get ready for a reboot.
>
> If you need the reboot to happen at a scheduled time while you are 
> standing by, that's totally fine.  In that case please schedule a 
> reboot window on this page:
>
> https://wikitech.wikimedia.org/wiki/Virt1006_rebuild
>
> Thanks for your cooperation.
>
> -Andrew



