[Labs-announce] IMPORTANT: Can your instance tolerate a reboot? (finished)

Andrew Bogott abogott at wikimedia.org
Mon May 11 14:26:27 UTC 2015


This move is now done.  Please contact me immediately if you're having 
trouble with any of your instances -- there are currently backups of 
most instance files remaining on the old hardware, but I'll be cleaning 
that up later in the week.

-Andrew



On 4/28/15 5:10 PM, Andrew Bogott wrote:
> Executive summary:
>
> I'm going to reboot a ton of instances at random times next week. If 
> you don't want me to reboot yours, email me.
>
> Don't worry about Tools or Deployment-prep; they're already on the 
> 'handle with care' list.
>
> Explanation:
>
> I've been migrating instances to new hardware like crazy, and this 
> morning discovered that, upon arrival on a new server, instances are 
> taking up MUCH more disk space than they were.  In some cases, 10 or 
> 15 times as much.
>
> This turns out to be an issue with live migration and copy-on-write 
> instances.  The live migration code doesn't know about 
> never-used-and-not-allocated-space in an instance, so when I migrate 
> an xlarge instance that only used 8G of disk space (but had an 
> allocated 160G of space), live-migrate copies that extra 152G of 
> emptiness, thus foiling all of our attempts to safely overprovision.
>
> Cold migration (which involves shutting down an instance and copying 
> the whole VM in one lump) does not have this problem.  So, two things 
> are going to happen:
>
> - Instances that have not yet moved to the new hardware will be cold 
> migrated rather than live migrated.  That means a shutdown, a 
> few-minute delay, and a restart.
>
> - Large and XLarge instances that have already migrated need to be 
> re-shrunk to their proper copy-on-write size.  That's pretty quick, 
> but also requires a stop and start.
>
> If the idea of a few minutes of downtime for your instance doesn't 
> worry you, then you can do nothing.  If you need your downtime 
> scheduled /or/ if downtime is unacceptable, just let me know.  I can 
> live-migrate a few extra-precious instances and avoid downtime if 
> needed.  Don't hesitate to ask.
>
> None of this will start until Monday the 4th.  That gives us a 
> good long while to get Tools squared away, and also gives you a good 
> long time to notice this email :)
>
> Sorry this upgrade has involved so many additional complications!
>
> -Andrew
>
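For illustration, the disk-space arithmetic in the quoted explanation can be
sketched as below. This is only a rough model, not the actual migration code:
the function name and the sparse_aware flag are hypothetical, and it makes the
simplifying assumption that a sparse-aware copy writes exactly the used space
while a naive copy writes the full allocation.

```python
def disk_written_gib(allocated_gib, used_gib, sparse_aware):
    """GiB written on the destination host for one instance (rough model).

    Hypothetical helper for illustration only. Assumes a sparse-aware copy
    writes just the used space and a naive copy writes the whole allocation.
    """
    if used_gib > allocated_gib:
        raise ValueError("used space cannot exceed allocated space")
    return used_gib if sparse_aware else allocated_gib


# Figures from the xlarge example in the quoted message.
allocated, used = 160, 8

naive = disk_written_gib(allocated, used, sparse_aware=False)
sparse = disk_written_gib(allocated, used, sparse_aware=True)

print(naive - sparse)   # extra GiB of emptiness copied: 152
print(naive / sparse)   # growth factor on the new host: 20.0
```

With these figures the naive copy writes 152G more than necessary, which is
why overprovisioning stops working once enough instances have been moved
this way.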



