[Labs-l] Instance creation temporarily unavailable and scheduled downtime on March 6th

Ryan Lane rlane at wikimedia.org
Mon Mar 5 10:49:05 UTC 2012


While troubleshooting an issue with instances failing to reboot, I
accidentally wiped out the _base directory for the instances. When
nova creates instances, it pulls the original media (the image) from
glance, and sticks it into the _base directory. It then creates a
directory for the instance, and makes qcow2 disks, based on the image
in the _base directory. It does this so that, for example, if we had
70 lucid instances running, we wouldn't need to store 70 identical
images; instead, all 70 instances can share the base image and only
track changes in their own disks.
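
As a rough illustration of the sharing described above, here is a toy
Python model of a copy-on-write disk. It is only a sketch of the idea,
not how qcow2 or nova actually store data; the class names BaseImage
and InstanceDisk are made up for this example.

```python
class BaseImage:
    """Read-only image shared by every instance (stands in for a file in _base)."""

    def __init__(self, blocks):
        self.blocks = tuple(blocks)


class InstanceDisk:
    """Copy-on-write overlay: stores only the blocks that differ from the base."""

    def __init__(self, base):
        self.base = base      # shared reference, never copied
        self.changed = {}     # block index -> data written by this instance

    def read(self, index):
        # Prefer the instance's own copy; otherwise fall through to the base.
        return self.changed.get(index, self.base.blocks[index])

    def write(self, index, data):
        # Writes never touch the base image.
        self.changed[index] = data


base = BaseImage([b"boot", b"kernel", b"rootfs"])
disks = [InstanceDisk(base) for _ in range(70)]
disks[0].write(2, b"rootfs+edits")
```

In this model every unmodified read falls through to the shared base,
which is also why losing the base affects every instance at once.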

I recovered the base images via the file descriptors, but all of the
instances are still holding onto the old file descriptors, and those
are in a directory that is missing. FUSE filesystems don't handle this
well, and we're using glusterfs via FUSE. The only way to recreate the
directory and move the base images back in is for all of the instances
to be shut down.
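
For anyone curious how a deleted file can be recovered through a file
descriptor, here is a minimal Python sketch. It assumes a Linux host
where /proc/<pid>/fd/<fd> still resolves to the data of an unlinked
file that a process holds open. The function name and arguments are
hypothetical, and this is not necessarily the exact procedure used on
the Labs hosts.

```python
import os
import shutil


def recover_deleted_file(pid, fd, dest_path):
    """Copy the data behind an open descriptor of process `pid` to dest_path.

    Linux-only assumption: /proc/<pid>/fd/<fd> can still be opened after
    the file's directory entry has been removed, as long as the process
    keeps the descriptor open.
    """
    src = f"/proc/{pid}/fd/{fd}"
    with open(src, "rb") as source, open(dest_path, "wb") as dest:
        shutil.copyfileobj(source, dest)
    return dest_path
```

The recovered copy is a new file. Processes that still hold the old
descriptor keep using the deleted one until they are restarted.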

There's an additional issue, though. Nova needs the _base directory to
reboot, and to create new instances. Until the directory is back,
instance creation will be unavailable, and any reboots will fail.
Since reboots will fail, all instances will need to be shut down before
any can be brought back up.

I'll be shutting down all instances on March 6th at 18:00:00 UTC (10
AM PST). I expect the downtime to be roughly 1-2 hours. If you have
any work in progress, please ensure it's saved before the downtime.

Sorry for the inconvenience.

- Ryan


