[Labs-l] Partial (but dramatic) labs outage on Tuesday: 2015-02-24 1500UTC-1800UTC

Thu Feb 19 21:00:58 UTC 2015

It is with a heavy heart that I must share the news of an upcoming Labs 
maintenance window.

The labs NFS store (which you probably know as /data/project) is filling 
up rapidly and we need to add more drives.  By weird coincidence the 
actual physical space for that server in the datacenter is ALSO filling 
up, so Chris Johnson has graciously agreed to spend his day re-shuffling 
servers in order to make space for the new diskshelf.  This involves 
lots of unplugging and replugging, which means that the NFS server will 
need to be turned off for several hours.

During this window Chris will take care of another long-deferred 
maintenance task -- he's putting more RAM into the labs puppet master, 
virt1000.

What will break:

- Shared storage for all labs and tools instances.  That includes 
volumes like /data/project, /public/dumps, /data/scratch, and /home.

- Logins to all instances running Ubuntu Precise.  (Trusty hosts will 
/probably/ still support logins.)

- Login to wikitech and manipulation of instances.

What won't break:

- Labs instances will continue to run

- Tasks running on instances will continue to run; those that don't rely 
on shared storage should be fine.

- Web proxies should keep working, if the services they support aren't 
relying on shared storage.

What will get better:

- More storage space!

- Fewer problems with dumps filling up NFS (which is basically the same 
as 'more storage space').

- More reliable puppet runs and fewer outages with miscellaneous 
OpenStack services (which also run on virt1000)

I apologize in advance for this downtime.  Don't hesitate to contact me 
or Coren, either here or on IRC, for advice about how to harden your tool 
against this upcoming outage.  We will also be available on IRC during 
and after the outage to help revive things that are angry about the 
timeouts.

-Andrew
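
As one possible illustration of hardening a tool against the outage, the 
sketch below checks whether shared storage answers within a few seconds 
before starting work that depends on it.  It is not an official Labs 
recipe: the function names, the 5-second timeout, and the choice of 
/data/project as the default path to probe are all assumptions made for 
this example.  The probe runs in a child process because a stat() on an 
unresponsive NFS mount can block indefinitely.

```python
import subprocess
import sys


def storage_responds(path, timeout_seconds=5.0):
    """Return True if a stat() of `path` succeeds within the timeout.

    The stat runs in a child process so that a hung NFS mount cannot
    block the caller.  (Hypothetical helper, not part of any Labs tool.)
    """
    probe = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means the stat succeeded; anything else is a failure.
        return probe.wait(timeout=timeout_seconds) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: do not wait for the child afterwards, since it
        # may be stuck on the unresponsive mount.
        probe.kill()
        return False


def run_if_storage_up(task, path="/data/project", timeout_seconds=5.0):
    """Call task() only if `path` responds; otherwise return None.

    The default path is an assumption for illustration.
    """
    if not storage_responds(path, timeout_seconds):
        return None
    return task()
```

A cron job or long-running service could wrap its storage-dependent step 
in run_if_storage_up() and skip or retry later when it returns None, 
rather than hanging on a timeout.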