This is done. Instances were largely back up and running half an hour ago, and Coren and Yuvi have now prodded various Tools jobs and services back to life.
As always, please email or contact us on IRC if your particular tool or instance is still misbehaving.
-Andrew
On 2/19/15 1:00 PM, Andrew Bogott wrote:
It is with a heavy heart that I must share the news of an upcoming Labs maintenance window.
The labs NFS store (which you probably know as /data/project) is filling up rapidly and we need to add more drives. By weird coincidence, the physical space for that server in the datacenter is ALSO filling up, so Chris Johnson has graciously agreed to spend his day re-shuffling servers to make room for the new disk shelf. This involves lots of unplugging and replugging, which means the NFS server will need to be turned off for several hours.
During this window Chris will take care of another long-deferred maintenance task -- he's putting more RAM into the labs puppet master, virt1000.
What will break:
- Shared storage for all labs and tools instances. That includes volumes like /data/project, /public/dumps, /data/scratch, and /home.
- Logins to all instances running Ubuntu Precise. (Trusty hosts will /probably/ still support logins.)
- Logins to wikitech and manipulation of instances.
What won't break:
- Labs instances will continue to run.
- Tasks running on instances will continue to run; those that don't rely on shared storage should be fine.
- Web proxies should keep working, if the services they support aren't relying on shared storage.
What will get better:
- More storage space!
- Fewer problems with dumps filling up NFS (which is basically the same as 'more storage space').
- More reliable puppet runs and fewer outages with miscellaneous OpenStack services (which also run on virt1000).
I apologize in advance for this downtime. Don't hesitate to contact me or Coren, either here or on IRC, for advice about how to harden your tool against this upcoming outage. We will also be available on IRC during and after the outage to help revive things that are angry about the timeouts.
-Andrew