[Labs-l] Partial (but dramatic) labs outage on Tuesday: 2015-02-24 1500UTC-1800UTC -- finished

Gerard Meijssen gerard.meijssen at gmail.com
Tue Feb 24 21:47:24 UTC 2015


Hoi,
Andrew, Coren, Yuvi, thank you... You have not been lucky but you certainly
get my vote for the great effort you have put in.
Thanks,
     GerardM

On 24 February 2015 at 19:43, Andrew Bogott <abogott at wikimedia.org> wrote:

> This is done.  Instances were largely back up and running half an hour
> ago, and Coren and Yuvi have now prodded various Tools jobs and services
> back to life.
>
> As always, please email or contact us on IRC if your particular tool or
> instance is still misbehaving.
>
> -Andrew
>
>
>
> On 2/19/15 1:00 PM, Andrew Bogott wrote:
>
>> It is with a heavy heart that I must share the news of an upcoming Labs
>> maintenance window.
>>
>> The labs NFS store (which you probably know as /data/project) is filling
>> up rapidly and we need to add more drives.  By weird coincidence the actual
>> physical space for that server in the datacenter is ALSO filling up, so
>> Chris Johnson has graciously agreed to spend his day re-shuffling servers
>> in order to make space for the new disk shelf.  This involves lots of
>> unplugging and replugging and amounts to the fact that the NFS server will
>> need to be turned off for several hours.
>>
>> During this window Chris will take care of another long-deferred
>> maintenance task -- he's putting more RAM into the labs puppet master,
>> virt1000.
>>
>> What will break:
>>
>> - Shared storage for all labs and tools instances.  That includes volumes
>> like /data/project, /public/dumps, /data/scratch, /home
>>
>> - Logins to all instances running Ubuntu Precise.  (Trusty hosts will
>> /probably/ still support logins.)
>>
>> - Login to wikitech and manipulation of instances.
>>
>> What won't break:
>>
>> - Labs instances will continue to run
>>
>> - Tasks running on instances will continue to run; those that don't rely
>> on shared storage should be fine.
>>
>> - Web proxies should keep working, if the services they support aren't
>> relying on shared storage.
>>
>> What will get better:
>>
>> - More storage space!
>>
>> - Fewer problems with dumps filling up NFS (which is basically the same
>> as 'more storage space').
>>
>> - More reliable puppet runs and fewer outages with miscellaneous
>> OpenStack services (which also run on virt1000)
>>
>> I apologize in advance for this downtime.  Don't hesitate to contact me
>> or Coren either here or on IRC with advice about how to harden your tool
>> against this upcoming outage.  We will also be available on IRC during and
>> after the outage to help revive things that are angry about the timeouts.
>>
>> -Andrew
>>
>
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>