[Labs-l] Partial (but dramatic) labs outage on Tuesday: 2015-02-24 1500UTC-1800UTC

Petr Bena benapetr at gmail.com
Sat Feb 21 10:27:20 UTC 2015


RANT

If puppet were written in a proper language (C++), we wouldn't need more RAM :P

On Fri, Feb 20, 2015 at 5:07 PM, Ricordisamoa
<ricordisamoa at openmailbox.org> wrote:
> Thank you.
> I (and probably many others) would like someone from the Ops team to
> elaborate on the uptime and general reliability Labs (especially Tools) is
> supposed to have, and what kinds of services it is suitable for, to
> prevent future misunderstandings regarding loss of important work, etc.
>
> On 19/02/2015 22:00, Andrew Bogott wrote:
>
>> It is with a heavy heart that I must share the news of an upcoming Labs
>> maintenance window.
>>
>> The labs NFS store (which you probably know as /data/project) is filling
>> up rapidly and we need to add more drives.  By weird coincidence the actual
>> physical space for that server in the datacenter is ALSO filling up, so
>> Chris Johnson has graciously agreed to spend his day re-shuffling servers in
>> order to make space for the new diskshelf.  This involves lots of unplugging
>> and replugging and amounts to the fact that the NFS server will need to be
>> turned off for several hours.
>>
>> During this window Chris will take care of another long-deferred
>> maintenance task -- he's putting more RAM into the labs puppet master,
>> virt1000.
>>
>> What will break:
>>
>> - Shared storage for all labs and tools instances.  That includes volumes
>> like /data/project, /public/dumps, /data/scratch, /home
>>
>> - Logins to all instances running Ubuntu Precise.  (Trusty hosts will
>> /probably/ still support logins.)
>>
>> - Login to wikitech and manipulation of instances.
>>
>> What won't break:
>>
>> - Labs instances will continue to run
>>
>> - Tasks running on instances will continue to run; those that don't rely
>> on shared storage should be fine.
>>
>> - Web proxies should keep working, if the services they support aren't
>> relying on shared storage.
>>
>> What will get better:
>>
>> - More storage space!
>>
>> - Fewer problems with dumps filling up NFS (which is basically the same as
>> 'more storage space').
>>
>> - More reliable puppet runs and fewer outages with miscellaneous OpenStack
>> services (which also run on virt1000)
>>
>> I apologize in advance for this downtime.  Don't hesitate to contact me or
>> Coren either here or on IRC with advice about how to harden your tool
>> against this upcoming outage.  We will also be available on IRC during and
>> after the outage to help revive things that are angry about the timeouts.
>>
>> -Andrew
>>
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>


