[Labs-l] Partial (but dramatic) labs outage on Tuesday: 2015-02-24 1500UTC-1800UTC

Gerard Meijssen gerard.meijssen at gmail.com
Sun Feb 22 18:26:23 UTC 2015


Hoi,
Maybe an impact analysis has been done on the consequences of downtime of
the Labs servers.
It would be interesting to see an analysis of the possible loss of data
currently held on the Labs servers.
It would be interesting to learn how many people are affected.
It would be interesting to learn what the impact is on the maintenance of
the "production" servers / applications.
It might make it necessary to reassess what production means.
It might provide the reasons to reassess current practices.

Who is responsible for the continuity of the Labs servers? NO, it is not
the Labs operators; they are not equipped to handle the workload as it
is. Who needs to know these answers?
Thanks,
      GerardM

On 22 February 2015 at 18:18, Andrew Bogott <abogott at wikimedia.org> wrote:

> On 2/21/15 2:29 AM, Petr Bena wrote:
>
>> RANT 2
>>
>> Why don't we investigate what is taking so much space there? AFAIK
>> it's 30TB of storage; it shouldn't be filling up rapidly. Isn't that just
>> some broken tool that endlessly writes garbage to /data/project?
>>
> There was such a project, but I killed it a couple of weeks ago. Today,
> the file server looks like this:
>
> Filesystem                 Size  Used Avail Use% Mounted on
> /dev/mapper/os-var          92G  3.2G   84G   4% /var
> /dev/mapper/store-project   30T   15T   16T  49% /srv/project
> /dev/mapper/store-keys     960M   47M  913M   5% /srv/keys
> /dev/md123                 7.3T  958G  6.3T  13% /srv/scratch
>
> Pleasingly, there aren't really any giant, serious offenders in that 15T
> -- usage is distributed fairly well among a large number of projects, with
> the biggest user being (understandably) Tools.
>
> So, not actually full.  Still, 50% full is full enough to start looking
> towards future expansion.  It's unfortunate that this window is right on
> the heels of our outage last week, but it needs to happen and I can't think
> of any reason why it would be better to postpone it.
>
>> If Puppet were written in a proper language (C++), we wouldn't need more
>> RAM :P
>>
> That is hard to disagree with!  Still, virt1000 has been struggling and
> underpowered for quite a while now, and the 5 minutes that it'll take to
> drop in more RAM will be /much/ less disruptive than a rewrite of our
> server admin software :)
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>

