[Labs-l] Partial (but dramatic) labs outage on Tuesday: 2015-02-24 1500UTC-1800UTC
Gerard Meijssen
gerard.meijssen at gmail.com
Sun Feb 22 18:26:23 UTC 2015
Hoi,
Maybe an impact analysis has been done on the consequences of downtime of
Labs servers.
It would be interesting to see the analysis of the loss of data currently
held on the Labs servers.
It would be interesting to learn how many people are affected.
It would be interesting to learn what the impact is on the maintenance of
the "production" servers / applications.
It might make it necessary to reassess what production means.
It might provide the reasons to reassess current practices.
Who is responsible for the continuity of the Labs servers? NO, it is not
the Labs operators; they are not equipped to handle the workload as it
is. Who needs to know these answers?
Thanks,
GerardM
On 22 February 2015 at 18:18, Andrew Bogott <abogott at wikimedia.org> wrote:
> On 2/21/15 2:29 AM, Petr Bena wrote:
>
>> RANT 2
>>
>> Why don't we investigate what is taking so much space there? AFAIK
>> it's 30TB storage, it shouldn't be filling up rapidly, isn't that just
>> some broken tool that infinitely writes garbage to /data/project?
>>
> There was such a project, but I killed it a couple of weeks ago. Today,
> the file server looks like this:
>
> /dev/mapper/os-var 92G 3.2G 84G 4% /var
> /dev/mapper/store-project 30T 15T 16T 49% /srv/project
> /dev/mapper/store-keys 960M 47M 913M 5% /srv/keys
> /dev/md123 7.3T 958G 6.3T 13% /srv/scratch
>
> Pleasingly, there aren't really any giant, serious offenders in that 15T
> -- usage is distributed fairly well among a large number of projects, with
> the biggest user being (understandably) Tools.
>
> So, not actually full. Still, 50% full is full enough to start looking
> towards future expansion. It's unfortunate that this window is right on
> the heels of our outage last week, but it needs to happen and I can't think
> of any reason why it would be better to postpone it.
>
>> If puppet was written in proper language (C++) we wouldn't need more RAM
>> :P
>>
> That is hard to disagree with! Still, virt1000 has been struggling and
> underpowered for quite a while now, and the 5 minutes that it'll take to
> drop in more RAM will be /much/ less disruptive than a rewrite of our
> server admin software :)
>