[Labs-l] Partial (but dramatic) labs outage on Tuesday: 2015-02-24 1500UTC-1800UTC

Sun Feb 22 17:18:10 UTC 2015

On 2/21/15 2:29 AM, Petr Bena wrote:
> RANT 2
>
> Why don't we investigate what is taking so much space there? AFAIK
> it's 30TB storage, it shouldn't be filling up rapidly, isn't that just
> some broken tool that infinitely writes garbage to /data/project?
There was such a project, but I killed it a couple of weeks ago. Today, 
the file server looks like this:

Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/os-var          92G  3.2G   84G   4% /var
/dev/mapper/store-project   30T   15T   16T  49% /srv/project
/dev/mapper/store-keys     960M   47M  913M   5% /srv/keys
/dev/md123                 7.3T  958G  6.3T  13% /srv/scratch

Pleasingly, there aren't really any giant, serious offenders in that 15T 
-- usage is distributed fairly well among a large number of projects, 
with the biggest user being (understandably) Tools.
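
A per-project breakdown of that kind can be approximated with a short 
script.  The following is only a sketch under assumptions: that each 
top-level directory under /srv/project corresponds to one project, and 
that summing apparent file sizes is close enough (it will not match df 
exactly, since df counts allocated blocks).  The function name 
usage_by_project is made up for illustration, and this is not 
necessarily how the numbers above were obtained.

```python
import os


def usage_by_project(root):
    """Return (directory_name, total_bytes) pairs, largest first.

    Assumes every top-level directory under `root` is one project.
    Sizes are apparent file sizes (st_size), not allocated disk blocks.
    """
    totals = {}
    for entry in os.scandir(root):
        # Skip loose files and symlinks at the top level.
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        # os.walk does not follow symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    # File vanished or is unreadable; ignore it.
                    pass
        totals[entry.name] = total
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Calling usage_by_project("/srv/project")[:10] would list the ten 
largest directories.  On a 30T volume a full walk like this is slow and 
I/O-heavy, so it is better run off-peak.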

So, not actually full.  Still, 50% full is full enough to start looking 
towards future expansion.  It's unfortunate that this maintenance window 
comes right on the heels of our outage last week, but it needs to happen 
and I can't think of any reason why it would be better to postpone it.

> If puppet was written in proper language (C++) we wouldn't need more RAM :P
That is hard to disagree with!  Still, virt1000 has been struggling and 
underpowered for quite a while now, and the 5 minutes that it'll take to 
drop in more RAM will be /much/ less disruptive than a rewrite of our 
server admin software :)