[Labs-l] Outage report

Fri Jun 1 12:45:18 UTC 2012

All instances? Is it OK now?

On Friday, June 1, 2012, Ryan Lane wrote:

> I'm now going to reboot the instances, since it'll bring the swapping
> down for a while.
>
> On Fri, Jun 1, 2012 at 12:24 PM, Ryan Lane <rlane32 at gmail.com>
> wrote:
> > We're currently having a Labs outage. The NFS server became
> > non-responsive, causing a cascading failure. I'm currently suspending
> > instances until load comes down. Once load is under control, I'll
> > slowly resume instances. Soon, we'll be doing the following things to
> > ensure this doesn't continue to occur:
> >
> > 1. We're moving away from glusterfs to local storage on the virtual
> > nodes until we find another more appropriate solution
> > 2. We're getting rid of the labs-nfs1 instance, and will move the home
> > directories to project storage
> > 3. We're adding more (and better) hardware, which will lead to less
> > swapping and, in turn, less IO
> >
> > Sorry about the experience as of late. I'm looking forward to
> > improving the situation for us.
> >
> > - Ryan
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l
>

-- 
Sincerely,
Shujen Chang