[Labs-l] Outage report

Ryan Lane rlane32 at gmail.com
Fri Jun 1 10:24:19 UTC 2012


We're currently having a Labs outage. The NFS server became
non-responsive, causing a cascading failure. I'm suspending instances
for now, until load comes down. Once load is under control, I'll
slowly resume instances. Soon, we'll be doing the following things to
keep this from happening again:

1. We're moving away from glusterfs to local storage on the virtual
nodes until we find a more appropriate solution
2. We're getting rid of the labs-nfs1 instance, and will move the home
directories to project storage
3. We're adding more (and better) hardware, which will lead to less
swapping and, in turn, less IO

Sorry about the experience as of late. I'm looking forward to
improving the situation for us.

- Ryan


