<div dir="ltr">On Thu, Feb 14, 2013 at 11:20 AM, Ryan Lane <span dir="ltr">&lt;<a href="mailto:rlane@wikimedia.org" target="_blank">rlane@wikimedia.org</a>&gt;</span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">The glusterd process went into a death spiral at around midnight UTC last night. The glusterfs/glusterfsd processes continued to work fine, which allowed the filesystem to continue to work properly, but all four servers were approaching swap death.<div>


<br></div><div>This version of gluster also has issues with the upstart scripts. It won't properly start/stop the gluster services. I'm having to reboot the hosts. I'm going to track down this issue today in a labs instance. For the next few hours some projects will have issues accessing project and/or home directories.</div>


<div><br></div><div>This will not affect services using instance storage (/mnt).</div><span class="HOEnZb"><font color="#888888"><div><br></div></font></span></div></blockquote><div><br></div><div></div></div></div><div class="gmail_extra" style>


Volumes are being force restarted right now. Login should work on all nodes, and project storage should currently work fine for most projects. All volumes should be completely up in about an hour.</div><div class="gmail_extra" style>


<br></div><div class="gmail_extra" style>The glusterfs folks can't reproduce the upstart issue we're seeing in our cluster. As a workaround for now, I've replaced the upstart jobs with init scripts, which behave exactly as expected. In the future, it should be possible to work around the outage condition we had today without a prolonged volume force-start state.</div>


<div class="gmail_extra" style><br></div><div class="gmail_extra" style>- Ryan</div></div>