<div dir="ltr">As promised, the post-mortem.<div><br></div><div>tl;dr: the corruption issue we had in December is still there, and bites us every now and then. We're not entirely sure what is causing the corruption, but we suspect NFS, and we are working to move the database to a local filesystem.</div><div><br></div><div>Long story: <a href="https://wikitech.wikimedia.org/wiki/Incident_documentation/20160112-20160111-toollabs-SGE">https://wikitech.wikimedia.org/wiki/Incident_documentation/20160112-20160111-toollabs-SGE</a></div><div><br></div><div>Again, sorry for the disruptions. Unfortunately, we cannot guarantee that there will be no more of these outages in the near future.</div><div><br></div><div>Merlijn</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 11 January 2016 at 23:15, Merlijn van Deen <span dir="ltr">&lt;<a href="mailto:valhallasw@arctus.nl" target="_blank">valhallasw@arctus.nl</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Somehow sending an e-mail to labs-l seems to resolve issues magically. The issue started around 21:00 UTC, and I'll write up a post-mortem tomorrow.</div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On 11 January 2016 at 23:10, Merlijn van Deen <span dir="ltr">&lt;<a href="mailto:valhallasw@arctus.nl" target="_blank">valhallasw@arctus.nl</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Jobs are being queued, but are not executing. Every now and then a few jobs /are/ executed, but the backlog is ~20 minutes. We're not quite sure what's happening, unfortunately, but we're working on it.</div>

</blockquote></div><br></div>

</div></div></blockquote></div><br></div>