<div dir="ltr">Reminder: this will start in an hour.</div><div class="gmail_extra"><br><div class="gmail_quote">On 26 January 2016 at 11:00, Yuvi Panda <span dir="ltr">&lt;<a href="mailto:yuvipanda@gmail.com" target="_blank">yuvipanda@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Impact summary:<br>
<br>
The Grid Engine queue requires maintenance that may invalidate<br>
currently running jobs. We will perform this maintenance on<br>
27 January 2016, from 18:00 to 02:00 UTC.<br>
<br>
Over the course of the last few weeks we have experienced periodic<br>
crashes of the Grid Engine master. We have resolved issues<br>
surrounding multiple master processes accessing the same queue file.<br>
Unfortunately, this has not resolved the underlying corruption.<br>
We will attempt to dump and rebuild the queue as-is to minimize user<br>
impact. If this process is unsuccessful, we will have to start a fresh<br>
queue. Once the queue has been rebuilt, we will do a rolling restart of<br>
exec/webgrid nodes to refresh job associations with the master<br>
process.<br>
<br>
This is part of our ongoing work to stabilize the Grid Engine setup.<br>
<br>
Thanks for your patience,<br>
<br>
Labs Team<br>
<br>
</blockquote></div><br></div>