[Labs-l] [Labs-announce] [Tools] GridEngine maintenance - 27 Jan 2016, 1800-0200 UTC

Merlijn van Deen (valhallasw) valhallasw at arctus.nl
Wed Jan 27 17:07:05 UTC 2016


Reminder: this will start in an hour.

On 26 January 2016 at 11:00, Yuvi Panda <yuvipanda at gmail.com> wrote:

> Impact summary:
>
> The Gridengine queue requires maintenance that may invalidate
> currently running jobs.  We will perform this maintenance on
> 27 January 2016, 1800-0200 UTC.
>
> Over the course of the last few weeks we have experienced periodic
> crashes of the Grid Engine master.  We have resolved issues
> surrounding multiple master processes accessing the same queue file.
> Unfortunately, this has not resolved the underlying corruption.
> We will attempt to dump and rebuild the queue as-is to minimize user
> impact.  If this process is unsuccessful we will have to start a fresh
> queue.  Once the queue has been rebuilt we will be doing a rolling
> restart of exec/webgrid nodes to refresh job associations with the
> master process.
>
> This is part of our ongoing work to stabilize the Gridengine setup.
>
> Thanks for your patience,
>
> Labs Team