[Labs-l] [Labs-announce] [Tools] GridEngine maintenance - 27 Jan 2016, 1800-0200 UTC
Merlijn van Deen (valhallasw)
valhallasw at arctus.nl
Wed Jan 27 17:07:05 UTC 2016
Reminder: this will start in an hour.
On 26 January 2016 at 11:00, Yuvi Panda <yuvipanda at gmail.com> wrote:
> Impact summary:
>
> The Gridengine queue requires maintenance that may invalidate
> currently running jobs. We will perform this maintenance on 27 January
> 2016, 1800-0200 UTC.
>
> Over the course of the last few weeks we have experienced periodic
> crashes of the Grid Engine master. We have resolved issues
> surrounding multiple master processes accessing the same queue file.
> Unfortunately, this has not resolved the underlying corruption.
> We will attempt to dump and rebuild the queue as-is to minimize user
> impact. If this process is unsuccessful, we will have to start a fresh
> queue. Once the queue has been rebuilt, we will do a rolling restart
> of exec/webgrid nodes to refresh job associations with the master
> process.
>
> This is part of our ongoing work to stabilize the Gridengine setup.
>
> Thanks for your patience,
>
> Labs Team