[Labs-l] [Labs-announce] [Tools] GridEngine maintenance - 27 Jan 2016, 1800-0200 UTC

Merlijn van Deen (valhallasw) valhallasw at arctus.nl
Wed Jan 27 18:10:13 UTC 2016


And we are now starting :-)

On 27 January 2016 at 18:07, Merlijn van Deen (valhallasw)
<valhallasw at arctus.nl> wrote:

> Reminder: this will start in an hour.
>
> On 26 January 2016 at 11:00, Yuvi Panda <yuvipanda at gmail.com> wrote:
>
>> Impact summary:
>>
>>     The Gridengine queue requires maintenance that may invalidate
>> currently running jobs.  We will perform this maintenance on
>> 27 January 2016 at 1800-0200 UTC.
>>
>> Over the course of the last few weeks we have experienced periodic
>> crashes of the Grid Engine master.  We have resolved issues
>> surrounding multiple master processes accessing the same queue file.
>> Unfortunately, this has not resolved the underlying corruption.
>> We will attempt to dump and rebuild the queue as-is to minimize user
>> impact.  If this process is unsuccessful we will have to start a fresh
>> queue.  Once the queue has been rebuilt we will do a rolling restart
>> of exec/webgrid nodes to refresh job associations with the master
>> process.
>>
>> This is part of our ongoing work to stabilize the Gridengine setup.
>>
>> Thanks for your patience,
>>
>> Labs Team
>>
>> _______________________________________________
>> Labs-announce mailing list
>> Labs-announce at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-announce
>> _______________________________________________
>> Labs-l mailing list
>> Labs-l at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>
>
>