[Labs-l] [Labs-announce] [Tools] GridEngine maintenance - 27 Jan 2016, 1800-0200 UTC

Yuvi Panda yuvipanda at gmail.com
Tue Jan 26 10:00:51 UTC 2016


Impact summary:

    The Grid Engine queue requires maintenance that may invalidate
currently running jobs.  We will perform this maintenance on 27 Jan 2016,
1800-0200 UTC.

Over the last few weeks the Grid Engine master has crashed periodically.
We have resolved the issues surrounding multiple master processes
accessing the same queue file.  Unfortunately, this has not resolved the
underlying corruption.  We will attempt to dump and rebuild the queue
as-is to minimize user impact.  If this process is unsuccessful, we will
have to start a fresh queue.  Once the queue has been rebuilt, we will do
a rolling restart of the exec/webgrid nodes to refresh job associations
with the master process.

This is part of our ongoing work to stabilize the Grid Engine setup.

Thanks for your patience,

Labs Team

_______________________________________________
Labs-announce mailing list
Labs-announce at lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/labs-announce

