[Labs-l] [Labs-announce] [Tools] GridEngine maintenance - 27 Jan 2016, 1800-0200 UTC
Yuvi Panda
yuvipanda at gmail.com
Tue Jan 26 10:00:51 UTC 2016
Impact summary:
The Gridengine queue requires maintenance that may invalidate
currently running jobs. We will perform this maintenance on
27 Jan 2016, 1800-0200 UTC.
Over the last few weeks, we have experienced periodic crashes of the
Grid Engine master. We have resolved issues surrounding multiple
master processes accessing the same queue file. Unfortunately, this
has not resolved the underlying corruption.
We will attempt to dump and rebuild the queue as-is to minimize user
impact. If this process is unsuccessful, we will have to start a fresh
queue. Once the queue has been rebuilt, we will do a rolling restart
of exec/webgrid nodes to refresh job associations with the master
process.
This is part of our ongoing work to stabilize the Gridengine setup.
Thanks for your patience,
Labs Team
_______________________________________________
Labs-announce mailing list
Labs-announce at lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/labs-announce