[Labs-l] Another labs outage - curse of the accursed hardware failure continues

Brad Jorsch (Anomie) bjorsch at wikimedia.org
Fri Feb 27 14:09:14 UTC 2015


On Fri, Feb 27, 2015 at 2:27 AM, Yuvi Panda <yuvipanda at gmail.com> wrote:

> ToolLabs should be back up mostly now - web tools and most bots should
> be functioning fine. Some are still down, and we're continuing to work
> on it.
>

There appears to be some fallout: I had two jobs that qstat reported as
still alive, but the processes were not actually running on the exec hosts.

Discussion on IRC recommended running qdel -f on the stuck jobs and
resubmitting them once the "zombie" job entries had disappeared.
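
For anyone hitting the same thing, the cleanup could be scripted roughly
as below. This is only a sketch, not something that was run during the
outage. The function name, its parameters, and the polling interval are
made up for illustration, and it assumes that "qstat -j <job id>" exits
with a non-zero status once the scheduler no longer knows the job.
Resubmission is left out because it depends on how each job was started.

```python
import subprocess
import time


def force_delete_and_wait(job_id, run=subprocess.run, sleep=time.sleep,
                          poll_seconds=5, max_polls=60):
    """Force-delete a grid job, then wait until qstat stops listing it.

    Returns True once the job is gone, False if it is still listed after
    max_polls checks. `run` and `sleep` are injectable so the logic can be
    exercised without a real grid (hypothetical parameters).
    """
    job = str(job_id)

    # Ask the scheduler to drop the job even if the exec host never
    # confirms that the process was killed.
    run(["qdel", "-f", job], check=False)

    for _ in range(max_polls):
        # Assumption: a non-zero exit status means the job is unknown.
        result = run(["qstat", "-j", job],
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL,
                     check=False)
        if result.returncode != 0:
            return True
        sleep(poll_seconds)

    return False
```

A True return value would be the point at which it is safe to resubmit
the job in whatever way it was originally submitted.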