[Labs-l] Another labs outage - curse of the accursed hardware failure continues
Brad Jorsch (Anomie)
bjorsch at wikimedia.org
Fri Feb 27 14:09:14 UTC 2015
On Fri, Feb 27, 2015 at 2:27 AM, Yuvi Panda <yuvipanda at gmail.com> wrote:
> ToolLabs should be back up mostly now - web tools and most bots should
> be functioning fine. Some are still down, and we're continuing to work
> on it.
>
There appears to be some fallout: I had two jobs that qstat reported as
still alive, but the processes were not actually running on the exec hosts.
IRC discussion recommended running qdel -f on each one and resubmitting the
job once its "zombie" entry disappeared from the queue.
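
For anyone hitting the same thing, the workaround above could be scripted
roughly as follows. This is only a sketch under assumptions: the function
names clear_zombie and job_listed are made up for illustration, the retry
count and delay are arbitrary, and it assumes that "qstat -j ID" exits
non-zero once the job is no longer known to the scheduler.

```shell
#!/bin/sh
# Sketch: force-delete a stale ("zombie") grid job and wait until it has
# left the queue, so that it is safe to resubmit.

# Succeeds while qstat still knows about the job.
# Assumption: `qstat -j ID` exits non-zero once the job is gone.
job_listed() {
    qstat -j "$1" >/dev/null 2>&1
}

# clear_zombie JOB_ID [MAX_TRIES] [DELAY_SECONDS]
# Returns 0 once the job has disappeared, 1 on failure or timeout.
clear_zombie() {
    job_id=$1
    tries=${2:-30}
    delay=${3:-2}

    # Force the deletion, as recommended on IRC.
    qdel -f "$job_id" || return 1

    # Poll until the zombie entry is gone or we run out of tries.
    while job_listed "$job_id"; do
        tries=$((tries - 1))
        [ "$tries" -le 0 ] && return 1
        sleep "$delay"
    done
    return 0
}
```

Usage would be something like "clear_zombie 1234567 && <your usual submit
command>", where 1234567 is a placeholder job ID taken from qstat.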