I've noticed some irregularity in job execution through SGE over the past few days. Currently it seems several queues are either disabled or in an error state.
Is this expected? Is there an easy way to get an idea about how many jobs are queued and how quickly they're executed, in other words how to predict when a certain job might be run? Or maybe this is just a temporary issue that'll get resolved shortly?
Cheers, Morten
Hello, At Friday 03 May 2013 10:54:42 DaB. wrote:
If a queue is in an error state, something is wrong, and root or an operator has to fix it (most of the time simply clearing the error is enough). Queues that are disabled have been deactivated on purpose. I have now cleared the queues that were in an error state, and I will look into what the problem with mayapple is. It is not easy to tell how many jobs are waiting. The reason is that some users submit a lot of jobs that are executed with a throttle (for example, submit 50 jobs but run no more than 5 in parallel), which is perfectly fine. Normally we have enough resources that no job waits more than a few hours at most, but there are exceptions.
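For a rough picture of what is waiting, the plain `qstat` listing can be tallied by job state. The following is only a minimal sketch in Python, assuming the standard SGE text layout in which the state is the fifth whitespace-separated column (`qw` = waiting, `r` = running, `Eqw` = waiting with an error). The helper names, the user names and the sample listing are made up for illustration and are not real Toolserver output.

```python
import subprocess
from collections import Counter


def count_job_states(qstat_output: str) -> Counter:
    """Count jobs per state from plain-text `qstat` output."""
    counts = Counter()
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; this skips the
        # column header and the dashed separator line.
        if len(fields) >= 5 and fields[0].isdigit():
            counts[fields[4]] += 1  # fifth column is the job state
    return counts


def fetch_qstat_all_users() -> str:
    """Run `qstat -u '*'` to list jobs of all users (needs the SGE client tools)."""
    result = subprocess.run(
        ["qstat", "-u", "*"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical listing, used so the sketch runs without a cluster.
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue                  slots ja-task-ID
---------------------------------------------------------------------------------------------------------
    101 0.50000 botjob     alice        r     05/03/2013 10:01:12 longrun@hosta.example      1
    102 0.50000 botjob     alice        qw    05/03/2013 10:02:40                            1
    103 0.25000 stats      bob          qw    05/03/2013 10:05:03                            1
    104 0.25000 stats      bob          Eqw   05/03/2013 10:06:30                            1
"""

if __name__ == "__main__":
    for state, number in sorted(count_job_states(SAMPLE).items()):
        print(state, number)
```

On a submit host, `count_job_states(fetch_qstat_all_users())` would give the same tally for the live queue. Jobs that are held back by a throttle may also show up as waiting, so the number is at best an upper bound and not a prediction of when a particular job will start.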
Sincerely, DaB.
toolserver-l@lists.wikimedia.org