Re: [Toolserver-l] SGE queue waiting forever?

25 Nov 2012


On 24.11.2012 20:43, Marlen Caemmerer wrote:
...
Hello,
a broken NFS mount was the source of the slow login.
I don't know whether it affected SGE as well, but when I tried to mount the
user-store I got the error "Out of stream resources".
There might be something fishy with the local disks too, since cat
/etc/vfstab took ages twice and ls returned "no such file or
directory" twice as well.
But the IPMI logs and the Solaris RAID utility showed no faults.
I rebooted, and the system now seems to be running OK.
Do you still see any issues?
Cheers
     nosy
At 20:32 on Nov 23rd, SGE on turnera stopped and was started on damiana.
The qmaster thread started successfully, since it responds to pings and
so on. But the scheduler thread does not seem to work: qconf -tsm does not
show any status information (which would be written to the logs when I
send this command). That's why no new jobs are sent to the execution clients.
So the switchover on the HA cluster failed.
Merlissimo
@All: If you are working on big files, please copy them to local temp
first (under SGE, $TMP contains an individual temp dir for the job). For
example, piping big files to other slow programs causes a lot of NFS load,
because the data must be read in small packets, which puts high load on the
servers. That's why SGE has been unable to schedule new jobs on nightshade for days.
