Peter Körner osm-lists@mazdermind.de wrote:
For a few days now I've been getting weird errors when submitting tasks.
My cron job calls "/home/mazder/public_html/replicate-sequences/update-submit.sh", which contains the following command:
qcronsub -l h_rt=0:05:00 -l virtual_free=100M -l arch=* -l sql-user-m=1 -N mazder-replicate-sequences -m as -o '/home/mazder/public_html/replicate-sequences/sge' /home/mazder/public_html/replicate-sequences/update-run.sh
Most of these calls produce the error below, which does not seem to be an error in my code, as I use neither XML nor Python.
Do you have any idea what's going wrong?
[...]
An educated guess: The Python errors come from the script /sge/GE/bin/sol-amd64/qjobtest, which qcronsub calls to test whether a job with that name is already running. qjobtest parses the output of "qstat -xml ...", which in normal operation is a valid XML document. My assumption is that when SGE is down, qstat prints its error messages ("error: commlib error: can't connect to service (Connection refused)", etc.) as plain text. That can't be parsed as XML, which in turn causes qjobtest to barf.
In short: This is another artefact of SGE being down at that moment. You can't do anything about it; just ignore it.
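
To illustrate the failure mode guessed at above, here is a minimal Python sketch. It is not the actual qjobtest source, which I haven't quoted here; the function name job_is_listed is hypothetical, and the assumption that job names appear in JB_name elements of the qstat XML is mine. It shows how an XML parser chokes on a plain-text error message, and how a check could tell "SGE unreachable" apart from "job not running":

```python
import xml.etree.ElementTree as ET


def job_is_listed(qstat_output, job_name):
    """Check whether job_name appears in the text printed by "qstat -xml".

    Returns True or False when the text is valid XML, and None when it
    is not XML at all (e.g. a plain-text error because SGE is down).
    Assumption: job names are stored in <JB_name> elements.
    """
    try:
        root = ET.fromstring(qstat_output)
    except ET.ParseError:
        # Plain text such as "error: commlib error: ..." ends up here
        # instead of surfacing as an unhandled traceback.
        return None
    return any(el.text == job_name for el in root.iter("JB_name"))


if __name__ == "__main__":
    # Hand-written sample inputs, not captured from a real system.
    sample_xml = (
        "<job_info><queue_info><job_list state='running'>"
        "<JB_name>mazder-replicate-sequences</JB_name>"
        "</job_list></queue_info></job_info>"
    )
    sample_error = (
        "error: commlib error: can't connect to service (Connection refused)"
    )
    print(job_is_listed(sample_xml, "mazder-replicate-sequences"))
    print(job_is_listed(sample_error, "mazder-replicate-sequences"))
```

Without the try/except, the second call would raise a parse error, which would match the kind of Python/XML traceback you are seeing.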
Tim