Magnus Manske <magnusmanske@googlemail.com> wrote:
Wow, what a mess. Why does cronsub/qsub try to divine hidden voodoo meaning from my scripts (which predate qsub availability) anyway? If that is desired for some cases, why is it activated by default? Is there at least a qsub option to turn it off? Will that option work with cronsub?
To quote man qsub(1):
There is another option that's quite useful:
qsub -b y path-to-program
Thanks, very helpful. Do you know if cronsub supports this?
Have a look at /opt/local/bin/cronsub, it's not a very complicated script. Cronsub does not currently allow for adding custom qsub parameters. But you can of course use your own modified copy or whatever.
This causes path-to-program (use a full path like /usr/bin/something or a path relative to your $HOME, or it won't work) to be executed as a "binary" program. Your script will not be copied to a temporary directory on the SGE cluster and will be run in place.
I believe that is default behaviour anyway now? I vaguely remember a mail concerning that WRT Python.
cronsub currently copies the contents of the script to a temporary file that will be executed and copied by SGE, leaving the original script intact in place.
-b y has its disadvantages, though. When I execute a program (I ran a compiled C program as a test) and it abruptly stops by dumping core, it does so without generating any output. I combine "-b y" with "-m ae" to receive a mail when the job is completed; in case of a crash it is reported in the e-mail:
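As a rough sketch of that combination, the submission could be put together like this. The program path, job name and mail address are placeholders I made up for illustration, and -N (job name) and -M (mail recipient) are standard qsub options that were not mentioned above. The function only prints the command line, so it can be tried without a cluster:

```shell
#!/bin/sh
# Sketch only: build a qsub command line that runs a program in place
# (-b y) and sends mail when the job aborts or ends (-m ae).
# All arguments passed below are placeholders, not real values.

build_qsub_cmd() {
    program=$1   # full path to the program (hypothetical)
    jobname=$2   # job name passed to -N (hypothetical)
    mailto=$3    # recipient passed to -M (hypothetical)
    printf 'qsub -b y -m ae -M %s -N %s %s\n' "$mailto" "$jobname" "$program"
}

# Print the command instead of submitting it.
build_qsub_cmd "$HOME/bin/howto2" howto2 user@example.org
```

To actually submit, one would run the printed line (or pipe it to sh) on a host where qsub is available.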
To: saper@clematis.toolserver.org
Subject: Job 211410 (howto2) Aborted
Date: Mon, 28 Feb 2011 20:57:51 +0000 (UTC)
From: root@wolfsbane.toolserver.org (Super-User)

Job 211410 (howto2) Aborted
 Exit Status      = 139
 Signal           = SEGV
 User             = saper
 Queue            = all.q@wolfsbane.toolserver.org
 Host             = wolfsbane.toolserver.org
 Start Time       = 02/28/2011 20:57:50
 End Time         = 02/28/2011 20:57:51
 CPU              = 00:00:00
 Max vmem         = NA
failed assumedly after job because:
job 211410.1 died through signal SEGV (11)
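The exit status of 139 in that mail looks consistent with the usual shell convention of 128 plus the signal number (SEGV is 11). A minimal sketch for decoding such a status, assuming that convention applies; the function name is made up for this example:

```shell
#!/bin/sh
# Sketch: recover the signal number from an exit status above 128.
# Assumes the common "128 + signal number" convention applies.

signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        echo 0   # 0 here means "not terminated by a signal"
    fi
}

sig=$(signal_from_status 139)
echo "signal number: $sig"   # 139 - 128 = 11
# kill -l <number> prints the matching signal name; spelling may vary by shell
kill -l "$sig"
```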
With shell scripts it is much more obvious, since shells complain to stderr (as in your case, where at least you could see the message).
//Marcin