[Toolserver-l] Solving problems with qsub with -b y option

Marcin Cieslak saper at saper.info
Mon Feb 28 20:59:51 UTC 2011


>> Magnus Manske <magnusmanske at googlemail.com> wrote:
>>>>
>>>> Wow, what a mess. Why does cronsub/qsub try to divine hidden voodoo
>>>> meaning from my scripts (which predate qsub availability) anyway? If
>>>> that is desired for some cases, why is that activated by default? Is
>>>> there at least a qsub option to turn it off? Will that option work
>>>> with cronsub?
>>>
>>> To quote man qsub(1):
>>
>> There is another option that's quite useful:
>>
>> qsub -b y path-to-program
>
> Thanks, very helpful. Do you know if cronsub supports this?

Have a look at /opt/local/bin/cronsub; it's not a very complicated script.
Cronsub does not currently allow passing custom qsub parameters,
but you can of course use your own modified copy.
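
If you would rather not maintain a cronsub copy, calling qsub directly
from a small helper is another option. What follows is only a sketch,
not what cronsub does: the function name submit_binary and the QSUB
override are made up for illustration, and prefixing $HOME to relative
paths is my reading of the "full path or relative to $HOME" advice
quoted below. The -b, -m and -N options are documented in qsub(1).

```shell
#!/bin/sh
# Hypothetical helper (not part of cronsub): submit a program with "-b y".
# QSUB may be overridden, e.g. for a dry run; it defaults to plain qsub.
QSUB=${QSUB:-qsub}

submit_binary() {
    name=$1      # job name, shown in the queue and in notification mails
    program=$2   # program to run
    shift 2      # anything left is passed on as the program's arguments

    case $program in
        /*) ;;                            # already a full path
        *)  program="$HOME/$program" ;;   # assumption: relative to $HOME
    esac

    # -b y  : treat the command as a binary and run it in place
    # -m ae : send mail when the job is aborted or ends
    # -N    : set the job name
    "$QSUB" -b y -m ae -N "$name" "$program" "$@"
}
```

With this sourced from a crontab wrapper, something like
"submit_binary howto2 bin/howto2" would submit $HOME/bin/howto2
(the path here is just an example).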

>> This causes path-to-program (use full path like /usr/bin/something or
>> path relative to your $HOME or it won't work) to be executed
>> as a "binary" program. Your script will not be copied to a temporary
>> directory on the SGE cluster and will be run in place.
>
> I believe that is default behaviour anyway now? I vaguely remember a
> mail concerning that WRT Python.

cronsub currently copies the contents of the script to a temporary file,
which is then submitted to SGE (which copies and executes it), leaving
the original script intact in place.

-b y has its disadvantages, though. When I execute a program this way
(I ran a binary C program as a test) and it abruptly stops by dumping
core, it generates no output at all. I combine "-b y" with "-m ae"
to receive a mail when the job ends or is aborted; in case of a crash,
it is reported in the e-mail:

To: saper at clematis.toolserver.org
Subject: Job 211410 (howto2) Aborted
Date: Mon, 28 Feb 2011 20:57:51 +0000 (UTC)
From: root at wolfsbane.toolserver.org (Super-User)

Job 211410 (howto2) Aborted
 Exit Status      = 139
 Signal           = SEGV
 User             = saper
 Queue            = all.q at wolfsbane.toolserver.org
 Host             = wolfsbane.toolserver.org
 Start Time       = 02/28/2011 20:57:50
 End Time         = 02/28/2011 20:57:51
 CPU              = 00:00:00
 Max vmem         = NA
failed assumedly after job because:
job 211410.1 died through signal SEGV (11)

With shell scripts a failure is much more obvious, since shells complain
to stderr (as in your case, where at least you could see the message).
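
To get a similar complaint for a binary, one could run it through a
small wrapper. This is a rough sketch under my own assumptions: the
name run_and_report is invented, and it relies on the usual shell
convention that death by signal N shows up as exit status 128 + N
(which matches the mail above: 139 = 128 + 11, SEGV). A program that
deliberately exits with a status above 128 would be misreported.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and report a signal death on stderr.
run_and_report() {
    status=0
    "$@" || status=$?    # run the command, remember a non-zero exit status

    if [ "$status" -gt 128 ]; then
        # By shell convention, 128 + N usually means "killed by signal N".
        sig=$((status - 128))
        # kill -l N prints the signal name; left empty if N is not a signal.
        signame=$(kill -l "$sig" 2>/dev/null)
        echo "$1: killed by signal $sig ($signame), exit status $status" >&2
    fi

    return "$status"     # pass the original status on to the caller
}
```

The wrapper script itself would then be what gets submitted, calling
run_and_report with the full path of the real program, so the message
lands in the job's stderr instead of the crash going by silently.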

//Marcin




