[Labs-l] [Reminder] Participate in the Wikimedia Tool Labs Annual Survey (2016)

Bryan Davis bd808 at wikimedia.org
Sun Nov 20 01:10:53 UTC 2016


On Tue, Nov 1, 2016 at 4:35 PM, Gryllida <gryllida at fastmail.fm> wrote:
> Bryan Davis wrote:
>> Gryllida wrote:
>> > 7) Other comments: I use jsub. Is it the same thing as GridEngine
>> > Continuous jobs in the question below?
>>
>> Possibly, yes. `jsub -continuous` and the `jstart` convenience alias
>> are the things that run continuous jobs on the job grid. This
>> functionality is a custom addition to open grid engine that our
>> wrapper scripts provide. It sounds like we could make that question a
>> bit more clear if it stays on the survey for next year. I think we
>> actually have some logging today that can answer that particular
>> question even for people who aren't consciously aware that they are
>> using it, so it may be better to just drop the question.
>
> What is the difference between jsub-continuous and jsub?
>
> Does any of them restart the job automatically if the cluster is
> rebooted?

Apologies for not responding to this sooner. I flagged it as needing a
response but then apparently didn't actually respond. :/

`jsub -continuous` does a couple of things:
* It enables the same duplicate job protection as `jsub -once` that
prevents multiple jobs with the same job name from running.
* It wraps your command in a bash while loop that will keep running
the submitted program until it exits with a 0 exit status. [0]
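
As a rough illustration of that second point, here is a minimal sketch
of a retry loop. It is not the code that jsub actually generates (see
[0] for that); the `run_until_success` and `flaky` names and the
one-second pause are assumptions made up for this example.

```shell
#!/bin/bash
# Sketch of the retry idea only; not the wrapper that jsub writes.
# run_until_success is a hypothetical helper name.
run_until_success() {
    # Re-run the given command until it exits with status 0.
    while ! "$@"; do
        # Pause between attempts; the 1-second delay is an assumption.
        sleep 1
    done
}

# Hypothetical command that fails twice and succeeds on the third try.
attempts=0
flaky() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

run_until_success flaky
echo "succeeded after $attempts attempts"
```

A consequence of this design is that a program which exits cleanly
with status 0 ends the loop, so only non-zero exits trigger a re-run.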

`jstart` is an alias for `jsub -continuous -once` [1]. (Which is
redundant because -continuous also implies -once. Maybe I should fix
the code, but either way the end result is the same.)

Neither `jsub -continuous` nor `jstart` by itself will resubmit a job
to the grid when cluster reboots have caused the original job to exit.
The Bigbrother watchdog process [2] can, however, resubmit jobs for you
if you have a job that should always be running on the grid. It does
this by reading a $HOME/.bigbrotherrc configuration file in your
tool's home directory and checking the grid for the jobs it declares.
If a job is not found to be actively running on the grid, Bigbrother
will submit it for you using the `jstart -N $NAME ...` command from
the config file.
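
The check-and-resubmit idea can be sketched roughly as below. This is
not Bigbrother's own code: the job names, the paths, the temporary
file standing in for $HOME/.bigbrotherrc, and the `job_is_running`
helper (a stub in place of a real query to the grid) are all
hypothetical, and the sketch only prints what it would resubmit.

```shell
#!/bin/bash
# Sketch of the watchdog idea only; this is not Bigbrother's code.
# The file contents, job names, and helpers below are hypothetical.

# Temporary file standing in for $HOME/.bigbrotherrc.
config=$(mktemp)
cat > "$config" <<'EOF'
jstart -N examplebot /data/project/example/bot.sh
jstart -N otherbot /data/project/example/other.sh
EOF

# Stand-in for asking the grid which jobs are currently running.
running_jobs="examplebot"

job_is_running() {
    case " $running_jobs " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

resubmitted=""
while read -r cmd flag name rest; do
    # Only handle lines of the form: jstart -N NAME ...
    [ "$cmd" = "jstart" ] && [ "$flag" = "-N" ] || continue
    if ! job_is_running "$name"; then
        # A real watchdog would run the command; this sketch reports it.
        echo "would resubmit: $cmd $flag $name $rest"
        resubmitted="$resubmitted $name"
    fi
done < "$config"

rm -f "$config"
```

With the stub data above, only `otherbot` is reported, because
`examplebot` is listed as already running.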


[0]: https://phabricator.wikimedia.org/diffusion/LTOL/browse/master/jobutils/bin/jsub;99e9eac0ee58f516c32a607a5e5f5e79bedca484$645-659
[1]: https://phabricator.wikimedia.org/diffusion/LTOL/browse/master/jobutils/bin/jsub;99e9eac0ee58f516c32a607a5e5f5e79bedca484$566-568
[2]: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Grid#Bigbrother

Bryan
-- 
Bryan Davis              Wikimedia Foundation    <bd808 at wikimedia.org>
[[m:User:BDavis_(WMF)]]  Sr Software Engineer            Boise, ID USA
irc: bd808                                        v:415.839.6885 x6855


