I went back and re-enabled the line in .bash_profile that switches to zsh ("exec zsh", the last line of .bash_profile).

Then I submitted the job to the grid, using a command like this:

jsub -N "n"  -once -o ~/err/nightly.out -e ~/err/nightly.err ~/grid/jobs/nightly.sh

I tried it three ways. First, I used nightly.sh as is (see source). Second, I replaced "source" with ".", and third, I replaced "source" with "bash". In all three cases the job failed without producing any output or error. The nightly.out and nightly.err files were created, of course, but both were empty.

Next, I added a "#!/bin/bash" shebang and ran it again all three ways. The result was the same.

Running qstat repeatedly shows that the job enters the queued state ("qw"), moves to the running state ("r") after a few seconds, and then stops immediately.

Removing the "exec zsh" line from .bash_profile makes everything work again.

Finally, I suspected that the problem might be that zsh is available to me but not on the grid. So I changed the ending of .bash_profile from a single "exec zsh" command to this:

if [ -f /usr/bin/zsh ]; then
    exec zsh
fi

Under this configuration, jobs on the grid worked, and when I used "become" to log in as my tool, I ended up in zsh. Obviously, I am happy with this workaround, but I am still curious about the root cause.

Is the cause really that zsh is not available on the grid, so that the grid first replicates my login environment, reaches the "exec zsh" command, and somehow falls apart?
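A variant of the guarded .bash_profile ending, offered only as an untested sketch: in addition to checking that zsh exists on the host, it switches to zsh only when the shell is interactive. It assumes that "become" gives an interactive login shell and that grid jobs run non-interactively; neither assumption is confirmed in this thread.

```bash
# Hypothetical ending for ~/.bash_profile (sketch, not tested on the grid).
# $- holds the current shell's option letters; it contains "i" only when
# the shell is interactive. Assumption: batch jobs are non-interactive, so
# they skip the exec and stay in bash, while an interactive login switches.
case $- in
    *i*)
        # -x: only exec if a runnable zsh exists on *this* host.
        if [ -x /usr/bin/zsh ]; then
            exec /usr/bin/zsh
        fi
        ;;
esac
```

With this shape, a missing zsh and a non-interactive job both fall through to the rest of the profile instead of replacing the shell.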

On Sun, Nov 14, 2021 at 10:54 AM Huji Lee <huji.huji@gmail.com> wrote:
Again, good advice by both of you! Let me explore more and get back with any potential questions.

On Sun, Nov 14, 2021 at 9:24 AM Roy Smith <roy@panix.com> wrote:
This is really good advice.  Any time you've got a process being run by some tool on your behalf (cron, initd, remote job execution, etc.), you're running in an alien environment.  You get so used to things "just working" when you run them interactively, you forget how much of your carefully crafted login environment you're depending on.  The more you make things totally explicit, the less chance there is for things to go south in a hard-to-debug way.
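One way to act on this advice is to run a job script in a deliberately bare environment before submitting it. The sketch below assumes bash and coreutils `env` on the machine where you test. It only approximates a scheduler (the failure earlier in this thread suggests the grid's job shell does read .bash_profile), and the job body and `LOGIN_ONLY_VAR` are hypothetical stand-ins, not the real nightly.sh.

```bash
#!/bin/bash
# Sketch: run a job with a nearly empty environment to expose anything it
# silently inherits from the login shell. The job is a throwaway stand-in.
set -eu

export LOGIN_ONLY_VAR="set by my login shell"   # hypothetical login-time setting

job=$(mktemp)
cat > "$job" <<'EOF'
# Everything this job needs is set here instead of being inherited.
PATH=/usr/local/bin:/usr/bin:/bin
export PATH
echo "PATH=$PATH"
echo "LOGIN_ONLY_VAR=${LOGIN_ONLY_VAR:-unset}"
EOF

# env -i drops the inherited environment, so LOGIN_ONLY_VAR disappears.
# --noprofile/--norc make explicit that no startup files are read (a
# non-interactive, non-login bash skips them anyway; it would still honour
# $BASH_ENV, but env -i has cleared that too).
output=$(env -i HOME="${HOME:-/tmp}" PATH=/usr/bin:/bin bash --noprofile --norc "$job")
rm -f "$job"
echo "$output"
```

If the real job prints errors or behaves differently under this wrapper, it is depending on something from the interactive login environment.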

On Nov 14, 2021, at 8:47 AM, YiFei Zhu <zhuyifei1999@gmail.com> wrote:

Honestly, I think you should not depend on the behavior of
shebang-less scripts as the executable. You should either put "bash
/path/to/scriptfile.sh" or add a shebang to the top of the script.
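The two options in this advice can be sketched as follows. Without a "#!" line the kernel will not execute the file directly (execve fails with ENOEXEC), so the interpreter depends on whoever launched it: a shell typically runs the file itself, and execvp-style launchers fall back to /bin/sh. The temporary script here is a made-up stand-in; in practice the same "#!/bin/bash" line would go at the top of nightly.sh, or the command would name bash before the script path, as suggested above.

```bash
#!/bin/bash
# Sketch of the two options: (1) give the job script a shebang and make it
# executable, or (2) name the interpreter on the command line.
# The script body is a stand-in, not the real nightly.sh.
set -eu

script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
# With this first line, the kernel runs the file with /bin/bash no matter
# which program (a login shell, cron, the grid) asked to execute it.
echo "run by bash ${BASH_VERSINFO[0]}"
EOF
chmod +x "$script"

via_shebang=$("$script")     # option 1: the kernel reads the shebang
via_bash=$(bash "$script")   # option 2: explicit interpreter; the shebang is just a comment here
rm -f "$script"
echo "$via_shebang"
echo "$via_bash"
```

Either way, the job no longer depends on which shell happens to launch it.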

Cloud mailing list -- cloud@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/