I went back and re-enabled the line in .bash_profile that starts zsh ("exec zsh" as the last line of the file).
Then I submitted the job to the grid, using a command like this:
jsub -N "n" -once -o ~/err/nightly.out -e ~/err/nightly.err ~/grid/jobs/nightly.sh
I did it three ways. First, I used the nightly.sh file as is (see source). Second, I replaced "source" with ".", and third, I replaced "source" with "bash". In all three cases the job failed without producing any output or error. The nightly.out and nightly.err files were created, of course, but they were empty.
Next, I added a "#!/bin/bash" shebang and ran it again all three ways. The result was the same.
Running qstat repeatedly shows that the job enters the queued state ("qw"), moves to the running state ("r") after a few seconds, and then immediately stops.
Removing the "exec zsh" command from .bash_profile makes things work again.
Finally, I decided that maybe the problem is that zsh is available to me but not on the grid, so I changed the end of .bash_profile from a single "exec zsh" command to this:
if [ -f /usr/bin/zsh ]; then
    zsh
fi
Under this config, jobs on the grid worked, and when I used "become" to log in as my tool, I ended up in zsh. Obviously, I am happy with this workaround, but I am still curious about the root cause.
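A possible variant of this guard, as a sketch only: it switches to zsh only when the shell is interactive and zsh can be found on the PATH. It assumes that grid jobs run in a non-interactive shell that still reads .bash_profile, which I have not confirmed. The function name should_switch_shell is made up for illustration.

```shell
# Sketch for the end of ~/.bash_profile.
# Assumption (unconfirmed): grid jobs start a non-interactive shell that
# still reads this file, so only interactive logins should switch to zsh.

# should_switch_shell is a hypothetical helper.
#   $1: the shell option flags (normally "$-")
#   $2: the name of the shell to look up on PATH
# It succeeds only if the flags contain "i" (interactive) and the shell exists.
should_switch_shell() {
    case "$1" in
        *i*) ;;            # interactive: go on to the PATH check
        *)   return 1 ;;   # non-interactive (for example a batch job): stay put
    esac
    command -v "$2" >/dev/null 2>&1
}

if should_switch_shell "$-" zsh; then
    exec zsh
fi
```

With "exec", zsh replaces bash, so leaving zsh ends the session. Dropping "exec" keeps the behavior of the workaround above, where exiting zsh returns to bash.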
Is it really that zsh is not available on the grid, and the grid tries to replicate my environment first and reaches the "exec zsh" command and falls apart somehow?
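One way to test the first half of that question would be a tiny job that reports whether zsh is visible on the host that actually runs the job. This is only a sketch. The script name check_zsh.sh and the function report_program are hypothetical, and it would have to be submitted while the "exec zsh" line is disabled, since jobs do not run otherwise.

```shell
#!/bin/bash
# Hypothetical diagnostic job (check_zsh.sh): prints the name of the host
# running the job and whether a given program can be found on PATH there.

# report_program is a made-up helper; $1 is the program name to look up.
report_program() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1: found at $path"
    else
        echo "$1: not found"
    fi
}

echo "host: $(uname -n)"
report_program zsh
```

Submitted with jsub in the same way as nightly.sh, the job's output file would then show whether zsh exists on the execution host. A "not found" line would support the theory, although it would not by itself explain why the job dies without writing anything to the error file.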