On Wed, Nov 17, 2021 at 1:04 AM YiFei Zhu zhuyifei1999@gmail.com wrote:
On Tue, Nov 16, 2021 at 6:38 PM Huji Lee huji.huji@gmail.com wrote:
I went back and reactivated the line in .bash_profile which enabled zsh ("exec zsh" as the last line of .bash_profile)
Then I submitted the job to the grid, using a command like this:
jsub -N "n" -once -o ~/err/nightly.out -e ~/err/nightly.err ~/grid/jobs/nightly.sh
I did it three ways. First, I used the nightly.sh file as is (see source). Second, I replaced "source" with "." and third I replaced "source" with "bash". In all three cases, it failed, without even producing an output or error. The nightly.out and nightly.err files were created of course, but were empty.
Next, I added a "#!/bin/bash" shebang and ran it again all three ways. The result was the same.
Running qstat many times shows that the job gets into a queued state ("qw") and after a few seconds, it goes into the run state ("r") and immediately stops.
Removing the "exec zsh" command from .bash_profile will make things work again.
Finally, I decided maybe the problem is that zsh is available for me, but not on the grid. So I changed the ending of .bash_profile from a single "exec zsh" command to this:
if [ -f /usr/bin/zsh ]; then
    zsh
fi
Under this config, jobs on the grid worked, and when I used "become" to log in as my tool, I ended up in zsh. Obviously, I am happy with this workaround. But I am still curious as to the root cause.
Is it really that zsh is not available on the grid, and the grid tries to replicate my environment first and reaches the "exec zsh" command and falls apart somehow?
This is consistent with what I described earlier:
Since you have "exec zsh" in your .bash_profile, bash will run it at startup as a login shell, which in theory immediately replaces bash with zsh with no arguments. zsh then sees that it has no arguments, attempts to read a script from stdin, gets nothing, and immediately exits, stopping the job on the grid.
However, now that you have "zsh" instead of "exec zsh", the "replace" is not done. bash as the login shell executes zsh as a subshell, and zsh, having no inputs, immediately exits. The execution continues as if nothing had ever happened.
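The difference between the two cases can be reproduced without the grid. The following is only a sketch: it uses a second bash as a stand-in for zsh (an assumption, so that it runs on hosts without zsh) and redirects stdin from /dev/null to mimic a job with no input. The echo plays the role of the job script.

```shell
# Stand-in for "exec zsh": the outer shell is replaced by a shell that
# hits EOF on stdin and exits, so the echo is never reached.
with_exec=$(bash -c 'exec bash; echo "job body ran"' </dev/null)

# Stand-in for plain "zsh": the child shell exits at EOF and the outer
# shell carries on to the echo.
without_exec=$(bash -c 'bash; echo "job body ran"' </dev/null)

printf 'with exec:    [%s]\n' "$with_exec"
printf 'without exec: [%s]\n' "$without_exec"
```

Only the second variant should report that the job body ran, which matches the empty nightly.out and nightly.err described above.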
I just tested how bash invokes .bash_profile by adding a sleep 60 to .bash_profile and giving my test.sh a shebang. I submitted one job with an explicit 'bash' and one without, and it looks like .bash_profile is executed in both cases:
USER       PID %CPU %MEM    VSZ   RSS TTY STAT START    TIME COMMAND
sgeadmin   762  0.4  0.1 111020 16056 ?   Sl   Mar25 1383:08 /usr/lib/gridengine/sge_execd
[...]
sgeadmin 20388  0.0  0.1  51468  8540 ?   S    07:57    0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
tools.z+ 20390  0.0  0.0  23580  3196 ?   Ss   07:57    0:00      \_ -bash -c /data/project/zhuyifei1999-test/test.sh
tools.z+ 20393  0.0  0.0   5796   672 ?   S    07:57    0:00          \_ sleep 60

USER       PID %CPU %MEM    VSZ   RSS TTY STAT START    TIME COMMAND
sgeadmin   752  0.3  0.1 115112 16100 ?   Sl   Mar25 1313:16 /usr/lib/gridengine/sge_execd
[...]
sgeadmin  8715  0.0  0.1  51468  8688 ?   S    07:57    0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
tools.z+  8717  0.0  0.0  23580  3324 ?   Ss   07:57    0:00      \_ -bash -c /bin/bash /data/project/zhuyifei1999-test/test.sh
tools.z+  8720  0.0  0.0   5796   656 ?   S    07:57    0:00          \_ sleep 60
It did take me by surprise that it's still bash that invokes the given command, because bash was not in the process tree for a usual "jsub [...] python script.sh". For example, a non-continuous job typically looks like this:
sgeadmin 28386  0.0  0.1  51468   8588 ?  S   Nov15   0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
tools.f+ 28388  7.2  3.5 427144 293024 ?  Ss  Nov15 210:55  |   \_ /usr/bin/python pycore/pwb.py pycore/fawikibot/rade.py -newcat:10
And a continuous one:
sgeadmin  3699  0.0  0.0  51464   4540 ?  S    Apr19   0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
tools.b+  3701  0.0  0.0   4280     68 ?  SNs  Apr19   0:00  |   \_ /bin/sh /var/spool/gridengine/execd/tools-sgeexec-0942/job_scripts/1302451
tools.b+  3702  0.2  2.8 505104 231092 ?  SNl  Apr19 674:45  |       \_ /usr/bin/python bot2.py
There is no `-bash -c "python script.sh"`
However, if you trace what's going on, for a non-interactive bash that only receives a single command, it will directly execve that command:
$ strace -e clone,execve bash -c '/bin/true'
execve("/bin/bash", ["bash", "-c", "/bin/true"], [/* 26 vars */]) = 0
execve("/bin/true", ["/bin/true"], [/* 25 vars */]) = 0
+++ exited with 0 +++
It does not involve the child processes you'd expect from the fork-exec model. Therefore, we can say that no matter what you do with the job submission, a bash non-interactive login shell will be executed to run the command you specified to jsub. And the mess of "bash replaces itself with zsh, which immediately exits because stdin is empty" will apply.
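If strace is not at hand, the same behaviour can be checked by comparing PIDs. This is a rough sketch, assuming a reasonably recent bash and a POSIX sh on PATH; the variable names are made up for illustration. When bash execs the command directly, the command keeps the PID that bash had, so no bash shows up in the process tree.

```shell
out=$(mktemp)

# A single command: bash is expected to exec it directly, so the inner
# sh reports the same PID that the backgrounded bash was given.
bash -c 'sh -c "echo \$\$"' >"$out" &
bash_pid=$!
wait "$bash_pid"
single_cmd_pid=$(cat "$out")

# A command followed by another one: bash has to stay around, so the
# inner sh runs as a forked child with a different PID.
bash -c 'sh -c "echo \$\$"; true' >"$out" &
bash_pid2=$!
wait "$bash_pid2"
list_cmd_pid=$(cat "$out")
rm -f "$out"

echo "single command: bash=$bash_pid  command=$single_cmd_pid"
echo "command list:   bash=$bash_pid2  command=$list_cmd_pid"
```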
I think it is important to clarify that a shell like bash has 4 modes of execution, defined by whether it is an interactive shell, and whether it is a login shell. You can find the details of these modes for bash in its man page [1]. But tl;dr:
Login shells:
- Upon startup, sources /etc/profile, then the first one among
~/.bash_profile, ~/.bash_login, and ~/.profile, that exists.
- `bash -l` and `-bash` (note the dash sign at the front) make bash a
login shell
Non-login shells:
- If also interactive, upon startup, sources ~/.bashrc
Interactive shells:
- Displays a prompt for each command
Non-interactive shells:
- Upon startup, sources $BASH_ENV if it exists
- As we saw above, if the command is given in the command string in -c
and there is only one command, bash does not fork-exec the command but execs the command directly.
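To see which of these modes a given invocation ends up in, bash can be asked directly. The snippet below is a sketch: `describe_shell` is just an illustrative variable name, and `--noprofile` is passed to the login case only so that the demonstration does not run whatever is in the local profile files.

```shell
# Commands that report the mode of the bash that evaluates them.
describe_shell='
  if shopt -q login_shell; then login=login; else login=non-login; fi
  case $- in *i*) inter=interactive ;; *) inter=non-interactive ;; esac
  echo "$login $inter"
'

# Plain "bash -c": what you get from running a script or command.
plain_mode=$(bash -c "$describe_shell")

# "bash -l -c": a non-interactive login shell, the mode seen on the grid.
login_mode=$(bash --noprofile -l -c "$describe_shell")

echo "bash -c:    $plain_mode"
echo "bash -l -c: $login_mode"
```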
So you might wonder why the separation of login shells (profile) vs non-login shells (rc). The reason is that some parts of the environment are inherited by subshells while others are not. Environment variables are inherited:
$ export FOO=bar
$ echo $FOO
bar
$ bash
$ echo $FOO
bar
While things like aliases are not:
$ alias foo='echo bar'
$ foo
bar
$ bash
$ foo
bash: foo: command not found
There are environment setups that get inherited but that you do not want executed over and over by subshells. For example, appending to $PATH (`export PATH="$PATH:/path/to/bin"`). If it is in rc instead of profile, every time you run an interactive bash subshell PATH gets longer and more redundant; hence $PATH setups normally go to profile instead of rc. Non-inheritable setups like aliases go to rc. And the separation between .bash_profile and .profile is just so that you can have a .bash_profile that uses bash-specific syntax. I never needed any so I always use .profile.
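If a $PATH addition has to live in rc anyway, one common way to stop it from growing is to append only when the directory is not already listed. This is a sketch, not something taken from the setup discussed here; `path_append` is a hypothetical helper name and /path/to/bin is the placeholder from the paragraph above.

```shell
# Append a directory to PATH only if it is not already there.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, leave PATH alone
    *) PATH="$PATH:$1" ;;        # not present, append once
  esac
}

# Calling it repeatedly, as nested interactive shells would, adds the
# directory a single time.
path_append /path/to/bin
path_append /path/to/bin
export PATH
```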
And to have bash login shells also get the initialization from rc, .profile usually has a header like this:
# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi
And .bashrc:
# Test for an interactive shell
if [[ $- != *i* ]] ; then
    # Shell is non-interactive. Be done now!
    return
fi
I hope this makes sense. Let me know if not.
Back to your question, let's see in what scenarios you would want to invoke zsh:
- Non-interactive shells: No, you don't want `bash command.sh` to randomly exec zsh
- Interactive non-login shells: No, if you explicitly run `bash`, you
want bash not zsh.
- Interactive login shells: Yes, this is what `become tool` runs
initially and you want zsh here.
Hence, to run only in a login shell you'd want .profile or .bash_profile. An interactive guard is simply [[ $- = *i* ]] in bash syntax, so what you want, expressed in code, is this in .bash_profile:
if [[ $- = *i* ]]; then
    exec zsh
fi
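A quick way to convince yourself that the guard protects grid jobs is to source it from a non-interactive bash. The sketch below writes the guard to a temporary file as a stand-in for .bash_profile (an assumption, so your real profile is not touched); because `bash -c` is non-interactive, `exec zsh` is never reached and zsh does not even need to be installed for the check to run.

```shell
# Temporary stand-in for ~/.bash_profile containing only the guard.
profile=$(mktemp)
cat >"$profile" <<'EOF'
if [[ $- == *i* ]]; then
    exec zsh
fi
EOF

# Source it the way a non-interactive shell would see it, with empty
# stdin as on the grid, then run a stand-in for the job body.
result=$(bash -c 'source "$1"; echo "still bash"' _ "$profile" </dev/null)
rm -f "$profile"

echo "$result"
```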
As a side note, yes zsh exists on the grid hosts:
zhuyifei1999@tools-sgeexec-0901: ~$ ls -l {/usr,}/bin/zsh
-rwxr-xr-x 1 root root 819744 Dec  1  2020 /bin/zsh
lrwxrwxrwx 1 root root      8 Nov 22  2018 /usr/bin/zsh -> /bin/zsh
[1] https://man7.org/linux/man-pages/man1/bash.1.html#INVOCATION
YiFei Zhu
Have you had a chance to take a look at it yet?
YiFei Zhu