On Wed, Nov 17, 2021 at 1:04 AM YiFei Zhu <zhuyifei1999(a)gmail.com> wrote:
On Tue, Nov 16, 2021 at 6:38 PM Huji Lee <huji.huji(a)gmail.com> wrote:
I went back and reactivated the line in .bash_profile which enabled zsh ("exec
zsh" as the last line of .bash_profile)
Then I submitted the job to the grid, using a command like this:
jsub -N "n" -once -o ~/err/nightly.out -e ~/err/nightly.err ~/grid/jobs/nightly.sh
I did it three ways. First, I used the nightly.sh file as is (see source). Second, I
replaced "source" with "." and third I replaced "source"
with "bash". In all three cases, it failed, without even producing an output or
error. The nightly.out and nightly.err files were created of course, but were empty.
Next, I added a "#!/bin/bash" shebang and ran it again all three ways. The result
was the same.
Running qstat many times shows that the job gets into a queued state ("qw") and
after a few seconds, it goes into the run state ("r") and immediately stops.
Removing the "exec zsh" command from .bash_profile will make things work
again.
Finally, I decided maybe the problem is that zsh is available for me, but not on the
grid. So I changed the .bash_profile ending from a single "exec zsh" command to
this:
if [ -f /usr/bin/zsh ]; then
    zsh
fi
Under this config, jobs on the grid worked, and when I used "become" to log in
as my tool, I ended up in zsh. Obviously, I am happy with this workaround. But I am still
curious as to the root cause.
Is it really that zsh is not available on the grid, and the grid tries to replicate my
environment first and reaches the "exec zsh" command and falls apart somehow?
This is consistent with what I described earlier:
Since you have "exec zsh" in your
.bash_profile, bash will run it at startup as a login shell, which in
theory would immediately replace itself with zsh with no arguments.
zsh will then see it has no arguments, attempt to read a script from
stdin, get nothing, and immediately exit, stopping the job in the grid.
However, now that you have "zsh" instead of "exec zsh", the "replace"
is not done. bash as the login shell executes zsh as a subshell, and
zsh, having no input, immediately exits. The execution continues as
if nothing had ever happened.
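The difference between the two forms can be reproduced outside the grid. The following is only a sketch under stated assumptions: `sh` stands in for zsh (so it runs on hosts without zsh), and `/dev/null` stands in for the empty stdin a grid job gets.

```shell
# Stand-in setup (assumption): "sh" plays the role of zsh, and stdin is
# /dev/null so the inner shell hits end-of-file immediately.

# Case 1: "exec" replaces bash with the new shell. That shell reads EOF
# from stdin and exits, so the echo after it never runs.
with_exec=$(bash -c 'exec sh </dev/null; echo "job ran"')

# Case 2: without "exec" the new shell is only a child process. It exits
# right away, and bash carries on with the next command.
without_exec=$(bash -c 'sh </dev/null; echo "job ran"')

printf 'with exec:    [%s]\n' "$with_exec"
printf 'without exec: [%s]\n' "$without_exec"
```

The first line should come out empty and the second should contain "job ran", which matches the empty nightly.out seen with "exec zsh".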
I just tested how bash invokes .bash_profile by adding a sleep 60 to
.bash_profile and giving my test.sh a shebang. I submitted the job
both with an explicit 'bash' and without, and it looks like
.bash_profile is executed in both cases:
USER       PID %CPU %MEM    VSZ   RSS TTY STAT START    TIME COMMAND
sgeadmin   762  0.4  0.1 111020 16056 ?   Sl   Mar25 1383:08 /usr/lib/gridengine/sge_execd
[...]
sgeadmin 20388  0.0  0.1  51468  8540 ?   S    07:57    0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
tools.z+ 20390  0.0  0.0  23580  3196 ?   Ss   07:57    0:00      \_ -bash -c /data/project/zhuyifei1999-test/test.sh
tools.z+ 20393  0.0  0.0   5796   672 ?   S    07:57    0:00          \_ sleep 60
USER       PID %CPU %MEM    VSZ   RSS TTY STAT START    TIME COMMAND
sgeadmin   752  0.3  0.1 115112 16100 ?   Sl   Mar25 1313:16 /usr/lib/gridengine/sge_execd
[...]
sgeadmin  8715  0.0  0.1  51468  8688 ?   S    07:57    0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
tools.z+  8717  0.0  0.0  23580  3324 ?   Ss   07:57    0:00      \_ -bash -c /bin/bash /data/project/zhuyifei1999-test/test.sh
tools.z+  8720  0.0  0.0   5796   656 ?   S    07:57    0:00          \_ sleep 60
It did take me by surprise that it's still bash that invokes the given
command, because bash was not in the process tree for a usual "jsub
[...] python script.sh". For example, a non-continuous job typically
looks like this:
sgeadmin 28386  0.0  0.1  51468   8588 ?   S    Nov15   0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
tools.f+ 28388  7.2  3.5 427144 293024 ?   Ss   Nov15 210:55  |   \_ /usr/bin/python pycore/pwb.py pycore/fawikibot/rade.py -newcat:10
And a continuous one:
sgeadmin  3699  0.0  0.0  51464   4540 ?   S    Apr19   0:00  \_ /usr/lib/gridengine/sge_shepherd -bg
tools.b+  3701  0.0  0.0   4280     68 ?   SNs  Apr19   0:00  |   \_ /bin/sh /var/spool/gridengine/execd/tools-sgeexec-0942/job_scripts/1302451
tools.b+  3702  0.2  2.8 505104 231092 ?   SNl  Apr19 674:45  |       \_ /usr/bin/python bot2.py
There is no `-bash -c "python script.sh"`
However, if you trace what's going on, for a non-interactive bash that
only receives a single command, it will directly execve that command:
$ strace -e clone,execve bash -c '/bin/true'
execve("/bin/bash", ["bash", "-c", "/bin/true"], [/* 26 vars */]) = 0
execve("/bin/true", ["/bin/true"], [/* 25 vars */]) = 0
+++ exited with 0 +++
It does not involve child processes from the fork-exec model you'd
expect. Therefore, we can say that no matter what you do with the job
submission, a bash non-interactive login shell will be executed to run
the command you specified to jsub. And the mess of "bash replaces
itself with zsh, which immediately exits because stdin is empty" will
apply.
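The direct exec can also be observed without strace by comparing process IDs. This is a sketch, not a definitive test: `probe` is a hypothetical helper name, and it relies on the behaviour shown in the trace above (bash execs a lone simple command given to -c instead of forking).

```shell
# probe (hypothetical helper): run a -c string under bash in the background
# and print "<PID of bash> <PID reported by the inner sh>".
probe() {
    tmp=$(mktemp)
    bash -c "$1" >"$tmp" &
    bash_pid=$!
    wait "$bash_pid"
    echo "$bash_pid $(cat "$tmp")"
    rm -f "$tmp"
}

# A single simple command: bash execs it, so the inner sh keeps bash's PID.
single=$(probe 'sh -c "echo \$\$"')

# A second command after it: bash has to stay alive, so sh is forked and
# gets a PID of its own.
multi=$(probe 'sh -c "echo \$\$"; true')

echo "single command: $single"
echo "two commands:   $multi"
```

In the first line both numbers should be equal, which is why no separate bash shows up in the process tree of a typical job. In the second line they should differ.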
I think it is important to clarify that a shell like bash has 4 modes
of execution, defined by whether it is an interactive shell, and
whether it is a login shell. The details for bash are in its man
page [1]. But tl;dr:
Login shells:
- Upon startup, sources /etc/profile, then the first one among
~/.bash_profile, ~/.bash_login, and ~/.profile, that exists.
- `bash -l` and `-bash` (note the dash sign at the front) make bash a
login shell
Non-login shells:
- If also interactive, upon startup, sources ~/.bashrc
Interactive shells:
- Displays a prompt for each command
Non-interactive shells:
- Upon startup, sources $BASH_ENV if it exists
- As we saw above, if the command is given in the command string in -c
and there is only one command, bash does not fork-exec the command but
execs the command directly.
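To find out which of these modes a given bash is in, it can report on itself. A small sketch: `shopt -q login_shell` and the `i` flag in `$-` are standard bash, while the variable names and the `--noprofile` flag (used so the example does not run anyone's real profile) are choices made for this illustration.

```shell
# One-liner that makes bash describe its own mode.
check='shopt -q login_shell && l=login || l=non-login
case $- in *i*) i=interactive ;; *) i=non-interactive ;; esac
echo "$l, $i"'

# Plain "bash -c": neither login nor interactive.
plain=$(bash -c "$check")

# "-l" makes it a login shell; --noprofile skips /etc/profile and
# ~/.bash_profile so the demo has no side effects.
login=$(bash --noprofile -l -c "$check")

echo "bash -c:    $plain"
echo "bash -l -c: $login"
```

The second case is the mode the grid runs jobs in: a login shell that is not interactive.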
So you might wonder why there is a separation of login shells (profile)
vs non-login shells (rc). The reason is that some parts of the
environment are inherited by subshells while others are not.
Environment variables are inherited:
$ export FOO=bar
$ echo $FOO
bar
$ bash
$ echo $FOO
bar
While things like aliases are not:
$ alias foo='echo bar'
$ foo
bar
$ bash
$ foo
bash: foo: command not found
There are environment setups that get inherited but that you do not
want executed over and over by subshells. For example, appending to
$PATH (`export PATH="$PATH:/path/to/bin"`). If it is in rc instead of
profile, every time you run an interactive bash subshell PATH gets
longer and more redundant; hence $PATH setups normally go to profile
instead of rc. Non-inheritable setups like aliases go to rc. And the
separation between .bash_profile and .profile is just so that you can
have a .bash_profile that uses bash-specific syntax. I never needed
any so I always use .profile.
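The PATH growth is easy to reproduce. In this sketch a temporary file stands in for an rc file, and BASH_ENV (which non-interactive shells source, as listed above) is used to make every nested bash read it. That substitution is an assumption made to keep the demo non-interactive. A real ~/.bashrc behaves the same way for interactive subshells.

```shell
# Temporary stand-in for an rc file that appends to PATH.
rc=$(mktemp)
printf '%s\n' 'export PATH="$PATH:/path/to/bin"' >"$rc"

# Two levels of bash, each sourcing the file at startup via BASH_ENV.
nested=$(BASH_ENV="$rc" bash -c 'bash -c "echo \$PATH"')

# Count how many times the directory now appears in PATH.
copies=$(printf '%s\n' "$nested" | tr ':' '\n' | grep -c '^/path/to/bin$')

echo "PATH seen by the inner shell: $nested"
echo "copies of /path/to/bin: $copies"
rm -f "$rc"
```

Each extra level of nesting adds another copy, which is the redundancy that putting the line in the profile avoids.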
And to have bash login shells also get the initialization from rc,
.profile usually has a header like this:
# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi
And .bashrc:
# Test for an interactive shell
if [[ $- != *i* ]] ; then
    # Shell is non-interactive. Be done now!
    return
fi
I hope this makes sense. Let me know if not.
Back to your question, let's see in what scenarios you would want to invoke zsh:
- Non-interactive shells: No, you don't want `bash command.sh` to
randomly exec zsh.
- Interactive non-login shells: No, if you explicitly run `bash`, you
want bash not zsh.
- Interactive login shells: Yes, this is what `become tool` runs
initially and you want zsh here.
Hence, to run only in login shells, you'd want to put it in .profile or
.bash_profile. The interactive guard is simply [[ $- = *i* ]] in bash
syntax, so what you want, expressed in code, is this in .bash_profile:
if [[ $- = *i* ]]; then
    exec zsh
fi
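The effect of the guard on a job-like invocation can be checked locally. This is a rough sketch: it points HOME at a throwaway directory so the real .bash_profile is untouched, and uses `bash -l -c` with stdin from /dev/null as an approximation of what the grid does. The unguarded case should produce no job output whether or not zsh is installed, since a failed exec also ends a non-interactive bash.

```shell
# Throwaway home directory (assumption: nothing else depends on HOME here).
home=$(mktemp -d)

# Guarded profile: only interactive login shells switch to zsh.
cat >"$home/.bash_profile" <<'EOF'
if [[ $- = *i* ]]; then
    exec zsh
fi
EOF
guarded=$(HOME="$home" bash -l -c 'echo "job ran"' </dev/null 2>/dev/null)

# Unguarded profile: every login shell, including job shells, execs zsh.
cat >"$home/.bash_profile" <<'EOF'
exec zsh
EOF
unguarded=$(HOME="$home" bash -l -c 'echo "job ran"' </dev/null 2>/dev/null)

printf 'guarded:   [%s]\n' "$guarded"
printf 'unguarded: [%s]\n' "$unguarded"
rm -rf "$home"
```

With the guard the job's command runs. Without it, the command never gets a chance.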
As a side note, yes zsh exists on the grid hosts:
zhuyifei1999@tools-sgeexec-0901: ~$ ls -l {/usr,}/bin/zsh
-rwxr-xr-x 1 root root 819744 Dec 1 2020 /bin/zsh
lrwxrwxrwx 1 root root 8 Nov 22 2018 /usr/bin/zsh -> /bin/zsh
[1] https://man7.org/linux/man-pages/man1/bash.1.html#INVOCATION
YiFei Zhu