Thank you very much for your reply and explanation. It's actually a PWB
(Pywikibot) bot, which is based on Python.
And yeah, I should send such mails to the list instead of to Bryan
directly, so I'm CCing the list to keep the reply archived.
Thank you again,
Martin
On Mon, Sep 17, 2018 at 22:39, Bryan Davis <bdavis(a)wikimedia.org>
wrote:
On Sun, Sep 16, 2018 at 11:50 PM, Martin Urbanec
<martin.urbanec(a)wikimedia.cz> wrote:
Hi, STDOUT/STDERR from jobs is being added to jobname.{out,err} files
when the job is ended. Is it possible to monitor the output while the
job is running?
I believe that the logging of stdout and stderr is actually continuous
from the point of view of Grid Engine, but there are a couple of
things that may make it seem to be batched from the point of view of
you observing the files:
* NFS server write aggregation and NFS client inode status updates
can delay a write by Grid Engine to a log file by several seconds to
several minutes before it becomes visible on the Toolforge bastion
where you are trying to read it.
* Your application may be adding its own buffering of writes to
stdout/stderr, so that they are not flushed through to Grid Engine
(and on to the NFS-hosted log files) until the program completes or
the buffer fills up. One common place that I have personally seen this
is with Python scripts. See
https://stackoverflow.com/questions/107705/disable-output-buffering
for some ideas on how to work around this if it is a problem for you.
If you do have a Python script that does a lot of stdout/stderr output,
completely disabling write buffering may have a negative impact on
shared NFS server performance. Finding a way to line buffer (hang on
to writes until you see a newline) will probably have much less
impact on our limited shared resources than trying to write each
byte independently.
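As a rough illustration of the line-buffering idea, here is a minimal
sketch. It assumes Python 3.7 or newer, where `reconfigure()` is
available on text streams; the helper name `enable_line_buffering` is
made up for this example and is not part of Pywikibot or Toolforge.

```python
import io
import sys


def enable_line_buffering(stream):
    """Ask a text stream to flush on every newline instead of at exit.

    Hypothetical helper for illustration. Returns True if the stream was
    reconfigured, False if it is not an io.TextIOWrapper (for example an
    io.StringIO that some framework swapped in for sys.stdout).
    """
    if isinstance(stream, io.TextIOWrapper):
        # TextIOWrapper.reconfigure() exists from Python 3.7 onward.
        stream.reconfigure(line_buffering=True)
        return True
    return False


if __name__ == "__main__":
    enable_line_buffering(sys.stdout)
    enable_line_buffering(sys.stderr)
    # Each completed line is now handed to the OS when the newline is written.
    print("processing page 1")
```

On older Python 3 versions, calling `print(..., flush=True)` on the
lines you care about should have a similar effect. Running the script
with `python -u` turns buffering off altogether, which is the option
cautioned against above for scripts that produce a lot of output.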
Direct inspection of the raw stdout/stderr streams of a process
running under Grid Engine's control is technically possible, but would
typically require root permissions. You would have to know which exec
host the job is running on, ssh to that host, and then use `strace` or
a similar debugging tool to inspect the stream. This is not something
that a normal Toolforge user can accomplish themselves.
Questions like this make great topics for the
cloud(a)lists.wikimedia.org mailing list. You can get help there not
just from me, but also from the larger Toolforge user community. It
also makes the responses easier for others to find when they have
similar issues.
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855