Thank you very much for your reply and explanation. It's actually a Pywikibot (PWB) bot, which is based on Python.
And yeah, I should send such mails to the list instead of Bryan. So I'm CCing the list, so the reply is preserved forever.
Thank you again, Martin
On Mon, Sep 17, 2018 at 22:39, Bryan Davis bdavis@wikimedia.org wrote:
On Sun, Sep 16, 2018 at 11:50 PM, Martin Urbanec martin.urbanec@wikimedia.cz wrote:
Hi, STDOUT/STDERR from jobs is being added to jobname.{out,err} files when the job is ended. Is it possible to monitor the output while the job is running?
I believe that the logging of stdout and stderr is actually continuous from the point of view of Grid Engine, but there are a couple of things that may make it seem to be batched from the point of view of you observing the files:
- NFS server write aggregation and NFS client inode status updates can delay a write by Grid Engine to a log file by several seconds to several minutes before it becomes visible on the Toolforge bastion where you are trying to read it.
- Your application may actually be adding its own buffering of writes to stdout/stderr, so that they are not flushed through to Grid Engine (and on to the NFS-hosted log files) until the program completes or the buffer fills up. One common place that I have personally seen this is with Python scripts. See https://stackoverflow.com/questions/107705/disable-output-buffering for some ideas on how to work around this if it is a problem for you. If you do have a Python script that does a lot of stdout/stderr output, completely disabling write buffering may have a negative impact on shared NFS server performance. Finding a way to line buffer (hang on to writes until you see a newline) will probably have much less impact on our limited shared resources than trying to write each byte independently.
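As a rough illustration of the line-buffering idea, here is a minimal sketch for a Python 3.7+ script. It assumes the script's stdout/stderr are ordinary `io.TextIOWrapper` objects, which is the usual case but not guaranteed; `make_line_buffered` is a hypothetical helper name, not part of Pywikibot or Grid Engine.

```python
import sys


def make_line_buffered(stream):
    """Ask a text stream to flush whenever a newline is written.

    Hypothetical helper: relies on TextIOWrapper.reconfigure(), which
    exists on Python 3.7+. Streams without it are returned unchanged.
    """
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(line_buffering=True)
    return stream


if __name__ == "__main__":
    # Switch both streams to line buffering once, at startup.
    make_line_buffered(sys.stdout)
    make_line_buffered(sys.stderr)
    # Each completed line is now pushed out as soon as it is printed,
    # instead of waiting for the buffer to fill or the job to end.
    print("progress: step 1 done")
```

Calling this once at the top of the script should be enough; an alternative with a similar effect is passing `flush=True` to individual `print()` calls. Either way, the NFS delays described in the first point still apply.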
Direct inspection of the raw stdout/stderr streams of a process running under Grid Engine's control is technically possible, but would typically require root permissions. You would have to know which exec host the job is running on, ssh to that host, and then use `strace` or a similar debugging tool to inspect the stream. This is not something that a normal Toolforge user can do themselves.
Questions like this make great topics for the cloud@lists.wikimedia.org mailing list. You can get help there not just from me, but also from the larger Toolforge user community. It also makes the responses easier for others to find when they have similar issues.
Bryan
Bryan Davis, Wikimedia Foundation, bd808@wikimedia.org
[[m:User:BDavis_(WMF)]], Manager, Technical Engagement, Boise, ID USA
irc: bd808, v:415.839.6885 x6855