Hmmm.  So I assume what's being reported here is whatever your app emits on stderr?

It sounds like (at least for now) if you want both real-time visibility into log messages and durable storage for post-mortem analysis, the right thing to do is to configure your app to log both to stderr and to a file on NFS?
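
For a Python tool, a minimal sketch of that dual setup might look like the following. It uses only the standard library logging module. The function name configure_logging and the log path are placeholders of mine, not anything Toolforge provides or requires.

```python
import logging
import sys


def configure_logging(log_path, level=logging.INFO):
    """Attach a stderr handler and a file handler to the root logger.

    log_path is a hypothetical location, e.g. a file under the tool's
    home directory on NFS.
    """
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s"
    )

    # Real-time visibility: whatever captures the container's stderr
    # sees these records.
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setFormatter(formatter)

    # Durable copy: the file survives a crash or restart of the service.
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setFormatter(formatter)

    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(stderr_handler)
    root.addHandler(file_handler)
    return root
```

Calling configure_logging("/path/to/tool.log") once at startup would be enough; calling it repeatedly would attach duplicate handlers, so each record would be written more than once.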

On Oct 5, 2023, at 3:07 PM, Taavi Väänänen <taavi@wikimedia.org> wrote:

Yes, currently the logs subcommand in webservice and toolforge-jobs tools can only read logs from a currently running tool, and logs are lost when a service crashes or is stopped or restarted.

For some backstory: As you know, the NFS storage setup we currently use for both tool code and logs is a major pain to keep up and running, and it also gives a very poor user experience when trying to follow logs in real time. The new build service moves tool code off NFS, which unblocks running a tool without any NFS mounts. The current versions of the logs subcommands are meant to provide at least some way to read logs for these early off-NFS tools. At this point they are essentially a fancy wrapper for `kubectl logs`. Of course there are some tools that need longer log retention, but for simple ones (like db-names, which I'm using to test many new build service features) it's perfectly usable.

The idea is to swap the commands to use a better log management system once we have deployed one to Toolforge. The good news is that one of the major blockers for that (lack of object storage) is almost solved, so I'm hoping to see some movement for that project fairly soon (although, as usual, it's really hard to promise anything). I expect that project will be tracked in subtasks of T127367 once there's anything beyond a very rough idea.

Taavi

On Thu, Oct 5, 2023 at 9:10 PM Roy Smith <roy@panix.com> wrote:
I see T336057 was just closed.  Looking at the docs, I'm unclear how this works.  The docs say ", the output from the webservice command is stored by the Toolforge Kubernetes infrastructure as long as the web service is running."  So, what happens when a service exits (i.e. crashes)?  Does that mean the logs for that service disappear?

_______________________________________________
Cloud mailing list -- cloud@lists.wikimedia.org
List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/


--
Taavi Väänänen (he/him)
Site Reliability Engineer, Cloud Services
Wikimedia Foundation