Hmmm. So I assume what's being reported here is whatever your app emits on stderr?
It sounds like (at least for now) if you want both real-time visibility into log messages
and durable storage for post-mortem analysis, the right thing to do is to configure your
app to log to both stderr and a file on NFS?
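
To make that concrete, here is a minimal sketch of what I have in mind, assuming a Python
app and the standard `logging` module. The logger name and the log file path are
placeholders I made up, not anything Toolforge prescribes; in practice the path would point
somewhere in the tool's NFS home directory.

```python
import logging
import sys


def build_logger(name, logfile):
    """Return a logger that writes every record to both stderr and a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Keep records out of the root logger so they are not emitted twice.
    logger.propagate = False
    if logger.handlers:
        # Already configured (e.g. called twice); don't stack duplicate handlers.
        return logger

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    # stderr: picked up by the container runtime, readable while the tool runs.
    stderr_handler = logging.StreamHandler(sys.stderr)
    # File: survives a crash or restart, for post-mortem analysis.
    file_handler = logging.FileHandler(logfile, encoding="utf-8")

    for handler in (stderr_handler, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # "mytool" and "mytool.log" are hypothetical names for illustration.
    log = build_logger("mytool", "mytool.log")
    log.info("service started")
```

The file handler appends by default, so a restarted service keeps adding to the same file.
A long-running tool would probably want `logging.handlers.RotatingFileHandler` instead so
the file doesn't grow without bound.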
On Oct 5, 2023, at 3:07 PM, Taavi Väänänen
<taavi(a)wikimedia.org> wrote:
Yes, currently the logs subcommand in webservice and toolforge-jobs tools can only read
logs from a currently running tool, and logs are lost when a service crashes or is stopped
or restarted.
For some backstory: As you know, the NFS storage setup we currently use for both
tool code and logs is a major pain to keep up and running, in addition to having a very
poor user experience when trying to follow logs in real-time. The new build service
<https://wikitech.wikimedia.org/wiki/Help:Toolforge/Build_Service> moves tool code
off NFS, which unblocks running a tool without any NFS mounts. The current versions of the
logs subcommands are meant to provide at least some way to read logs for these early
off-NFS tools - it's essentially a fancy wrapper for `kubectl logs` at this point. Of
course there are some tools that need longer log retention, but for simple ones (like
db-names <https://db-names.toolforge.org/> which I'm using to test many new
buildservice features) it's perfectly usable.
The idea is to swap the commands to use a better log management system once we have
deployed one to Toolforge. The good news is that one of the major blockers for that (lack
of object storage) is almost solved, so I'm hoping to see some movement on that
project fairly soon (although, as usual, it's really hard to promise anything). I
expect that project will be tracked in subtasks of T127367
<https://phabricator.wikimedia.org/T127367> once there's anything beyond a very
rough idea.
Taavi
On Thu, Oct 5, 2023 at 9:10 PM Roy Smith <roy(a)panix.com> wrote:
I see T336057 was just closed. Looking at the docs
<https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#View_web_service_logs>,
I'm unclear how this works. The docs say "the output from the webservice
command is stored by the Toolforge Kubernetes infrastructure as long as the web service is
running." So, what happens when a service exits (i.e. crashes)? Does that mean the
logs for that service disappear?
_______________________________________________
Cloud mailing list -- cloud(a)lists.wikimedia.org <mailto:cloud@lists.wikimedia.org>
List information:
https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
<https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/>
--
Taavi Väänänen (he/him)
Site Reliability Engineer, Cloud Services
Wikimedia Foundation