Yesterday, I reported some log messages in T318479 https://phabricator.wikimedia.org/T318479:
logs/django/django.log.2022-09-27:2022-09-28 16:43:39,873 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: IndexView()
logs/django/django.log.2022-09-27:2022-09-28 16:59:18,903 [76e999afc82c10fb99b6c9bf76448d1a] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
logs/django/django.log.2022-09-27:2022-09-28 16:59:19,196 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: request took 0:15:39.323408
When I went to look at this again today, the messages were gone. After a bit of head-scratching, I discovered they were now in a .nfs file:
(venv) spi-tools-dev [django] grep 76e999afc82c10fb99b6c9bf76448d1a .nfs0000000005f910c800000388
2022-09-28 16:43:39,873 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: IndexView()
2022-09-28 16:59:18,903 [76e999afc82c10fb99b6c9bf76448d1a] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
2022-09-28 16:59:19,196 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: request took 0:15:39.323408
These log files are created by Python's TimedRotatingFileHandler. Since NFS keeps a file that is deleted while still open around under a .nfs* name, it looks like something was holding the file open at the time it was rotated. In theory, I should be able to find which process has it open using lsof, but that doesn't work when I run it on tools-sgebastion-11:
lsof .nfs0000000005f910c800000388
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
and if I shell into the krb instance, I just get:
bash: lsof: command not found
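Without lsof, the closest substitute I can think of is walking /proc myself. The following is only a rough sketch: find_holders is a name I made up, it assumes a Linux-style /proc, it can only read the fd directories of processes my user is allowed to inspect, and it only sees processes on the host where it runs, so it would find nothing if the holder lives on another node.

```python
import os


def find_holders(target, proc_root="/proc"):
    """Return sorted PIDs that have `target` open, matched by device and inode.

    Hypothetical helper, not part of any library. It only sees processes on
    the local host whose fd directories the current user may read.
    """
    want = os.stat(target)
    holders = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                # stat() follows the fd symlink to the file it refers to
                st = os.stat(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed in the meantime, or unreadable
            if (st.st_dev, st.st_ino) == (want.st_dev, want.st_ino):
                holders.add(int(entry))
                break  # one match is enough for this PID
    return sorted(holders)
```

Calling find_holders(".nfs0000000005f910c800000388") from the log directory should list any local PIDs holding the file. An empty list would not prove that nothing has it open, only that nothing visible from this host and user does.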
So how do I figure out what's going on?