[Labs-admin] ** RECOVERY alert - ToolLabs/ToolLabs Home Page is OK **
Andrew Bogott
abogott at wikimedia.org
Sun Nov 27 07:11:12 UTC 2016
I poked at this, but I'm pretty sure it recovered on its own. When I
tried to restart the service by hand, I got this:
Traceback (most recent call last):
  File "/usr/bin/webservice-runner", line 27, in <module>
    webservice.run(port)
  File "/usr/lib/python2.7/dist-packages/toollabs/webservice/services/lighttpdwebservice.py", line 108, in run
    with open(config_path, 'w') as f:
IOError: [Errno 13] Permission denied: '/var/run/lighttpd/admin'
It's late and this is half-baked (and my attempts to fix the problem
destroyed the evidence), but my speculation is that in some situations we
are 'leaking' read-only /var/run/lighttpd/admin files. Once one of them
is out there, each webservice restart is the luck of the draw: it fails
if it lands on an exec host that has a read-only file and succeeds if it
doesn't, so the failure is intermittent.
For now, I've explicitly removed that file on all trusty lighttpd
hosts. If this problem recurs, we should check the writability of the
file named in the error before doing anything else.
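
For next time, a minimal sketch of that check, so the evidence gets
recorded before anyone touches the file. The helper name
describe_writability is made up for illustration and is not part of the
toollabs package; the only path taken from the traceback is
/var/run/lighttpd/admin.

```python
import os
import stat


def describe_writability(path):
    """Report whether `path` could be opened with mode 'w' by this user.

    Hypothetical helper for illustration only; not part of toollabs.
    """
    info = {"path": path, "exists": os.path.exists(path)}
    if not info["exists"]:
        # open(path, 'w') would have to create the file, so what matters
        # is write and search permission on the parent directory.
        parent = os.path.dirname(path) or "."
        info["writable"] = os.access(parent, os.W_OK | os.X_OK)
        return info
    st = os.stat(path)
    info["uid"] = st.st_uid                             # owner of the file
    info["mode"] = "%04o" % stat.S_IMODE(st.st_mode)    # e.g. "0444"
    info["writable"] = os.access(path, os.W_OK)
    return info


if __name__ == "__main__":
    # Path taken from the traceback above.
    print(describe_writability("/var/run/lighttpd/admin"))
```

This needs to run as the affected tool's user on the exec host in
question: os.access() checks permissions for the real uid, and root
bypasses the permission bits, so running it as root would report the
file as writable regardless.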
-A
On 11/27/16 1:05 AM, shinken wrote:
> Notification Type: RECOVERY
>
> Service: ToolLabs Home Page
> Host: ToolLabs
> Address: tools.wmflabs.org
> State: OK
>
> Date/Time: Sun 27 Nov 07:05:01 UTC 2016
>
> Additional Info:
>
> HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.041 second response time