[Labs-l] Debugging lighttpd OOM's

Merlijn van Deen valhallasw at arctus.nl
Tue Jun 10 09:28:08 UTC 2014


Hello all,

My 'tsreports' webservice dies at seemingly random intervals. qacct suggests
this is due to OOM (note the maxvmem of nearly 4G):

tools.tsreports@tools-login:~$ qacct -j 487745
qname        webgrid-lighttpd
(...)
jobname      lighttpd-tsreports
jobnumber    487745
(...)
qsub_time    Wed Apr 23 08:18:12 2014
start_time   Fri May 23 14:30:17 2014
end_time     Fri Jun  6 10:51:21 2014
(...)
failed       0
exit_status  0
(...)
maxvmem      3.973G
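
For comparing several such accounting records, the output above could be parsed with a few lines of Python. This is only a minimal sketch: the function names `parse_qacct` and `size_to_bytes` are made up for illustration, and it assumes the size suffixes (K, M, G) are binary multiples, which I have not verified against the SGE documentation.

```python
import re

# Multipliers for the size suffixes in the accounting output.
# Assumption: binary units (1K = 1024 bytes).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_qacct(text):
    """Turn 'key   value' lines from `qacct -j` output into a dict."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip separators, elided "(...)" markers and lines without a value.
        if len(parts) == 2 and not parts[0].startswith(("(", "=")):
            record[parts[0]] = parts[1].strip()
    return record


def size_to_bytes(value):
    """Convert a string such as '3.973G' to a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]?)B?", value.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError("unrecognised size: %r" % value)
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


sample = """qname        webgrid-lighttpd
jobname      lighttpd-tsreports
jobnumber    487745
failed       0
exit_status  0
maxvmem      3.973G
"""

record = parse_qacct(sample)
peak_bytes = size_to_bytes(record["maxvmem"])
```

Comparing `peak_bytes` against the memory limit of the queue (which is not shown in the output above) would make it easier to tell whether a given job really ended at the limit.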


I have no clue how to debug this, though; the lighttpd error log just shows

2014-06-06 10:51:20: (mod_fastcgi.c.3061) got proc: pid: 12119 socket: unix:/tmp/tsreports-index.fcgi.sock-0 load: 1
2014-06-06 10:51:20: (server.c.1512) server stopped by UID = 0 PID = 12087
2014-06-06 10:51:20: (server.c.1502) unlink failed for: /var/run/lighttpd/tsreports.pid 2 No such file or directory
2014-06-06 10:51:20: (server.c.1512) server stopped by UID = 0 PID = 12087
2014-06-06 10:51:20: (server.c.1502) unlink failed for: /var/run/lighttpd/tsreports.pid 2 No such file or directory
2014-06-06 10:51:20: (server.c.1502) unlink failed for: /var/run/lighttpd/tsreports.pid 2 No such file or directory
2014-06-06 10:51:20: (server.c.1512) server stopped by UID = 0 PID = 12087
2014-06-06 10:51:21: (server.c.1502) unlink failed for: /var/run/lighttpd/tsreports.pid 2 No such file or directory
2014-06-06 10:51:21: (server.c.1512) server stopped by UID = 0 PID = 12087
2014-06-06 10:51:20: (server.c.1512) server stopped by UID = 0 PID = 12087

which is not very informative, to say the least.

So: how can one debug these issues?
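
One low-tech approach I can think of is to sample the memory use of the processes involved at regular intervals, so that there is at least a record of which process grows and how quickly before the job is killed. The following is a rough sketch, not a tested solution: it assumes a Linux host where `/proc/<pid>/status` is readable, and the names `parse_status` and `log_memory` as well as the log file path are hypothetical.

```python
import time


def parse_status(text):
    """Extract the Vm* fields (in kB) from /proc/<pid>/status content."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Memory lines look like "VmRSS:     12000 kB".
        if key.startswith("Vm") and len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields


def log_memory(pid, logfile, interval=60, samples=None):
    """Append timestamped VmSize/VmRSS readings for `pid` to `logfile`.

    Stops when the process disappears, or after `samples` readings
    if a limit is given.
    """
    count = 0
    while samples is None or count < samples:
        try:
            with open("/proc/%d/status" % pid) as handle:
                fields = parse_status(handle.read())
        except OSError:
            break  # the process is gone (or not readable)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(logfile, "a") as out:
            out.write("%s pid=%d VmSize=%s VmRSS=%s\n" % (
                stamp, pid, fields.get("VmSize"), fields.get("VmRSS")))
        count += 1
        time.sleep(interval)
```

Running something like `log_memory(pid, "memory.log")` for the lighttpd process and for each FastCGI child would show which of them is responsible for the growth, although it does not explain why the memory is being used.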

To add insult to injury, SGE neither sends an e-mail to tell me it killed
the webserver, nor restarts the webserver. Either of those would be
reasonable (especially restarting the webserver). As it was, I only found
out because someone notified me on my talk page...

Merlijn