[Labs-l] Webservice

Tim Landscheidt tim at tim-landscheidt.de
Thu Jul 10 15:34:05 UTC 2014


Magnus Manske <magnusmanske at googlemail.com> wrote:

> I've been manually restarting about a dozen webservices for my tools in the
> last 24h.

> And before you say it, some of those were Hedonil's hand-rolled webservice.

> Could we PLEASE either have a Labs-official, auto- and self-restarting
> webservice, or something a little more stable than lighttpd (or a more
> stable way to run it)?

I looked at all the tools you are a developer of, and I
assume you are talking about wikidata-todo.  Its logs appear
to show OOM shutdowns.

You use a custom lighttpd configuration, and I'm not sure
whether the decision to have two PHP FCGIs doubles the memory
requirements; at the moment the webservice is using 6 GBytes
out of the 7 GBytes requested.

What is clear, however, is that your PHP script:

| 2014-07-10 14:11:39: (mod_fastcgi.c.2701) FastCGI-stderr: PHP Fatal error:  Allowed memory size of 2621440000 bytes exhausted (tried to allocate 71 bytes) in /data/project/wikidata-todo/public_html/autolist2.php on line 201

uses almost 2.5 GByte of memory -- if I don't misread the
documentation -- per /request/.
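
As a rough check of that figure, here is a small sketch that
converts the limit from the log line into MiB.  The second
half is only an assumption on my part: it supposes that "two
PHP FCGIs" means two PHP processes that can each hit the limit
at the same time, and that the 7 GBytes requested are 7 * 1024
MiB.  The variable names are made up for illustration.

```shell
#!/bin/bash
# Limit taken from the "Allowed memory size of ... bytes" log line.
limit_bytes=2621440000

# 1 MiB = 1024 * 1024 bytes.
limit_mib=$(( limit_bytes / 1024 / 1024 ))
echo "per-request limit: ${limit_mib} MiB"

# Assumption: both PHP FCGI processes reach the limit simultaneously.
fcgi_count=2
worst_case_mib=$(( fcgi_count * limit_mib ))

# Assumption: the 7 GBytes requested from the grid are 7 * 1024 MiB.
requested_mib=$(( 7 * 1024 ))
echo "worst case: ${worst_case_mib} MiB of ${requested_mib} MiB requested"
```

So the limit is 2500 MiB, and under those assumptions two
requests at the limit would already need 5000 MiB.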

Memory is cheap and we could just increase the requested
limit, but I assume there are some PHP developers around who
might want to have a poke at optimizing
<https://bitbucket.org/magnusmanske/wikidata-todo/src/master/public_html/autolist2.php>.

Regarding self-restarting web services, with continuous jobs
we have a "while ! $JOB; do sleep 5; done" loop that ensures
that the job is restarted if it aborts.  This, however, does
not work for OOMs, which are the predominant cause of
webservice shutdowns, as the grid engine will kill the loop
as well :-).  So we will probably have to start the
webservice and then start a watchdog job, with the
webservice's job number as its parameter, that periodically
checks that the webservice is still running and, if it is
not, restarts it.  But to do that, jobs on execution nodes
need to be able to submit jobs, and this is still pending
(cf. https://bugzilla.wikimedia.org/54786).
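
To make the watchdog idea a bit more concrete, here is a
minimal sketch of what such a job could look like.  Nothing
like this exists yet; the function names watch_once and
watchdog are made up, and the check and restart commands are
passed in as parameters so that the sketch does not depend on
any particular grid setup.

```shell
#!/bin/bash
# Hypothetical watchdog sketch; not existing Labs tooling.

# watch_once CHECK_CMD RESTART_CMD
# Runs CHECK_CMD.  If it succeeds, the service is considered alive
# and 0 is returned.  Otherwise RESTART_CMD is run and 1 is returned.
watch_once() {
    local check_cmd=$1 restart_cmd=$2
    # Deliberately unquoted so that a command with arguments is split.
    if $check_cmd >/dev/null 2>&1; then
        return 0
    fi
    $restart_cmd
    return 1
}

# watchdog CHECK_CMD RESTART_CMD [INTERVAL_SECONDS]
# Polls forever, checking once per interval (default: 60 seconds).
watchdog() {
    local interval=${3:-60}
    while true; do
        watch_once "$1" "$2" || true
        sleep "$interval"
    done
}
```

A call could look like: watchdog "qstat -j 12345"
"webservice start" 60, where 12345 is an invented job number,
"qstat -j" is assumed to exit non-zero once the job is gone,
and "webservice start" is assumed to be the command that
brings the webservice back.  Note that a restarted webservice
gets a new job number, so a real watchdog would have to pick
that up; the sketch leaves this out.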

Tim



