My tool went down after 3 days on Labs https://www.statuscake.com/App/AllStatus.php?tid=1334098
I think it will require https://phabricator.wikimedia.org/T142164 ("Build replacement for the webservice toolschecker test").
On WMUA's server I think I solved this with the monit service: there was no downtime in 5 days according to the StatusCake checks (every 5 minutes).

On Sun, Aug 7, 2016 at 8:11 AM, Yuvi Panda <yuvipanda@gmail.com> wrote:
On Wed, Aug 3, 2016 at 9:12 PM, Lily <lilyofthewest.wikimedia@gmail.com> wrote:
> + Yuvi
>
> On Wed, Aug 3, 2016 at 3:31 AM, Alexander Tsirlin <altsirlin@gmail.com>
> wrote:
>>
>> Dear Ilya, Leila, and others,
>>
>> Thank you for your response. Unfortunately, I am not a programmer, and I
>> can't help you with the development. I only know that Tools had severe
>> problems in the past, and it may not be a very reliable solution because, if
>> something happens to the server, you won't have direct control and won't be
>> able to solve the problem quickly. I had enough trouble when map scripts for
>> Wikivoyage were running there.
>
>
> Yuvi, what kind of guarantee can we have for Tools' availability?

It has definitely been more stable than in the past by quite a margin,
I think. However, we do have some maintenance planned (not announced
yet) for end of Aug / beginning of September related to NFS that could
cause some instabilities. In light of that, I think the following is a
reasonable course of action to follow:

1. Identify critical 'must have' tools (not too many!)
2. Set them up on Labs instances that don't have any NFS (I can
help do this)
3. When we do the planned maintenance, re-route the tools to hit the
Labs instances set up in (2) (I can help do this as well).

Now all we need is a *super* minimal set (maybe 2?) of tools that should
be considered absolutely essential for September (I expect everything
to be stable by October), so we can set up redundancies for those tools.

--
Yuvi Panda T
http://yuvi.in/blog