Hi all!
Good news: we have enabled health checks for all webservices running on Toolforge.
There's no action required on your part. The next time you restart or stop/start your webservice, it will get a TCP health check by default (which just makes sure something is listening).
The most interesting feature, though, is being able to pass a URL path to use as an HTTP health check.
To do so, pass `--health-check-url /path/to/health` to your `toolforge webservice start` command, and Toolforge will automatically restart your webservice if it stops responding on that path (you can change the path to whatever you want, e.g. `/`).
Note that this URL will be queried quite often, so try to avoid pointing it at a page that uses a lot of resources.
Also, a reminder that you can find this and other, smaller user-facing updates about the Toolforge platform features here: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog
Original task: https://phabricator.wikimedia.org/T341919
Cheers!
David Caro (2024-02-29 13:57):
> [...]
> The most interesting feature though is being able to pass a url to use as HTTP health check.
> To do so you can pass `--health-check-url /path/to/health` to your `toolforge webservice start` command, and toolforge will automatically restart your webservice if it stops responding to that path (you can change the path to whatever you want, ex. `/`).
By "if it stops responding" you mean stops responding with HTTP status 200? Or stops responding with non-zero bytes of content? Or...? :)
Cheers, Maciej Nux
On 02/29 14:30, Maciej Jaros wrote:
> David Caro (2024-02-29 13:57):
>> [...]
>> The most interesting feature though is being able to pass a url to use as HTTP health check.
>> To do so you can pass `--health-check-url /path/to/health` to your `toolforge webservice start` command, and toolforge will automatically restart your webservice if it stops responding to that path (you can change the path to whatever you want, ex. `/`).
> By "if it stops responding" you mean stops responding with HTTP status 200? Or stops responding with non-zero bytes of content? Or...? :)
In the case of TCP probes (the default if you don't specify anything), that'd be when the application stops responding at all (unable to connect, TCP timeout), so only really fatal failures are caught.
For HTTP, it will be when the service stops responding at all (unable to connect, timeout, ...) or responds with anything other than an HTTP 200 status code (the content does not matter).
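
To make the difference between the two probe types concrete, here is a rough Python sketch of the semantics described above. It is not how Toolforge implements the checks (the platform runs them itself); the function names, the one-second timeout, and the small local demo server are all assumptions made purely for illustration.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Healthy if a TCP connection can be opened (something is listening)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable, timed out, ...
        return False


def http_check(host: str, port: int, path: str, timeout: float = 1.0) -> bool:
    """Healthy only if the path answers with HTTP 200; the body is ignored."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical tool: /health answers 200, every other path answers 500."""

    def do_GET(self):
        status = 200 if self.path == "/health" else 500
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo() -> dict:
    """Start the demo server on a free local port and run both checks."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return {
            "tcp": tcp_check("127.0.0.1", port),
            "http_ok": http_check("127.0.0.1", port, "/health"),
            "http_broken": http_check("127.0.0.1", port, "/broken"),
        }
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

The point of the demo is the last entry: the server is still listening, so a TCP check passes, but a path that answers 500 fails the HTTP check.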
> Cheers, Maciej Nux
> _______________________________________________
> Cloud mailing list -- cloud@lists.wikimedia.org
> List information: https://lists.wikimedia.org/postorius/lists/cloud.lists.wikimedia.org/
I'm the worst xd
The parameter is actually `--health-check-path <path-for-the-check>`.
Sorry for the confusion.
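
For anyone wiring this up, a cheap health endpoint could look roughly like the sketch below. It assumes a Python tool served through a WSGI server; the `/health` path, the `app` callable, and the `call` helper are hypothetical names chosen for illustration, and the only requirement taken from this thread is that the path passed to `--health-check-path` answers with HTTP 200 without doing expensive work.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical path; it has to match whatever is passed to --health-check-path.
HEALTH_PATH = "/health"


def app(environ, start_response):
    """Tiny WSGI app with a cheap health endpoint."""
    if environ.get("PATH_INFO") == HEALTH_PATH:
        # No database queries or template rendering here: just report "alive".
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]
    # The tool's real pages would be handled here; this sketch has none.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def call(path):
    """Invoke the app in-process and return (status line, body) for a path."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call(HEALTH_PATH))
```

Keeping the handler free of database or API calls matters because the path is queried often, and the body is irrelevant to the check since only the 200 status counts.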
On 02/29 13:57, David Caro wrote:
> Hi all!
> Good news, we have enabled health checks for all the webservices running on toolforge.
> There's no action required on your part, the next time you restart or stop/start your webservice, it will have a tcp health check by default (just making sure something is listening).
> The most interesting feature though is being able to pass a url to use as HTTP health check.
> To do so you can pass `--health-check-url /path/to/health` to your `toolforge webservice start` command, and toolforge will automatically restart your webservice if it stops responding to that path (you can change the path to whatever you want, ex. `/`).
> Note that this url will be queried quite often, so try to avoid hitting a page that uses many resources.
> Also a reminder that you can find this and smaller user-facing updates about the Toolforge platform features here: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Changelog
> Original task: https://phabricator.wikimedia.org/T341919
> Cheers!
--
David Caro
SRE - Cloud Services
Wikimedia Foundation
https://wikimediafoundation.org/
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3
"Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment."