Re: [Cloud] [Cloud-announce] [Toolforge] 2020 Kubernetes cluster automatic migration phase beginning

24 Feb 2020


On 2/21/20 5:14 PM, Arthur Smith wrote:
...
One question - I seem to be getting some more timeout-related 500 server errors.
Was there a change in how that is handled somehow (i.e. reduced time limit for
response from the server)? I realize it's good practice to respond quickly, just
some of the existing cases don't at the moment and I'm hitting them occasionally.

There are at least 3 proxies involved in serving Toolforge webservices requests:
1) tool main front proxy (dynamicproxy) (http)
2) kubernetes front haproxy (tcp)
3) kubernetes nginx-ingress (http) and perhaps kube-proxy (tcp)
More information here:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Networking_and_in...
This is to say, yes, serving your request as soon as possible should help the
different proxy connections stay alive and work smoothly.
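
To illustrate why a slow response is fragile behind a chain of proxies, here is
a minimal Python sketch. It assumes each layer enforces its own response
timeout, so the request only survives as long as the tightest layer allows.
The timeout values and the helper names (Hop, tightest_hop, would_time_out) are
hypothetical and chosen only for illustration; they are not the actual
Toolforge settings, and real proxies distinguish several kinds of timeout that
this sketch lumps together.

```python
from typing import NamedTuple, Sequence


class Hop(NamedTuple):
    name: str
    timeout_s: float  # hypothetical value, NOT a real Toolforge setting


# The three proxy layers named above, with made-up timeouts.
CHAIN = [
    Hop("dynamicproxy (http)", 180.0),
    Hop("kubernetes haproxy (tcp)", 60.0),
    Hop("nginx-ingress (http)", 90.0),
]


def tightest_hop(chain: Sequence[Hop]) -> Hop:
    """Return the hop whose timeout expires first."""
    return min(chain, key=lambda hop: hop.timeout_s)


def would_time_out(chain: Sequence[Hop], response_time_s: float) -> bool:
    """True if a response taking this long outlives the tightest hop."""
    return response_time_s > tightest_hop(chain).timeout_s


if __name__ == "__main__":
    hop = tightest_hop(CHAIN)
    print(f"effective limit: {hop.timeout_s}s, set by {hop.name}")
```

Under these assumed numbers, a response that takes 75 seconds would be cut off
by the middle layer even though the other two would still be waiting.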
As of this email, we don't have any particular metrics or insights into proxy
performance. This is something we could explore in the near future (for example,
by creating a dedicated Grafana dashboard).
regards.
-- 
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
