Re: [Cloud-admin] DNS name for the new k8s haproxy

21 Oct 2019

On 10/18/19 9:25 PM, Bryan Davis wrote:
...
 Back to Arturo's question, I think I agree that if the IP in use is
 from the "wan-transport-eqiad" pool (which is a great name for a
 network and a horrible name for a pool), then the FQDN used for that
 IP should be in the wmcloud.org zone (or another zone dedicated to
 public IPs) and not the wikimedia.cloud zone.

On second thought, Brooke suggested we use a floating IP for the haproxy in
front of the API server + ingress.
But a floating IP by itself doesn't eliminate the single point of failure. We
would still need to implement what Jason suggested.

Moreover, I wonder if we care about this SPOF at all. We could use a
cold-standby approach: create another VM with the same setup and switch between
the two by means of DNS. This should be enough.

We have 3 options:
* DNS failover
* Floating IP failover
* VRRP or other HA mechanisms

None of these mechanisms prevents clients from having to re-establish TCP
connections in case of failover (because the TCP session information is not
present on the newly active node). The simplest option is DNS failover, so I
would stick to that.
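To illustrate the reconnect point from the client side, here is a minimal
sketch, not something we run today. The function name, the retry count and the
delay are all made up for the example. It only assumes that the client connects
by FQDN, so that each new attempt goes through name resolution again and can
land on whichever VM the record points to at that moment.

```python
import socket
import time


def connect_with_retry(host, port, attempts=3, delay=1.0,
                       connect=socket.create_connection):
    """Open a TCP connection, retrying from scratch on failure.

    Hypothetical helper: after a failover the old TCP session is gone, so
    the client has to open a new connection. socket.create_connection()
    resolves `host` on every call, so a retry can pick up an updated record
    (subject to resolver caching and the record TTL).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect((host, port), timeout=5)
        except OSError as exc:
            last_error = exc
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay)
    raise ConnectionError(f"could not reach {host}:{port}") from last_error
```

How quickly clients follow the change depends on the TTL of the record, so a
short TTL would probably be wanted for this name.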

k8s.toolsbeta.eqiad1.wikimedia.cloud --> 172.16.x.10 (active VM)
                                     --> 172.16.x.20 (cold-standby VM)

In case of manual failover:

                                     --> 172.16.x.10 (cold-standby VM)
k8s.toolsbeta.eqiad1.wikimedia.cloud --> 172.16.x.20 (active VM)

Honestly, I think this should be enough for this setup.
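The manual failover above boils down to swapping which address the name points
to. A minimal sketch of that bookkeeping follows, assuming a hypothetical
FailoverRecord type; it does not talk to any real DNS API. The addresses in the
usage lines are made-up stand-ins, since the real ones are elided above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverRecord:
    """Hypothetical model of one FQDN with an active and a standby address."""
    fqdn: str
    active: str
    standby: str

    def fail_over(self) -> "FailoverRecord":
        # Manual failover: the standby becomes active and vice versa.
        return FailoverRecord(self.fqdn, active=self.standby,
                              standby=self.active)


# Made-up addresses, for illustration only.
before = FailoverRecord("k8s.toolsbeta.eqiad1.wikimedia.cloud",
                        active="172.16.0.10", standby="172.16.0.20")
after = before.fail_over()
print(f"{after.fqdn} --> {after.active} (active VM)")
```

Failing back is the same operation applied again, which is part of why this
option is the easiest one to reason about.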

The situation would be different if we wanted to allow connecting to the k8s
API directly from the internet, but I don't think that's the case. We should
only allow connecting to that FQDN from dynamicproxy, because SSL termination
happens there.

-- 
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
