I put together an example and some notes at https://wikitech.wikimedia.org/wiki/User:Jhedden/notes/keepalived

Feel free to log in to these instances and try things out.

Regards,
Jason


On Fri, Oct 18, 2019 at 4:12 PM Jason Hedden <jhedden@wikimedia.org> wrote:
On Fri, Oct 18, 2019 at 2:25 PM Bryan Davis <bd808@wikimedia.org> wrote:

> * It seems reasonable that DNS records for *.wmcloud.org should all
> relate to publicly routable addresses. Does it also seem reasonable
> that *.wikimedia.cloud DNS records should all relate to non-publicly
> routable addresses? If an address from a publicly routable IPv4
> space is being used only to provide a service IP for a service that is
> 100% internal to Cloud VPS (like the example of the HAProxy cluster
> fronting a Kubernetes cluster inside the tools project) is it
> acceptable to create an A record in the *.wikimedia.cloud zone for
> that address?

yeah, that all sounds good to me.

> * Should we have a floating IP pool of non-publicly routable IPv4
> addresses for use cases where the service that is being provided is
> only intended to be internal to a single project or the Cloud VPS
> tenant network? Routable IPv4 addresses are a limited commodity, and
> currently Cloud VPS has a very small number of them available.

> One of the reasons that we are spending time discussing these things
> is that we hope to decide on a set of standards and practices which
> will make it easier to reason about use and maintenance of Cloud VPS.
> Towards that end, I think I would propose these answers to the new
> questions:

> * The *.wikimedia.cloud zone should only contain A records pointing to
> non-publicly routable IPv4 addresses. My reasoning for this is that it
> makes it easier to quickly think about the general threat model for a
> FQDN in this zone. If the FQDN ends in wikimedia.cloud, then the IP
> associated with that FQDN is not publicly routable.

> * We should create a pool of floating IPs using non-publicly routable
> IPv4 addresses for the explicit purpose of being used as service IPs
> for HA/LB systems within the Cloud VPS internal network. These service
> IPs would then be given FQDNs in the *.wikimedia.cloud zone without
> breaking the rule that all FQDNs ending in wikimedia.cloud are not
> publicly routable.

> * We may have to adjust a lot of security groups if the new floating
> pool of private IPs is outside of the 172.16.0.0/21 subnet. If we
> carve it out as a subset of that CIDR then I *think* it requires
> careful changes to the existing "cloud-instances2-b-eqiad" subnet to
> cordon off a /24 or /25 block into a new neutron subnet that would be
> the source of the new pool and not part of the subnet that
> nova/neutron use to give out fixed IPs to instances. Hopefully Arturo
> or Jason can reason about the work and impact of this better than I
> can.

I wouldn't create another floating IP pool or make any changes to `cloud-instances2-b-eqiad` (`172.16.0.0/21`).
Reconfiguring this as an external network (a floating IP requirement) would bring a lot of complexity with host routing, NAT rules, subnets, and security groups.

Instead of using floating IPs for non-publicly routed subnets, I'd pre-allocate an IP address from that subnet, configure the front-end load balancer host's neutron ports with allowed address pairs, and use VRRP + keepalived to manage which host has the active virtual IP (VIP).
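
To make the idea concrete, here is a minimal Python sketch of the pieces involved. It is an illustration under assumptions, not the prototype itself: the VIP `172.16.0.100`, the interface `eth0`, the port name `lb-port-1`, the `virtual_router_id` and priority values, and all function names are hypothetical. Only the `172.16.0.0/21` subnet comes from this thread. The sketch checks that the chosen VIP sits inside the existing subnet, builds the `openstack port set --allowed-address` call for a load balancer port, and renders a basic keepalived `vrrp_instance` stanza.

```python
import ipaddress

# The instance subnet named in this thread. Everything else below is assumed.
SUBNET = ipaddress.ip_network("172.16.0.0/21")


def check_vip(vip, subnet=SUBNET):
    """Return the VIP as an address object, or raise if it is outside the subnet."""
    addr = ipaddress.ip_address(vip)
    if addr not in subnet:
        raise ValueError(f"{vip} is not inside {subnet}")
    return addr


def allowed_address_pair_cmd(port, vip):
    """Build the CLI call that lets a neutron port send and receive traffic for the VIP."""
    check_vip(vip)
    return ["openstack", "port", "set",
            "--allowed-address", f"ip-address={vip}", port]


def keepalived_conf(vip, interface, router_id, priority):
    """Render a minimal keepalived VRRP instance that manages the VIP."""
    addr = check_vip(vip)
    lines = [
        "vrrp_instance VI_1 {",
        "    state BACKUP",  # all nodes start as BACKUP; highest priority wins the election
        f"    interface {interface}",
        f"    virtual_router_id {router_id}",
        f"    priority {priority}",
        "    advert_int 1",
        "    virtual_ipaddress {",
        f"        {addr}/{SUBNET.prefixlen}",
        "    }",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    vip = "172.16.0.100"  # hypothetical pre-allocated address
    print(" ".join(allowed_address_pair_cmd("lb-port-1", vip)))
    print(keepalived_conf(vip, "eth0", router_id=51, priority=100))
```

Each load balancer host would get the same stanza with a different `priority`, and each host's port would need the allowed address pair so that neutron's anti-spoofing rules do not drop traffic for the VIP.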

I can put together a simple prototype of this if there's any interest in going down that path.

Regards,
Jason