On Fri, Oct 18, 2019 at 10:49 AM Arturo Borrero Gonzalez aborrero@wikimedia.org wrote:
On 10/18/19 2:19 PM, Arturo Borrero Gonzalez wrote:
Ok for the floating IP!
The domain name, I would use what Bryan suggests:
- k8s.tools.eqiad1.wikimedia.cloud
- k8s.tools.eqiad.wmflabs
And the counterparts:
- k8s.toolsbeta.eqiad1.wikimedia.cloud
- k8s.toolsbeta.eqiad.wmflabs
regards.
One final question: I may have found a contradiction.
If we are going to use a floating IP for this (we just agreed on that), then using 'tools.eqiad1.wikimedia.cloud' may be wrong, since that's meant to hold private IP addresses. Shall we step back and consider the 'wmcloud.org' domain?
I am missing information on how to handle floating IPs with respect to the new domains. I am just asking for additional clarification.
I think it is important that we are having this debate (and clarifying the examples in the wiki); this will make things easier in the future.
Today the only floating IP pool we have in eqiad1 is the publicly routable IPv4 address space labeled "wan-transport-eqiad" (185.15.56.0/25)[0]. An address drawn from this pool today would typically be associated with a *.wmflabs.org FQDN. One example: 185.15.56.18 is a floating IP associated with the mx-out01 instance in the cloudinfra project, with an A record of mx-out01.wmflabs.org and PTR records of mx-out01.wmflabs.org and instance-mx-out01.cloudinfra.wmflabs.org.
The agreed replacement for *.wmflabs.org is *.wmcloud.org.
I believe that these are statements of fact. These statements of fact lead me to ask two new questions as we look to a future with more usage of HAProxy or some other LBaaS solution being applied more commonly inside Cloud VPS projects:
* It seems reasonable that DNS records for *.wmcloud.org should all relate to publicly routable addresses. Does it also seem reasonable that *.wikimedia.cloud DNS records should all relate to non-publicly routable addresses? If an address from a publicly routable IPv4 space is being used only to provide a service IP for a service that is 100% internal to Cloud VPS (like the example of the HAProxy cluster fronting a Kubernetes cluster inside the tools project), is it acceptable to create an A record in the *.wikimedia.cloud zone for that address?
* Should we have a floating IP pool of non-publicly routable IPv4 addresses for use cases where the service that is being provided is only intended to be internal to a single project or the Cloud VPS tenant network? Routable IPv4 addresses are a limited commodity, and currently Cloud VPS has a very small number of them available.
One of the reasons that we are spending time discussing these things is that we hope to decide on a set of standards and practices which will make it easier to reason about use and maintenance of Cloud VPS. Towards that end, I think I would propose these answers to the new questions:
* The *.wikimedia.cloud zone should only contain A records pointing to non-publicly routable IPv4 addresses. My reasoning for this is that it makes it easier to quickly think about the general threat model for a FQDN in this zone. If the FQDN ends in wikimedia.cloud, then the IP associated with that FQDN is not publicly routable.
* We should create a pool of floating IPs using non-publicly routable IPv4 addresses for the explicit purpose of being used as service IPs for HA/LB systems within the Cloud VPS internal network. These service IPs would then be given FQDNs in the *.wikimedia.cloud zone without breaking the rule that all FQDNs ending in wikimedia.cloud are not publicly routable.
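To make the proposed rule concrete, here is a rough Python sketch of how a record could be checked against it. This is only an illustration, not an existing tool: the function name, the zone table, the 172.16.0.10 address, and the mx-out01.wmcloud.org name are all made up for the example, and it uses `ipaddress.is_global` as a stand-in for "publicly routable".

```python
import ipaddress

# Hypothetical policy table: zone suffix -> should its addresses be public?
ZONE_EXPECTS_PUBLIC = {
    "wikimedia.cloud": False,  # proposed: private addresses only
    "wmcloud.org": True,       # proposed: publicly routable addresses only
}


def record_follows_policy(fqdn: str, address: str) -> bool:
    """Return True if an A record matches the proposed zone rule.

    Assumption: "publicly routable" is approximated by is_global.
    """
    ip = ipaddress.ip_address(address)
    name = fqdn.rstrip(".").lower()
    for zone, expects_public in ZONE_EXPECTS_PUBLIC.items():
        if name == zone or name.endswith("." + zone):
            return ip.is_global == expects_public
    raise ValueError(f"{fqdn} is not in a zone covered by the policy")


# A private service IP (made-up address) in the wikimedia.cloud zone is fine.
ok_private = record_follows_policy("k8s.tools.eqiad1.wikimedia.cloud", "172.16.0.10")
# A floating IP from the public pool in the wikimedia.cloud zone breaks the rule.
bad_public = record_follows_policy("k8s.tools.eqiad1.wikimedia.cloud", "185.15.56.18")
# The same public address under wmcloud.org (hypothetical name) is fine.
ok_public = record_follows_policy("mx-out01.wmcloud.org", "185.15.56.18")
```

With a check like this, the threat-model shortcut becomes testable: any wikimedia.cloud name that resolves to a global address is a policy violation.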
We may have to adjust a lot of security groups if the new floating pool of private IPs is outside of the 172.16.0.0/21 subnet. If we carve it out as a subset of that CIDR, then I *think* it requires careful changes to the existing "cloud-instances2-b-eqiad" subnet to cordon off a /24 or /25 block into a new neutron subnet that would be the source of the new pool and not part of the subnet that nova/neutron use to give out fixed IPs to instances. Hopefully Arturo or Jason can reason about the work and impact of this better than I can.
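For a sense of the address arithmetic only (not the neutron changes themselves), here is a small Python sketch of carving a /24 out of 172.16.0.0/21. It assumes the instance subnet really spans that whole /21, and picking the last /24 is an arbitrary choice for illustration; which block is actually free would need to be checked against existing allocations.

```python
import ipaddress

# Assumption: the existing instance subnet covers this whole CIDR.
instances = ipaddress.ip_network("172.16.0.0/21")

# All /24 blocks that could be cordoned off from the /21.
candidates = list(instances.subnets(new_prefix=24))

# Arbitrary pick for illustration: the last /24 becomes the private pool.
pool = candidates[-1]

# What would remain for nova/neutron fixed IPs once the pool is excluded.
remaining = sorted(instances.address_exclude(pool))

print("candidate /24 blocks:", len(candidates))
print("private floating pool:", pool)
print("remaining for fixed IPs:", ", ".join(str(net) for net in remaining))
```

Because the pool stays inside 172.16.0.0/21, security group rules written against that CIDR would still match the new service IPs.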
Back to Arturo's question, I think I agree that if the IP in use is from the "wan-transport-eqiad" pool (which is a great name for a network and a horrible name for a pool), then the FQDN used for that IP should be in the wmcloud.org zone (or another zone dedicated to public IPs) and not the wikimedia.cloud zone.
I am also starting to feel like this is all too complicated somehow. Maybe it is just that using valid, registered TLDs is somehow adding cognitive burden for me that the legacy 'fake' TLDs did not?
[0]: It seems we have a buggy version of python-openstackclient that makes digging this information out of the system more difficult than it should be. https://bugs.launchpad.net/python-openstackclient/+bug/1616129
Bryan