[Labs-l] Volunteers wanted to opt in to a new DNS system
Antoine Musso
hashar+wmf at free.fr
Tue Apr 7 14:20:36 UTC 2015
On 03/04/15 19:38, Andrew Bogott wrote:
<snip>
> Additionally, I would appreciate it if a few projects would volunteer to
> be early adopters. If you're interested in trying it out, please
> respond to this email so that I know who's trying, and then go to your
> 'configure instance' pages and clear the 'use_dnsmasq' setting. If your
> instance is using role::puppet::self, you'll also need to sign a new
> puppet cert, like this:
>
> $ sudo puppet cert sign <hostname>.<projectname>.eqiad.wmflabs
>
> In addition to being more reliable, the new DNS system will also support
> names that include the project name, like
> 'util-abogott.testlabs.eqiad.wmflabs'. The old naming scheme is still
> supported, but many services will be gradually moving over to the new
> scheme to avoid ambiguity between projects.
>
> After a few weeks of testing I'll start to migrate everything to the new
> server if things look good. Let me know how things go.
>
> Thanks!
>
> -Andrew
>
>
> [1] The new system uses openstack-designate to create dns entries which
> are subsequently served by a powerdns server running on
> labs-ns2.wikimedia.org
Hello,
The 'integration' labs project was switched to the new DNS system by
mistake, which caused a partial outage of CI.
The use_dnsmasq setting (which is set to true on our instances) was
renamed to 'use_dnsmasq_server' when hiera support was added in:
https://gerrit.wikimedia.org/r/#/c/202278/
That immediately caused the puppet client runs on the integration
instances to switch to the new DNS resolver, which caused two major issues:
A) all puppet clients suddenly had their connections refused, because the
certname was now based on the hostname instead of the ec2id
B) Jenkins jobs hitting the beta cluster all failed, because resolving
the *.beta.wmflabs.org DNS entries yielded the public instance IP, which
is not reachable from the instances.
I have filed https://phabricator.wikimedia.org/T95273, which documents
the work I did to revert to the previous state. Namely:
* Have hiera set both use_dnsmasq and use_dnsmasq_server
https://wikitech.wikimedia.org/w/index.php?title=Hiera:Integration&diff=152484&oldid=152033
* Manually reinstall and fix the configuration on the puppetmaster, since
its files had gone wild.
* Fight with cert regeneration on the puppetmaster
* Manually fix /etc/resolv.conf on all instances and restart nscd to
clear the DNS cache
* Delete and invalidate the certs of all clients, then re-sign them.
Luckily, the beta cluster, which also uses a puppetmaster, has not been
affected. I have no idea why, though.
I have filed https://phabricator.wikimedia.org/T95288 to have Designate
return different answers depending on the client (a feature known as
split horizon).
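Split horizon means the DNS server picks its answer based on where the
query comes from. A minimal Python sketch of that lookup logic follows,
assuming a single private range for instances. The hostname, the 10.0.0.0/8
range and the addresses are made-up placeholders (the public one is a
documentation address), not the real labs values, and this is not how
Designate or PowerDNS implement it:

```python
import ipaddress

# Hypothetical private range for instances (placeholder, not the real labs network).
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical records: one name, two views.
RECORDS = {
    "example.beta.wmflabs.org": {
        "internal": "10.0.5.20",     # private IP, reachable from instances
        "external": "203.0.113.10",  # public IP (documentation range)
    },
}


def resolve(name: str, client_ip: str) -> str:
    """Return the A record for `name` as seen by `client_ip`."""
    views = RECORDS[name]
    client = ipaddress.ip_address(client_ip)
    # Clients inside the private network get the internal view,
    # everyone else gets the public address.
    view = "internal" if client in INTERNAL_NET else "external"
    return views[view]
```

With such views, a Jenkins job running on an instance would get the private
address of a *.beta.wmflabs.org host instead of the unreachable public one,
while outside clients keep getting the public IP.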
--
Antoine Musso