Next week we'll be rebuilding and upgrading the hardware that provides
DNS service to Cloud VPS and Toolforge. These rebuilds will start at
14:00 UTC and the whole process may take 2-3 hours. DNS lookups will
likely be somewhat slower during the window, as clients fail over from
the server being rebuilt to the one still in service. In theory there
should be few other user-facing effects from these upgrades.
In practice, though, this isn't something that we've done for quite a
while, and touching DNS is always risky since it underlies pretty much
everything. Here are some things to be ready for:
- As a precaution we'll be disabling Horizon during the window to
prevent new VMs or DNS changes from landing in an inconsistent state.
- Some badly-behaved DNS clients won't fail over properly and will
report errors when their primary DNS server is down.
- Puppet will almost certainly experience transient failures, since
Puppet is known to be one of those badly-behaved clients.
- If things go very badly there may be periods of total DNS outage,
which will result in many WMCS-hosted services failing. There's no
particular reason that this /should/ happen, but it is the worst-case
scenario.
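
If you maintain a tool that does its own DNS lookups, one way to ride
out the transient failures described above is to retry a failed lookup
with a short backoff rather than erroring out on the first failure. The
following is only a minimal sketch: resolve_with_retry and its
parameters are hypothetical names chosen for illustration and are not
part of any WMCS tooling, and the retry counts and delays are arbitrary
defaults you would want to tune.

```python
import socket
import time


def resolve_with_retry(hostname, attempts=5, initial_delay=1.0,
                       resolver=socket.getaddrinfo):
    """Resolve hostname, retrying DNS errors with exponential backoff.

    Hypothetical helper for illustration only. The resolver argument
    exists so the lookup function can be swapped out in tests.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            results = resolver(hostname, None)
            # Each result is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address as a string.
            return sorted({info[4][0] for info in results})
        except socket.gaierror:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2  # wait twice as long before the next attempt


if __name__ == "__main__":
    print(resolve_with_retry("localhost"))
```

This does not help with clients you don't control (Puppet, for
example), and it won't help during a total outage that lasts longer
than the combined retry delays.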
For additional context, the Phabricator task for this work is
- Andrew + the WMCS team
Wikimedia Cloud Services announce mailing list
Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org)