Hello!
Next week we'll be rebuilding and upgrading the hardware that provides DNS service to Cloud VPS and Toolforge. The rebuilds will start at 14:00 UTC on Tuesday, June 9, and the whole process may take 2-3 hours. DNS lookups will likely be somewhat slower during that time, as clients fail over from the server being rebuilt to the one that is still working. In theory there should be few other user-facing effects from these upgrades.
In practice, though, this isn't something that we've done for quite a while, and touching DNS is always risky since it underlies pretty much everything. Here are some things to be ready for:
- As a precaution we'll be disabling Horizon during the window to prevent new VMs or DNS changes from landing in an inconsistent state.
- Some badly-behaved DNS clients won't fail over properly and will report errors when their primary DNS server is down.
- Puppet will almost certainly experience transient failures, since Puppet is known to be one of those badly-behaved clients.
- If things go very badly, there may be periods of total DNS outage, which would cause many WMCS-hosted services to fail. There's no particular reason that this /should/ happen, but it is the worst-case scenario.
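The failover behaviour described above can be sketched roughly as follows. This is a minimal illustration, not how any particular resolver (or Cloud VPS itself) implements it: the nameserver names `ns0`/`ns1`, the `query` callable and the answer `192.0.2.10` are all hypothetical. A well-behaved client tries each configured server in turn and only waits out a timeout on the one that is down; a badly-behaved client effectively acts as if only its first server existed. On Linux, glibc's resolver exposes related knobs through the `timeout:` and `attempts:` options in /etc/resolv.conf.

```python
from typing import Callable, List, Sequence


class AllServersFailed(Exception):
    """Raised when none of the configured nameservers answered."""


def resolve_with_failover(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str, float], str],
    timeout: float = 2.0,
) -> str:
    """Try each nameserver in order and return the first answer.

    `query(server, name, timeout)` is a hypothetical stand-in for a real
    DNS lookup: it returns an address, or raises OSError (TimeoutError is
    a subclass) when the server is unreachable or does not answer.
    """
    errors: List[str] = []
    for server in servers:
        try:
            # The timeout is handed to the query; a server that is down
            # costs up to that long before we move on to the next one,
            # which is why lookups get slower during the rebuild.
            return query(server, name, timeout)
        except OSError as exc:
            errors.append(f"{server}: {exc}")
    # Reached only if every server failed: a total DNS outage.
    raise AllServersFailed("; ".join(errors))


def fake_query(server: str, name: str, timeout: float) -> str:
    """Hypothetical query: 'ns0' is being rebuilt, 'ns1' still works."""
    if server == "ns0":
        raise TimeoutError(f"no response within {timeout}s")
    return "192.0.2.10"  # documentation-range address, not a real answer


if __name__ == "__main__":
    # The first server times out, so the client fails over to the second.
    print(resolve_with_failover("example.org", ["ns0", "ns1"], fake_query))  # prints 192.0.2.10
```

Calling `resolve_with_failover("example.org", ["ns0"], fake_query)` instead, with only the server being rebuilt in the list, raises `AllServersFailed`, which is roughly what a client that never fails over would experience during the window.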
For additional context, the Phabricator task for this work is https://phabricator.wikimedia.org/T253780
- Andrew + the WMCS team
_______________________________________________ Wikimedia Cloud Services announce mailing list Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Reminder: This maintenance is starting in about an hour.
On 6/2/20 8:01 AM, Andrew Bogott wrote:
This is done. Other than Horizon being disabled there was no service interruption during the upgrade.
-Andrew + the WMCS team
On 6/9/20 7:52 AM, Andrew Bogott wrote:
Hi, I was happy that Reasonator was working again. It is dead again. Thanks, GerardM
On Tue, 9 Jun 2020 at 19:47, Andrew Bogott <abogott@wikimedia.org> wrote:
_______________________________________________ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Hi, for the record, Reasonator is back. Thanks, GerardM
On Wed, 10 Jun 2020 at 08:42, Gerard Meijssen <gerard.meijssen@gmail.com> wrote: