We'll be upgrading the cloud services OpenStack install on Monday, beginning at 14:00 UTC.
The entire upgrade process may take a couple of hours. Early in the process, Horizon (and the associated OpenStack APIs) will be disabled, probably for 20 to 30 minutes. There may also be brief network interruptions during the upgrade, although if all goes well these will not be noticeable to users.
Toolforge and existing VMs should be largely unaffected apart from possible network hiccups.
- Andrew + the WMCS team
_______________________________________________ Wikimedia Cloud Services announce mailing list Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Reminder: this is happening in about 30 minutes.
On 10/1/19 1:24 PM, Andrew Bogott wrote:
We'll be upgrading the cloud services OpenStack install on Monday, beginning at 14:00 UTC.
The entire upgrade process may take a couple of hours. Early in the process, Horizon (and the associated OpenStack APIs) will be disabled, probably for 20 to 30 minutes. There may also be brief network interruptions during the upgrade, although if all goes well these will not be noticeable to users.
Toolforge and existing VMs should be largely unaffected apart from possible network hiccups.
- Andrew + the WMCS team
This took a lot longer than expected, but everything is now upgraded. There was a brief DNS + network outage throughout the cloud, which also affected Toolforge. We believe that all of those issues are resolved now, but please let us know if you see any bad long-term effects.
Andrew,
I got 30+ emails and counting for stale file handles, permission errors, email bounces, etc. Not sure what definition of brief you're using for the outage (in my line of work that's sometimes less than a second), but based on the timestamps all these emails seem to have been sent in a two-hour window: start 17:42, end 19:55 (both Amsterdam time). Maybe you can stop cron next time you're planning to break toollabs? Or was this caused by the DNS + network outage?
Maarten
On 07-10-19 20:55, Andrew Bogott wrote:
This took a lot longer than expected, but everything is now upgraded. There was a brief DNS + network outage throughout the cloud, which also affected Toolforge. We believe that all of those issues are resolved now, but please let us know if you see any bad long-term effects.
_______________________________________________ Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
On Mon, Oct 7, 2019 at 1:44 PM Maarten Dammers maarten@mdammers.nl wrote:
Andrew,
I got 30+ emails and counting for stale file handles, permission errors, email bounces, etc. Not sure what definition of brief you're using for the outage (in my line of work that's sometimes less than a second), but based on the timestamps all these emails seem to have been sent in a two-hour window: start 17:42, end 19:55 (both Amsterdam time). Maybe you can stop cron next time you're planning to break toollabs? Or was this caused by the DNS + network outage?
The errors in Toolforge were unintended side effects of the OpenStack upgrade. The network issues cascaded to cause a variety of issues across Cloud VPS projects including Toolforge. We will be working on an incident report and it will be shared with this list when it has been prepared.
Those of us on the "root@" emails got more than 400 emails triggered by various parts of the service interruption, so we have empathy for the inbox problem this caused for others. :/
Bryan