The sudden arrival of the wdqs cloudvirts (T221631) has provided a very straightforward use case for nova host aggregates. We'd already been planning to adopt them at some point (T226731) so I've gone ahead and set some up today.
Starting sometime soon (maybe tomorrow!) a host aggregate named 'standard' will replace the existing 'scheduler pool,' and the profile::openstack::eqiad1::nova::scheduler_pool: hiera key will vanish. That knowledge will instead live inside the nova database, and can be queried in a few ways, most simply with '# openstack aggregate show standard'
I've done my best to document all this[0] but want to call out a few points:
- We will no longer have git history explaining why a given cloudvirt is pooled or depooled. For that reason it is more important than ever to !log any change to aggregate membership. I propose we standardize on the !log admin SAL in -cloud rather than the production -operations SAL for this.
- In order to reduce the chances of losing track of a hypervisor entirely, I've created some tracking aggregates. If you remove a cloudvirt from the 'standard' aggregate, please re-assign it to 'maintenance', 'spare', or 'toobusy' as appropriate.
That's it! If anyone really hates this please let me know and I can roll things back. I'm already regretting the name 'standard' but at least it's not badly overloaded like my first choice, 'public', is.
[0] https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Host_aggregates
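For illustration, depooling a hypervisor under the new scheme amounts to removing it from 'standard' and adding it to one of the tracking aggregates. The following is a minimal sketch, not the documented procedure: it only builds the `openstack aggregate remove host` / `openstack aggregate add host` command lines and prints them. The function name `depool_commands` and the host name `cloudvirt1001` are hypothetical.

```python
from typing import List

# Aggregate names taken from the announcement above.
POOL_AGGREGATE = "standard"
TRACKING_AGGREGATES = ("maintenance", "spare", "toobusy")


def depool_commands(host: str, destination: str) -> List[List[str]]:
    """Build the CLI calls that move `host` from 'standard' to a tracking aggregate.

    Hypothetical helper: it returns argument lists and does not run anything.
    """
    if destination not in TRACKING_AGGREGATES:
        raise ValueError(f"unknown tracking aggregate: {destination}")
    return [
        ["openstack", "aggregate", "remove", "host", POOL_AGGREGATE, host],
        ["openstack", "aggregate", "add", "host", destination, host],
    ]


if __name__ == "__main__":
    # Print instead of executing, so the sketch is safe to run anywhere.
    # 'cloudvirt1001' is a made-up host name used only for illustration.
    for cmd in depool_commands("cloudvirt1001", "maintenance"):
        print(" ".join(cmd))
```

Each such change would still need a matching !log admin entry, since the commands themselves leave no history.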
On Thu, Feb 27, 2020 at 5:55 PM Andrew Bogott abogott@wikimedia.org wrote:
The sudden arrival of the wdqs cloudvirts (T221631) has provided a very straightforward use case for nova host aggregates. We'd already been planning to adopt them at some point (T226731) so I've gone ahead and set some up today.
Starting sometime soon (maybe tomorrow!) a host aggregate named 'standard' will replace the existing 'scheduler pool,' and the profile::openstack::eqiad1::nova::scheduler_pool: hiera key will vanish. That knowledge will instead live inside the nova database, and can be queried in a few ways, most simply with '# openstack aggregate show standard'
I've done my best to document all this[0] but want to call out a few points:
- We will no longer have git history explaining why a given cloudvirt is pooled or depooled. For that reason it is more important than ever to !log any change to aggregate membership. I propose we standardize on the !log admin SAL in -cloud rather than the production -operations SAL for this.
+1. I have been thinking about adding a combined view of a few SAL channels to my sal tool to make it easier for us to view and search for log events, and this would fit nicely with that as well.
- In order to reduce the chances of losing track of a hypervisor entirely, I've created some tracking aggregates. If you remove a cloudvirt from the 'standard' aggregate, please re-assign it to 'maintenance', 'spare', or 'toobusy' as appropriate.
This sounds like something that would be worth writing a wrapper script to manage. Something like `wmcs-aggregate <assign|list> <standard|spare|maintenance|toobusy> [FQDN]` (example only, bikeshed on a phab task!) could make all this easier. We could also almost certainly figure out how to make that !log for us automagically as well.
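To make the idea concrete, here is a rough sketch of how such a wrapper's command line could be parsed. It is only an assumption about the interface, following the example usage above; no such script exists yet. It returns the lines it would act on rather than running them: one `openstack` call, plus a !log line when membership changes. The function names, the log wording, and the FQDN in the usage line are all hypothetical, and a real tool would also have to remove the host from its previous aggregate.

```python
import argparse
from typing import List

# Aggregate names from the thread; the interface itself is only a proposal.
AGGREGATES = ("standard", "spare", "maintenance", "toobusy")


def build_parser() -> argparse.ArgumentParser:
    """Parser for the hypothetical `wmcs-aggregate <action> <aggregate> [FQDN]`."""
    parser = argparse.ArgumentParser(prog="wmcs-aggregate")
    parser.add_argument("action", choices=("assign", "list"))
    parser.add_argument("aggregate", choices=AGGREGATES)
    parser.add_argument("fqdn", nargs="?", help="hypervisor to assign")
    return parser


def plan(argv: List[str]) -> List[str]:
    """Return the lines the wrapper would act on, without executing anything."""
    args = build_parser().parse_args(argv)
    if args.action == "list":
        return [f"openstack aggregate show {args.aggregate}"]
    if args.fqdn is None:
        raise SystemExit("assign requires an FQDN")
    return [
        f"openstack aggregate add host {args.aggregate} {args.fqdn}",
        # Hypothetical log wording; the real message format is up for bikeshedding.
        f"!log admin assigned {args.fqdn} to the '{args.aggregate}' aggregate",
    ]


if __name__ == "__main__":
    # The FQDN below is made up for illustration.
    for line in plan(["assign", "maintenance", "cloudvirt1001.eqiad.wmnet"]):
        print(line)
```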
That's it! If anyone really hates this please let me know and I can roll things back. I'm already regretting the name 'standard' but at least it's not badly overloaded like my first choice, 'public', is.
Naming is hard and in a case like this pretty arbitrary anyway. It's not like you are the jerk that named everything "cloud" around here. ;)
[0] https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Host_aggregates
Bryan
On 2/28/20 4:34 AM, Bryan Davis wrote:
On Thu, Feb 27, 2020 at 5:55 PM Andrew Bogott abogott@wikimedia.org wrote:
The sudden arrival of the wdqs cloudvirts (T221631) has provided a very straightforward use case for nova host aggregates. We'd already been planning to adopt them at some point (T226731) so I've gone ahead and set some up today.
Starting sometime soon (maybe tomorrow!) a host aggregate named 'standard' will replace the existing 'scheduler pool,' and the profile::openstack::eqiad1::nova::scheduler_pool: hiera key will vanish. That knowledge will instead live inside the nova database, and can be queried in a few ways, most simply with '# openstack aggregate show standard'
I've done my best to document all this[0] but want to call out a few points:
<3 the docs. Thanks, really.
- We will no longer have git history explaining why a given cloudvirt is pooled or depooled. For that reason it is more important than ever to !log any change to aggregate membership. I propose we standardize on the !log admin SAL in -cloud rather than the production -operations SAL for this.
+1. I have been thinking about adding a combined view of a few SAL channels to my sal tool to make it easier for us to view and search for log events, and this would fit nicely with that as well.
LGTM. I already use our admin SAL for most of our stuff anyway.
- In order to reduce the chances of losing track of a hypervisor entirely, I've created some tracking aggregates. If you remove a cloudvirt from the 'standard' aggregate, please re-assign it to 'maintenance', 'spare', or 'toobusy' as appropriate.
great idea!
This sounds like something that would be worth writing a wrapper script to manage. Something like `wmcs-aggregate <assign|list> <standard|spare|maintenance|toobusy> [FQDN]` (example only, bikeshed on a phab task!) could make all this easier. We could also almost certainly figure out how to make that !log for us automagically as well.
+1 !!!
Awesome! Makes sense.
Brooke Storm SRE Wikimedia Cloud Services bstorm@wikimedia.org IRC: bstorm_
Cloud-admin mailing list Cloud-admin@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/cloud-admin