Thanks for the update, Taavi! It's great to have a summary, and these
are all great-sounding projects. I will try to catch up on the
terraform patches today or tomorrow.
Two tangential points:
1) I can't emphasize enough how much we all appreciate the work you're
doing on our infra, and how much you're benefiting our projects. We're
always chatting about how much good work you're doing and bragging to
other teams about your contributions.
2) We're fairly careful not to actually direct your work, as we don't
want to overstep the boundary between staff and volunteer -- your #1
priority should always be to work on whatever you find most fun and
interesting at the moment. That said, if you /do/ want more coordination
(e.g. invites to in-person meetings, access to quarterly planning docs,
etc) you (or anyone else following along) should just say the word and
we'll figure out more ways to loop you in.
Thanks again!
-Andrew
On 10/9/22 6:19 AM, Taavi Väänänen wrote:
Hi cloud-admin@,
The recent cloud@ thread made me realize that I should probably keep
everyone else more up to date on the infrastructure level projects I'm
working on by myself. So I've tried to summarize the major recent and
upcoming changes I'm working on below in semi-random order.
Please let me know if you find this useful or interesting (or if you
don't, it helps to know that too). Questions and comments are also
welcome.
Terraform
I sent Puppet patches[0] to enable application credential
authentication in Keystone to let arbitrary clients speak to the
OpenStack APIs. I believe Andrew is working on the firewall rules and
related HAProxy config to open up the APIs to the public as a part of
the Cumin/Spicerack work going on at the moment.
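
For anyone unfamiliar with application credentials, here is a rough
sketch of what a client sends to Keystone once this is enabled. It only
builds the JSON body for POST /v3/auth/tokens; the function name and the
example ID and secret are made up for illustration, and nothing here is
taken from the Puppet patch itself.

```python
import json


def build_app_credential_auth(credential_id: str, secret: str) -> dict:
    """Return the JSON body for a Keystone v3 token request that
    authenticates with an application credential.

    No "scope" section is included: an application credential is
    already tied to the project it was created in.
    """
    return {
        "auth": {
            "identity": {
                "methods": ["application_credential"],
                "application_credential": {
                    "id": credential_id,
                    "secret": secret,
                },
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical values; a real ID and secret come from Keystone
    # when the credential is created.
    body = build_app_credential_auth("example-credential-id", "example-secret")
    print(json.dumps(body, indent=2))
```

On success Keystone returns the token in the X-Subject-Token response
header, which the client then passes as X-Auth-Token to the other
OpenStack APIs.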
I tagged the initial version of the custom terraform-cloudvps
Terraform provider.[1] The provider is designed to supplement the
'official' OpenStack provider and currently lets you interact with the
web proxy API using the new go-cloudvps library[2], with Puppet ENC
support next up on my Terraform TODO list.
There's also a Puppet patch[3] pending to configure a self-hosted
Terraform registry on terraform.wmcloud.org. It's cherry-picked to the
project puppet master, but having it actually merged would be nice.
[0]: https://gerrit.wikimedia.org/r/c/operations/puppet/+/840121
[1]: https://gitlab.wikimedia.org/repos/cloud/cloud-vps/terraform-cloudvps
[2]: https://gitlab.wikimedia.org/repos/cloud/cloud-vps/go-cloudvps
[3]: https://gerrit.wikimedia.org/r/c/operations/puppet/+/834344
CloudVPS web proxy
Planning on doing some work to make the proxy service more reliable in
case of node failure. Also planned is moving the current SQLite
database to the cloudinfra MariaDB cluster for reliability / easier
failover purposes. There are a few Puppet patches prepping for this
pending review, starting from [4].
[4]: https://gerrit.wikimedia.org/r/c/operations/puppet/+/831041
Toolforge
Sent a few patches to the jobs-framework-* repositories. Planning to
do a bit more cleanup here, to hopefully make the grid migration easier.
I'd like to introduce a new k8s utility, kube-container-updater[5], to
automatically restart long-running containers that are running
outdated images.
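
To make the idea concrete, here is a minimal sketch of the core
decision such a tool has to make: compare the image digest each
container was started from against the digest its tag currently points
to. This is not the actual kube-container-updater code; the
RunningContainer type, the find_outdated function and the digest lookup
table are all hypothetical, and a real tool would get this data from the
Kubernetes API and the image registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningContainer:
    pod: str           # name of the pod the container runs in
    image: str         # image reference, e.g. "registry.example/base:latest"
    image_digest: str  # digest the container was actually started from


def find_outdated(containers, current_digests):
    """Return the sorted names of pods running an outdated image.

    current_digests maps an image reference to the digest that
    reference resolves to right now.
    """
    outdated = set()
    for container in containers:
        latest = current_digests.get(container.image)
        # Images we know nothing about are skipped rather than restarted.
        if latest is not None and latest != container.image_digest:
            outdated.add(container.pod)
    return sorted(outdated)
```

The pods returned here would be the candidates for a restart; how and
when to restart them is a separate policy question.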
Upgrading to Kubernetes 1.22 is only blocked on dealing with
certificate generation for the custom webhooks[6]. For this, I'd like
to get feedback on the approach (continue to manually sign
certificates or introduce cert-manager to automate that). Looking
further ahead, I think 1.23 will be fairly simple, and 1.24 will
require migrating the cluster from Docker to containerd, which I'd
like to pair with a bullseye upgrade.
Once we have an object storage service I'd like to look a bit more
into providing a logging solution that doesn't use NFS.
[5]: https://gerrit.wikimedia.org/r/c/cloud/toolforge/kube-container-updater/+/8…
[6]: https://phabricator.wikimedia.org/T286856
metricsinfra
No recent development here. I think we could roll out Prometheus
scraping to all projects and instances with the current infra, but for
that someone would need to sort out how to deal with security groups
with the pull model Prometheus uses. Some discussion about this is in
Phabricator[7].
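
As an illustration of the problem: with a pull model the Prometheus
server opens the connection, so every instance needs an ingress rule
that lets the scraper reach its exporter port. The sketch below checks
a list of rules for that. The rule fields are loosely modelled on
Neutron security group rules but simplified, and the function name,
addresses and port are assumptions for the example, not our actual
setup.

```python
import ipaddress


def allows_scrape(rules, scraper_ip, port):
    """Return True if any ingress TCP rule lets scraper_ip reach port."""
    source = ipaddress.ip_address(scraper_ip)
    for rule in rules:
        # Only inbound TCP rules matter for an incoming scrape request.
        if rule.get("direction") != "ingress" or rule.get("protocol") != "tcp":
            continue
        if not rule["port_range_min"] <= port <= rule["port_range_max"]:
            continue
        if source in ipaddress.ip_network(rule["remote_ip_prefix"]):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical project that only allows SSH from anywhere.
    ssh_only = [{
        "direction": "ingress", "protocol": "tcp",
        "port_range_min": 22, "port_range_max": 22,
        "remote_ip_prefix": "0.0.0.0/0",
    }]
    print(allows_scrape(ssh_only, "172.16.0.10", 9100))
```

A project whose groups only allow SSH would fail this check, which is
why scraping cannot simply be switched on for every project.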
The second item on the metricsinfra roadmap is building an API to let
projects manage their scraping rules and alerts. I'd like to
integrate that with Terraform at some point.
[7]: https://phabricator.wikimedia.org/T288108
Puppet ENC service
Planning to do some work[8] on the ENC API service, mostly to make it
work with Terraform. Most notably the Git integration will be moved
from the Horizon dashboard to the API service itself.
[8]: https://phabricator.wikimedia.org/T317478
ToolsDB
No recent developments here either. :(
_______________________________________________
Cloud-admin mailing list -- cloud-admin(a)lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/cloud-admin.lists.wikimedia.org/