Context:
"Galera on cloudcontrol1004 going out of sync" https://phabricator.wikimedia.org/T302146
Galera (the database backend for OpenStack) has been very unstable ever since I upgraded the cluster to Bullseye. This is probably an issue with a buggy version of mariadb/galera.
I'm trying an experiment: mariadb is currently stopped on cloudcontrol1004, and puppet is disabled there so it won't get restarted. I want to see if that change (running a two-node cluster and/or removing the suspected-cursed cloudcontrol1004 from the cluster) causes things to stop breaking.
I've done my best to downtime the alerts that will result from this, but if one leaks through please don't respond by enabling puppet on 1004 -- we want to leave that db node switched off for now.
Other services on cloudcontrol1004 should continue to run normally.
Thanks!
-Andrew
cloud-admin@lists.wikimedia.org