Context:
"Galera on cloudcontrol1004 going out of sync"
https://phabricator.wikimedia.org/T302146
Galera (the database backend for OpenStack) has been very unstable ever
since I upgraded the cluster to Bullseye. This is probably an issue with
a buggy version of mariadb/galera.
I'm trying an experiment: mariadb is currently stopped on
cloudcontrol1004, and puppet is disabled there so that mariadb won't get
restarted. I want
to see if that change (a two-node cluster and/or removing the
suspected-cursed cloudcontrol1004 from the cluster) causes things to
stop breaking.
I've done my best to downtime alerts that will result from this, but if
one leaks through please don't respond by enabling puppet on 1004 -- we
want to leave that db node switched off for now.
Other services on cloudcontrol1004 should continue to run normally.
Thanks!
-Andrew