[Labs-admin] All that shinken noise!
Andrew Bogott
abogott at wikimedia.org
Wed Jun 28 03:46:29 UTC 2017
tl;dr: Tools has a new puppetmaster, tools-puppetmaster-01. I haven't
migrated clush over to that box but plan to tomorrow.
-
In the beginning there was
https://gerrit.wikimedia.org/r/#/c/361675/
Which was a no-op on the production puppetmasters but not a no-op on
labs puppetmasters due to my over-pruning. So, I swiftly reverted with
https://gerrit.wikimedia.org/r/#/c/361710/
Which would've fixed things. Except, by that time, the apache config
for all self-hosted puppetmasters was broken. And, most of those
puppetmasters are their /own/ puppetmasters which meant they couldn't
fix themselves... So, I went through the list of involved hosts (via
watroles) and pasted in the missing bits in order to kickstart things
and everything should be fine now.
Except... when I fixed the tools puppetmaster, it started saying
"Warning: Could not intern from text/plain: nested asn1 error"
I do not know what that is, and Google doesn't know what it is either.
Google did provide a slight hint, though: someone at Puppet Labs
responded to a bug report about that message with no explanation, just a
terse 'this is fixed in the next version'.
So... since tools-puppetmaster-02 was running a 3.7-series package and
we're using 3.8 packages elsewhere, I did an 'apt-get install
puppetmaster' to move to 3.8. Except, there was some latent
unpuppetized pinning in the apt config for that box (a different, also
long story) which meant that instead of upgrading to 3.8 it upgraded to
4.something, and after that I was well and truly doomed. No amount of
cert regenerating, rebooting, or de- and re-installing puppet packages
would make that 'could not intern' error go away or get anything to run
cleanly.
So, I built a new puppetmaster for tools, tools-puppetmaster-01, copied
the custom private patches on -02 over to -01, clobbered all the certs
on all the existing tools instances and switched everything over to the
new puppetmaster. The dance of cleaning the certs and then getting the
initial puppet runs going before an old cron'd puppet run could start up
and create a new but broken cert turned out to be intricate and
frustrating. It took hours and produced the many, many shinken emails
that you all received.
This was good practice for me since I'm going to have to migrate the
rest of labs to a new puppetmaster soon anyway... but I apologize for
all the racket.
-Andrew