[Labs-admin] All that shinken noise!

Andrew Bogott abogott at wikimedia.org
Wed Jun 28 03:46:29 UTC 2017


tl;dr:  Tools has a new puppetmaster, tools-puppetmaster-01.  I haven't 
migrated clush over to that box but plan to tomorrow.


-


In the beginning there was

https://gerrit.wikimedia.org/r/#/c/361675/

Which was a no-op on the production puppetmasters but not on the labs
puppetmasters, due to my over-pruning.  So, I swiftly reverted with

https://gerrit.wikimedia.org/r/#/c/361710/

Which would've fixed things, except that by then the Apache config
for all self-hosted puppetmasters was already broken.  And most of those
puppetmasters are their /own/ puppetmasters, which meant they couldn't
pull the fix themselves.  So, I went through the list of affected hosts
(via watroles) and pasted in the missing bits by hand to kickstart
things.  Everything should be fine now.

Except... when I fixed the tools puppetmaster it started saying

     "Warning: Could not intern from text/plain: nested asn1 error"

I do not know what that is, and Google doesn't know either.  Google
did provide a slight hint, though: someone at Puppet Labs responded to
a bug report about that message with no explanation beyond a terse
'this is fixed in the next version'.

So... since labs-puppetmaster-02 was running a 3.7-series package and 
we're using 3.8 packages elsewhere, I did an 'apt-get install 
puppetmaster' to move to 3.8.  Except, there was some latent 
unpuppetized pinning in the apt config for that box (a different, also 
long story) which meant that instead of upgrading to 3.8 it upgraded to 
4.something and after that I was well and truly doomed.  No amount of 
cert regenerating or rebooting or de- and re-installing puppet packages 
would make that 'could not intern' error go away or anything run cleanly.
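For anyone who wants to guard against this kind of surprise upgrade, here
is a minimal sketch of a pre-flight check.  It is not something I ran at
the time.  It assumes the usual 'apt-cache policy <package>' output with
a "Candidate:" line; the function names are made up for illustration.

```python
import re
import subprocess


def candidate_version(policy_output):
    """Return the Candidate version from `apt-cache policy` output, or None."""
    match = re.search(r"^\s*Candidate:\s*(\S+)", policy_output, re.MULTILINE)
    if match is None or match.group(1) == "(none)":
        return None
    return match.group(1)


def in_series(version, series):
    """True if a Debian version string (optionally with an epoch) is in e.g. '3.8'."""
    upstream = version.split(":", 1)[-1]  # drop an epoch such as "1:"
    return upstream == series or upstream.startswith(series + ".")


def safe_to_upgrade(package, series):
    """Ask apt which version it would install and compare it to the wanted series."""
    out = subprocess.run(
        ["apt-cache", "policy", package],
        check=True, capture_output=True, text=True,
    ).stdout
    version = candidate_version(out)
    return version is not None and in_series(version, series)
```

Calling safe_to_upgrade("puppetmaster", "3.8") before the install would
return False if stray pinning had made a 4.x package the candidate.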

So, I built a new puppetmaster for tools, tools-puppetmaster-01, copied
the custom private patches from -02 over to -01, clobbered all the certs
on all the existing tools instances, and switched everything over to the
new puppetmaster.  The dance of cleaning the certs and then getting an
initial puppet run going before an old cron'd puppet run could start up
and create a new but broken cert turned out to be intricate and
frustrating.  It took hours and resulted in the many, many shinken
emails that you all received.
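One way to avoid the race with the cron'd run, sketched below, is to
disable the agent before touching the certs.  This is an illustration
under assumptions, not a record of what I actually typed: the ssldir
path is the Debian default for puppet 3.x and may differ on a given
host, the function name is hypothetical, and it assumes the cron job
honours the agent's disable lock.

```python
import shlex


def repoint_commands(new_master, ssldir="/var/lib/puppet/ssl"):
    """Return the client-side commands, in order, as argv lists."""
    return [
        # 1. Take the disable lock so a cron'd run cannot start and
        #    request a cert in the middle of the cleanup.
        ["puppet", "agent", "--disable", "moving to " + new_master],
        # 2. Throw away the old keys and certs (path is an assumption).
        ["rm", "-rf", ssldir],
        # 3. Release the lock again...
        ["puppet", "agent", "--enable"],
        # 4. ...and immediately do the first run against the new master.
        ["puppet", "agent", "--test", "--server", new_master],
    ]


if __name__ == "__main__":
    # Print the commands rather than running them.
    for argv in repoint_commands("tools-puppetmaster-01"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Note that --server only applies to that one run; the server setting in
puppet.conf still has to point at the new puppetmaster for later runs.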

This was good practice for me since I'm going to have to migrate the 
rest of labs to a new puppetmaster soon anyway... but I apologize for 
all the racket.

-Andrew




