[Labs-admin] All that shinken noise!

Andrew Bogott abogott at wikimedia.org
Wed Jun 28 15:01:47 UTC 2017


On 6/27/17 10:46 PM, Andrew Bogott wrote:
> tl;dr:  Tools has a new puppetmaster, tools-puppetmaster-01.  I 
> haven't migrated clush over to that box but plan to tomorrow.
Update:  We can't really build new clush masters right now due to the 
dead Mirantis OpenStack repo.  For now, tools-puppetmaster-02 will 
remain as the clush master.

I've created a task about this package repo issue... we could 
potentially work around things by backporting to Python 2 or (possibly) 
rebuilding on Stretch.

https://phabricator.wikimedia.org/T169099



>
>
> -
>
>
> In the beginning there was
>
> https://gerrit.wikimedia.org/r/#/c/361675/
>
> Which was a no-op on the production puppetmasters but not a no-op on 
> labs puppetmasters due to my over-pruning.  So, I swiftly reverted with
>
> https://gerrit.wikimedia.org/r/#/c/361710/
>
> Which would've fixed things.  Except, by that time, the apache config 
> for all self-hosted puppetmasters was broken.  And, most of those 
> puppetmasters are their /own/ puppetmasters which meant they couldn't 
> fix themselves...  So, I went through the list of involved hosts (via 
> watroles) and pasted in the missing bits in order to kickstart things 
> and everything should be fine now.
>
> Except... when I fixed the tools puppetmaster it started saying
>
>     "Warning: Could not intern from text/plain: nested asn1 error"
>
> I do not know what that is, and Google doesn't know what it is 
> either.  Google provided a slight hint, though, as someone at Puppet 
> Labs responded to a bug report about that message with no explanation 
> but a terse 'this is fixed in the next version'.
>
> So... since tools-puppetmaster-02 was running a 3.7-series package and 
> we're using 3.8 packages elsewhere, I did an 'apt-get install 
> puppetmaster' to move to 3.8.  Except, there was some latent 
> unpuppetized pinning in the apt config for that box (a different, also 
> long story) which meant that instead of upgrading to 3.8 it upgraded 
> to 4.something and after that I was well and truly doomed.  No amount 
> of cert regenerating or rebooting or de- and re-installing puppet 
> packages would make that 'could not intern' error go away or anything 
> run cleanly.
>
> So, I built a new puppetmaster for tools, tools-puppetmaster-01, 
> copied the custom private patches on -02 over to -01, clobbered all 
> the certs on all the existing tools instances and switched everything 
> over to the new puppetmaster.  The dance of getting certs cleaned and 
> then initial puppet runs going before an old cron'd puppet run starts 
> up and creates a new but broken cert turned out to be intricate and 
> frustrating and took hours and resulted in the many, many shinken 
> emails that you all received.
>
> This was good practice for me since I'm going to have to migrate the 
> rest of labs to a new puppetmaster soon anyway... but I apologize for 
> all the racket.
>
> -Andrew
>
>



