[Labs-admin] All that shinken noise!
Andrew Bogott
abogott at wikimedia.org
Wed Jun 28 15:01:47 UTC 2017
On 6/27/17 10:46 PM, Andrew Bogott wrote:
> tl;dr: Tools has a new puppetmaster, tools-puppetmaster-01. I
> haven't migrated clush over to that box but plan to tomorrow.
Update: We can't really build new clush masters right now due to the
dead Mirantis openstack repo. For now, tools-puppetmaster-02 will
remain as the clush master.
I've created a task about this package repo issue... we could
potentially work around things by backporting to Python 2 or (possibly)
rebuilding on Stretch.
https://phabricator.wikimedia.org/T169099
>
>
> -
>
>
> In the beginning there was
>
> https://gerrit.wikimedia.org/r/#/c/361675/
>
> Which was a no-op on the production puppetmasters but not a no-op on
> labs puppetmasters due to my over-pruning. So, I swiftly reverted with
>
> https://gerrit.wikimedia.org/r/#/c/361710/
>
> Which would've fixed things. Except, by that time, the apache config
> for all self-hosted puppetmasters was broken. And, most of those
> puppetmasters are their /own/ puppetmasters which meant they couldn't
> fix themselves... So, I went through the list of involved hosts (via
> watroles) and pasted in the missing bits in order to kickstart things
> and everything should be fine now.
>
> Except... when I fixed the tools puppetmaster it started saying
>
> "Warning: Could not intern from text/plain: nested asn1 error"
>
> I do not know what that is, and Google doesn't know what it is
> either. Google provided a slight hint, though, as someone at Puppet
> Labs responded to a bug report about that message with no explanation
> but a terse 'this is fixed in the next version'.
>
> So... since tools-puppetmaster-02 was running a 3.7-series package and
> we're using 3.8 packages elsewhere, I did an 'apt-get install
> puppetmaster' to move to 3.8. Except, there was some latent
> unpuppetized pinning in the apt config for that box (a different, also
> long story) which meant that instead of upgrading to 3.8 it upgraded
> to 4.something and after that I was well and truly doomed. No amount
> of cert regenerating or rebooting or de- and re-installing puppet
> packages would make that 'could not intern' error go away or anything
> run cleanly.
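
For anyone wanting to guard against the same pinning surprise: a minimal sketch, assuming the usual `apt-cache policy <package>` output layout, of checking which version apt would actually pick before running the install. The function names and the sample version strings are hypothetical, made up for illustration; they are not taken from the box in question.

```python
import re


def candidate_version(policy_output):
    """Return the 'Candidate:' version from `apt-cache policy <pkg>` output, or None."""
    match = re.search(r"^\s*Candidate:\s*(\S+)", policy_output, re.MULTILINE)
    if match is None or match.group(1) == "(none)":
        return None
    return match.group(1)


def safe_to_install(policy_output, wanted_series):
    """True only if the candidate version belongs to wanted_series (e.g. '3.8')."""
    version = candidate_version(policy_output)
    if version is None:
        return False
    # Drop a Debian epoch prefix such as "1:" before comparing.
    upstream = version.split(":", 1)[-1]
    return upstream.startswith(wanted_series + ".")


# Hypothetical output resembling a box where stray pinning prefers a 4.x package.
SAMPLE = """\
puppetmaster:
  Installed: 3.7.2-4
  Candidate: 4.8.2-5
  Version table:
"""

if __name__ == "__main__":
    if not safe_to_install(SAMPLE, "3.8"):
        print("refusing to install: candidate is", candidate_version(SAMPLE))
```

On a real host the text would come from running `apt-cache policy puppetmaster` rather than from a sample string.
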
>
> So, I built a new puppetmaster for tools, tools-puppetmaster-01,
> copied the custom private patches on -02 over to -01, clobbered all
> the certs on all the existing tools instances and switched everything
> over to the new puppetmaster. The dance of getting certs cleaned and
> then initial puppet runs going before an old cron'd puppet run starts
> up and creates a new but broken cert turned out to be intricate and
> frustrating and took hours and resulted in the many, many shinken
> emails that you all received.
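
The ordering problem in that cert dance can be sketched roughly as follows. This is only an illustration under assumptions, not the exact procedure used here: it assumes Puppet 3.x command names (`puppet agent --disable`, `puppet cert clean`, `puppet agent --test`), the Debian default ssl directory `/var/lib/puppet/ssl`, and that the cron'd run honors the agent's disable lock. The function name and hostname are hypothetical, and certificate signing on the new master is left out.

```python
def recert_plan(agent_fqdn, ssl_dir="/var/lib/puppet/ssl"):
    """Return ordered (where, argv) steps for re-keying one agent.

    Nothing is executed here; the list only captures the order that keeps
    a cron'd puppet run from generating a fresh cert halfway through.
    """
    return [
        # Block scheduled runs first, so nothing recreates a cert mid-way.
        ("agent", ["puppet", "agent", "--disable", "re-keying against new puppetmaster"]),
        # Throw away the agent's old keys and certs (path is an assumed default).
        ("agent", ["rm", "-rf", ssl_dir]),
        # Only needed if the master already holds a cert for this host.
        ("master", ["puppet", "cert", "clean", agent_fqdn]),
        # Re-enable and immediately do the initial run by hand.
        ("agent", ["puppet", "agent", "--enable"]),
        ("agent", ["puppet", "agent", "--test"]),
    ]


if __name__ == "__main__":
    for where, argv in recert_plan("tools-example-01.example.org"):
        print("%-6s %s" % (where, " ".join(argv)))
```

The point of the sketch is just the sequencing: the disable step comes before any cert is touched, and the manual run follows the enable step directly.
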
>
> This was good practice for me since I'm going to have to migrate the
> rest of labs to a new puppetmaster soon anyway... but I apologize for
> all the racket.
>
> -Andrew
>
>