Follow-up on this:
The WMCS team is fairly confident that all user-facing services have been restored. If you encounter any unexpected breakage now, please email me directly or use !help on IRC.
There's still a fair bit of less-urgent cleanup left to do. Puppet will remain disabled on most VMs until that's finished, which may take a day or two.
-Andrew + the WMCS team.
On 6/4/20 10:18 AM, Bryan Davis wrote:
At 2020-06-04T11:12 UTC a change was merged to the operations/puppet.git repository which resulted in data loss for Cloud VPS projects using a local Puppetmaster (role::puppetmaster::standalone). Specifically, any commits local to the Puppetmaster instance that were overlaid on the upstream labs/private.git repository were removed. These patches would have contained passwords, ssh keys, TLS certificates, and similar authentication information for Puppet-managed configuration.
The majority of Cloud VPS projects are not affected by this configuration data loss. Several highly used and visible projects, including Toolforge (tools) and Beta Cluster (deployment-prep), are affected to some degree. We have disabled Puppet across all Cloud VPS instances that were reachable by our central command and control service (cumin) and are currently evaluating the impact and recovering data from /var/logs/puppet.log change logs where available.
More information will be collected at https://phabricator.wikimedia.org/T254491 and an incident report will also be prepared once the initial response is complete.
Bryan