We need to perform maintenance on our primary NFS cluster today. This
should not affect users, but if something does not go as planned there
may be some impact. We will keep the list posted with status updates as
much as possible. Apologies for the short notice; this is set to begin in 3 hours.
- chasemp on phabricator <https://phabricator.wikimedia.org/p/chasemp/>
Much of the Cloud Services staff will be traveling and attending
meetings next week. There will always be someone available for
emergencies, but routine support requests may be handled more slowly.
Things will be back to normal the following Monday, the 25th.
- Andrew + the Cloud Services team
Because NFS storage for Tools is getting very tight, I am going to clean up (truncate) log, err, and out files that are larger than 100M.
I will start this cleanup process on Monday (6/11/2018).
If you have concerns about this process, please let us know in #wikimedia-cloud.
Thank you for your understanding,
Wikimedia Cloud Services
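The announced cleanup can be sketched with standard GNU tools. This is a hedged illustration, not the actual script the team ran: it works in a scratch directory with a 1M threshold (instead of 100M, so it runs quickly) and assumes GNU find, truncate, and stat.

```shell
# Sketch of the cleanup: truncate log/err/out files above a size threshold.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/bot.err" bs=1M count=2 status=none  # oversized file
touch "$demo/notes.txt"                                       # file left alone
# Select only *.log / *.err / *.out files above the threshold and empty them
# in place; truncating (rather than deleting) keeps open file handles valid.
find "$demo" -type f \( -name '*.log' -o -name '*.err' -o -name '*.out' \) \
     -size +1M -exec truncate --size 0 {} \;
stat -c %s "$demo/bot.err"   # prints 0
```

Truncating in place is the safer choice here because running tools typically keep their log files open; deleting the file would leave the tool writing to an unlinked inode, while truncation frees the space immediately.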
As part of routine security maintenance, we'll be rebooting all VMs and
virtualization hosts next Wednesday starting at 14:00 UTC (7AM San
Francisco time).
Toolforge users should be largely unaffected by this activity. Other
projects (including deployment-prep) will experience sporadic downtime
of a few minutes per interruption.
The entire process will take several hours. If you need a to-the-minute
advance schedule for any particular reboot, please let me know and I'll
put your system at the start of the schedule.
-Andrew + the cloud team
We deleted the prometheus user from LDAP and created it locally.
This may cause puppet failures, since there is a window during which the
uid/gid owning /var/lib/prometheus is still the old LDAP one.
We are running a massive, CloudVPS-wide deluser/adduser/chown operation
to fix this.
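The per-file part of that fix, re-owning files that still carry a stale numeric uid, can be demonstrated unprivileged. This is only a sketch of the technique: the real operation ran deluser/adduser as root on every CloudVPS host, and the old LDAP uid here is stood in for by the current user's uid so the example runs without privileges.

```shell
# Demonstrate the chown-by-numeric-uid step in a scratch directory.
# In the real fix, old_uid would be the deleted LDAP user's uid and the
# chown target would be the newly created local prometheus user.
demo=$(mktemp -d)
touch "$demo/prometheus.db"
old_uid=$(id -u)                       # stand-in for the stale LDAP uid
stale=$(find "$demo" -uid "$old_uid" | wc -l)
echo "files to re-own: $stale"
# Re-own every file still carrying the old uid (a no-op in this demo,
# since we chown to the same user we already are):
find "$demo" -uid "$old_uid" -exec chown "$(id -un)" {} +
```

Matching on the numeric uid (rather than a username) is the key detail: once the LDAP user is gone, the files' ownership is just a number with no passwd entry behind it, and find's -uid test is how you locate them.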