We need to perform maintenance on our primary NFS cluster today. This
should not affect users, but if something does not go as planned there
may be some impact. We will keep the list posted with status updates as
much as possible. Apologies for the short notice; this is set to begin in 3 hours.
- chasemp on phabricator <https://phabricator.wikimedia.org/p/chasemp/>
Much of the Cloud Services staff will be traveling and attending
meetings next week. There will always be someone available for
emergencies, but routine support requests may be handled more slowly.
Things will be back to normal the following Monday, the 25th.
- Andrew + the Cloud Services team
Because NFS storage for Tools is getting very tight, I am going to clean up (truncate) log, err, and out files that are larger than 100M.
I will start this cleanup process on Monday (6/11/2018).
If you have concerns about this process, please let us know in #wikimedia-cloud.
Thank you for your understanding,
Wikimedia Cloud Services
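The announced cleanup can be sketched with standard GNU tools. This is a hedged illustration, not the actual script the team ran: it works in a scratch directory with a 1M threshold (instead of 100M, so it runs quickly) and assumes GNU find, truncate, and stat.

```shell
# Sketch of the cleanup: truncate log/err/out files above a size threshold.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/bot.err" bs=1M count=2 status=none  # oversized file
touch "$demo/notes.txt"                                       # file left alone
# Select only *.log / *.err / *.out files above the threshold and empty them
# in place; truncating (rather than deleting) keeps open file handles valid.
find "$demo" -type f \( -name '*.log' -o -name '*.err' -o -name '*.out' \) \
     -size +1M -exec truncate --size 0 {} \;
stat -c %s "$demo/bot.err"   # prints 0
```

Truncating in place is the safer choice here because running tools typically keep their log files open; deleting the file would leave the tool writing to an unlinked inode, while truncation frees the space immediately.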
As part of routine security maintenance, we'll be rebooting all VMs and
virtualization hosts next Wednesday starting at 14:00 UTC (7AM San
Francisco time).
Toolforge users should be largely unaffected by this activity. Other
projects (including deployment-prep) will experience sporadic downtime
of a few minutes per interruption.
The entire process will take several hours. If you need a to-the-minute
advance schedule for any particular reboot, please let me know and I'll
put your system at the start of the schedule.
-Andrew + the cloud team
We deleted the prometheus user from LDAP and created it locally.
This may cause puppet failures, since there is a window during which the
uid/gid owning /var/lib/prometheus is still the old LDAP one.
We are running a massive, CloudVPS-wide deluser/adduser/chown operation
to fix this.
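The per-file part of that fix, re-owning files that still carry a stale numeric uid, can be demonstrated unprivileged. This is only a sketch of the technique: the real operation ran deluser/adduser as root on every CloudVPS host, and the old LDAP uid here is stood in for by the current user's uid so the example runs without privileges.

```shell
# Demonstrate the chown-by-numeric-uid step in a scratch directory.
# In the real fix, old_uid would be the deleted LDAP user's uid and the
# chown target would be the newly created local prometheus user.
demo=$(mktemp -d)
touch "$demo/prometheus.db"
old_uid=$(id -u)                       # stand-in for the stale LDAP uid
stale=$(find "$demo" -uid "$old_uid" | wc -l)
echo "files to re-own: $stale"
# Re-own every file still carrying the old uid (a no-op in this demo,
# since we chown to the same user we already are):
find "$demo" -uid "$old_uid" -exec chown "$(id -un)" {} +
```

Matching on the numeric uid (rather than a username) is the key detail: once the LDAP user is gone, the files' ownership is just a number with no passwd entry behind it, and find's -uid test is how you locate them.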