Hi,
Here is a brief update on the status of Toolforge and CloudVPS as of today, 2019-02-16, along with some rough estimates and what to expect in the following days. Keeping track of all the events we had this week may be difficult, because there were several of them and they were heavily intermixed.
* CloudVPS suffered severe hardware issues this week [0]. We solved most of the problems and added spare hardware [1] because our server capacity was significantly reduced. This service should be mostly stable right now.
* Toolsdb (tools.db.svc.eqiad.wmflabs) is currently overloaded and suffering from hardware errors. We are already working on a replacement for this service [2]. Services that depend on this database, such as PAWS, aren't working properly, and Toolforge tools that use it are also affected.
An honest estimate is that services (especially Toolsdb) won't be fully recovered until at least next Tuesday (2019-02-26).
Our current plans involve replacing the Toolsdb hardware with virtual machines inside CloudVPS [3]. We are trying to be extra cautious to prevent data loss and other problems usually associated with doing things in a rush.
Finally, I would like to mention that we are all well aware of the importance of these services for the community and we are doing our best to get things fixed. Thanks for your understanding and patience.
Regards,
[0] https://wikitech.wikimedia.org/wiki/Incident_documentation/20190213-cloudvps
[1] CloudVPS: drain and rebuild labvirt1009 as cloudvirt1009 https://phabricator.wikimedia.org/T216239
[2] ToolsDB overload and cleanup https://phabricator.wikimedia.org/T216208
[3] Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020 https://phabricator.wikimedia.org/T193264
cloud-admin@lists.wikimedia.org