Hi folks,
Right now our CI infrastructure (Zuul/Jenkins/Nodepool) is having a bad day and isn't able to spawn new instances to run tests. The outage is ongoing and there's no ETA for restoring service just yet.
In the meantime, please avoid force-merging (setting Verified+2 yourself) and skipping Jenkins unless you're dealing with an urgent production issue that must land today. Doing so makes Zuul extra noisy, which makes further diagnosis difficult.
Thanks for your patience!
-Chad & rest of RelEng
On 05/07/16 23:28, Chad Horohoe wrote:
[...]
Hello,
The issue is resolved now and the backlog has been processed.
It started around 19:40 UTC, when labs lost the ability to create instances. Instance creation fully recovered at 21:40 UTC, and the backlog was completely processed by 22:30 UTC.
On 7/5/16 5:47 PM, Antoine Musso wrote:
[...]
The incident report for this outage is here: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160706-CI-Outag.... It was complicated!
-Andrew
wikitech-l@lists.wikimedia.org