Hello,
Jenkins crashed again today, twice: first at 6am UTC, which I fixed, and then again between 9pm and 10pm UTC.
This has been a recurring event since we upgraded our installation. The bug is tracked at: https://bugzilla.wikimedia.org/show_bug.cgi?id=48025
Tonight I enabled the Jenkins access log and made Zuul query Jenkins directly instead of going through SSL and an Apache frontend proxy. That should help a little.
The root cause is some weird issue in Jenkins where one of its threads uses 100% CPU. I have yet to determine what that thread is doing or what exactly triggers the issue. Once I have some useful information I will file a bug upstream and make sure it gets attention.
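For reference, one common way to find out what a busy JVM thread is doing is to take the thread ID that `top -H -p <jenkins pid>` shows as using the CPU, convert it to hex, and look for the matching `nid=0x...` entry in a `jstack` thread dump. Below is a minimal sketch in Python, assuming the dump has been saved as text. The function name is made up for illustration and nothing here is taken from our actual Jenkins setup.

```python
import re
from typing import Optional


def find_thread_block(jstack_dump: str, tid: int) -> Optional[str]:
    """Return the jstack block for the thread whose native ID equals tid.

    tid is the decimal thread ID as shown by `top -H`; jstack prints the
    same ID in lowercase hex as `nid=0x...` on each thread's header line.
    """
    wanted = f"nid={hex(tid)}"
    # Thread blocks in a jstack dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", jstack_dump):
        lines = block.strip().splitlines()
        if not lines:
            continue
        # Compare whole tokens so that 0x1a2 does not match 0x1a2b.
        if wanted in lines[0].split():
            return block.strip()
    return None
```

Calling `find_thread_block(dump_text, 6699)` would look for the header containing `nid=0x1a2b` and return that thread's stack trace, or `None` if no thread in the dump has that ID.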
Next secret plan: get rid of Jenkins...