Hello,
Gerrit changes are processed by a daemon known as Zuul, which triggers
Jenkins jobs and reports back to Gerrit as the infamous jenkins-bot.
Ever since I upgraded Zuul in May 2013, jobs have not been processed as
fast as we would have liked. There were issues on the Jenkins side and a
faulty configuration on the server, but we eventually reached an acceptable state.
Still, during busy hours (Europe evening, SF morning), with the l10n bot
submitting around five hundred changes, we would have to wait up to half
an hour to get a test report. I spent most of the night debugging and
reproducing that issue on my computer, then went to bed.
A few minutes ago, I deployed a change that makes Zuul process
changes much faster. The change is ridiculously small since it just
comments out two lines:
https://gerrit.wikimedia.org/r/102465
It basically prevents Zuul from updating build description pages in
Jenkins until all builds have completed. Since updating a build
description takes roughly 500ms, that saves a ton of waiting time.
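To give a feel for the numbers, here is a back-of-the-envelope sketch in
Python. Only the 500ms per update and the roughly five hundred l10n changes
come from the figures above. The number of builds per change, the number of
description rewrites per build, and the idea that updates run one after
another are assumptions made purely for illustration, and the function
names are hypothetical rather than anything in Zuul.

```python
# Rough model of time spent updating Jenkins build descriptions.
# ASSUMPTIONS (not from Zuul itself): updates happen serially, and the
# builds-per-change / updates-per-build counts below are invented.

UPDATE_SECONDS = 0.5  # from the message: roughly 500 ms per description update


def eager_update_seconds(changes, builds_per_change, updates_per_build):
    """Cost when descriptions are rewritten while builds are still running."""
    return changes * builds_per_change * updates_per_build * UPDATE_SECONDS


def deferred_update_seconds(changes, builds_per_change):
    """Cost when each build description is written once, after all builds end."""
    return changes * builds_per_change * UPDATE_SECONDS


if __name__ == "__main__":
    # 500 changes is the l10n bot figure; 3 builds and 3 rewrites are made up.
    eager = eager_update_seconds(changes=500, builds_per_change=3, updates_per_build=3)
    deferred = deferred_update_seconds(changes=500, builds_per_change=3)
    print(f"eager:    {eager / 60:.1f} min")
    print(f"deferred: {deferred / 60:.1f} min")
```

With those made-up inputs the script prints 37.5 minutes for the eager case
and 12.5 minutes for the deferred one. The real figures depend on how many
builds each change triggers, so treat this as an illustration of why the
delay grew with the queue, not as a measurement.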
Thanks a ton to at least Ori, Timo, Qchris, RobH, manybubles and
basically everyone using the CI platform in one way or another.
I have monitored the upgrade in production for the last few minutes and
it works as expected. The rollback plan is:
Revert https://gerrit.wikimedia.org/r/102465
Merge the revert
On gallium:
 - cd /usr/local/src/zuul
 - git pull
 - as root:
   * http_proxy=. python setup.py install
   * /etc/init.d/zuul restart
I don't think it will be needed, though. I will monitor again tonight
during the real peak hours.
For reference, the bug is https://bugzilla.wikimedia.org/48025, which I
believe is now properly fixed.
Merry Christmas.
--
Antoine "hashar" Musso