Hello,
Gerrit changes are processed by a daemon known as Zuul, which triggers Jenkins jobs and reports back to Gerrit as the infamous jenkins-bot.
Since I upgraded Zuul in May 2013, jobs have not been processed as fast as we would have wanted. There were issues on the Jenkins side and a faulty configuration on the server, but we eventually reached an acceptable state.
Still, during busy hours (Europe evening, SF morning), with the l10n bot submitting around five hundred changes, we would have to wait up to half an hour to get a test report. I spent most of the night debugging and reproducing that issue on my computer, then went to bed.
A few minutes ago, I deployed a change that makes Zuul process changes much faster. The change is ridiculously small since it just comments out two lines:
https://gerrit.wikimedia.org/r/102465
It basically prevents Zuul from updating build description pages in Jenkins until all builds have completed. Since updating a build description takes roughly 500 ms, that saves a lot of waiting time.
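To give a feel for the numbers, here is a rough back-of-the-envelope sketch. It is not Zuul's actual code: the function name, the three builds per change, and the idea that every build completion refreshes the descriptions of all builds of that change are assumptions made for illustration. Only the roughly 500 ms per update and the roughly five hundred changes come from this email.

```python
# Hypothetical cost model, NOT taken from Zuul's source.
UPDATE_SECONDS = 0.5  # roughly 500 ms per build description update (from this email)


def description_update_seconds(changes, updates_per_change):
    """Total time spent updating Jenkins build descriptions, in seconds."""
    return changes * updates_per_change * UPDATE_SECONDS


CHANGES = 500  # roughly what the l10n bot submits in one go
BUILDS_PER_CHANGE = 3  # assumed value, for illustration only

# Assumption: before the fix, each build completion refreshed the
# description of every build of the change, so builds * builds updates.
eager = description_update_seconds(CHANGES, BUILDS_PER_CHANGE * BUILDS_PER_CHANGE)

# After the fix: one update per build, once all builds have completed.
deferred = description_update_seconds(CHANGES, BUILDS_PER_CHANGE)

print("eager: %.0f s, deferred: %.0f s" % (eager, deferred))
```

The exact ratio depends on how many builds each change triggers, which is why the builds-per-change value above should be read as a placeholder rather than a measurement.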
Thanks a ton to at least Ori, Timo, Qchris, RobH, manybubles and basically everyone using the CI platform in one way or another.
I have monitored the upgrade in production for the last few minutes and it works as expected. The rollback plan is:
Revert https://gerrit.wikimedia.org/r/102465
Merge revert
on gallium:
- cd /usr/local/src/zuul
- git pull
- as root:
  * http_proxy=. python setup.py install
  * /etc/init.d/zuul restart
I don't think it will be needed, though. I will monitor again tonight during the real peak hour.
For reference, the bug is https://bugzilla.wikimedia.org/48025, which I believe is now properly fixed.
Merry Christmas.
Thank you very much, Antoine.
On 18 Dec 2013 at 19:43, "Antoine Musso" hashar+wmf@free.fr wrote:
-- Antoine "hashar" Musso