On 10/05/13 00:00, Antoine Musso wrote:
Jenkins crashed again today. The first time at 6am UTC, I got it fixed.
And again between 9pm and 10pm UTC.
This has been a recurring event since we upgraded our installation
and the bug is:
Tonight I enabled the Jenkins access log and made Zuul query Jenkins
directly instead of going through SSL + an Apache frontend proxy. That
will help a little bit.
The root cause is some weird issue in Jenkins where one of its threads
uses 100% CPU. I have yet to determine what that thread is doing
and what exactly triggers the issue. Whenever I get some useful
information I will file a bug upstream and make sure it gets attention.
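
One common way to identify such a thread, sketched here as an assumption rather than what was actually done: `top -H` lists per-thread CPU usage with decimal thread IDs, while a `jstack` thread dump prints each thread's native ID in hex as `nid=0x...`. The helper below (the name `find_thread_stanza` and the sample dump used to exercise it are hypothetical) converts the decimal ID and returns the matching stack trace from a dump saved as text.

```python
import re


def find_thread_stanza(dump_text, os_thread_id):
    """Return the thread-dump stanza whose native id matches a decimal OS thread id.

    Assumes a jstack-style dump: stanzas separated by blank lines, with a
    header line containing a token such as nid=0x1a2b.
    """
    nid = "nid=" + hex(os_thread_id)  # e.g. 6699 -> "nid=0x1a2b"
    for stanza in re.split(r"\n\s*\n", dump_text):
        stanza = stanza.strip()
        if not stanza:
            continue
        header = stanza.splitlines()[0]
        # Compare whole tokens so nid=0x1a2b does not match nid=0x1a2bc.
        if nid in header.split():
            return stanza
    return None
```

Calling it with the busiest thread ID reported by `top -H` shows what that thread was executing when the dump was taken, assuming the dump and the `top` output come from the same JVM process.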
So I went to bed, and in the morning Jenkins was unsurprisingly stuck
again. Enjoying coffee and croissant, my morning newspapers have been
replaced by obscure web browser windows titled:
"how to read a java heap dump"
"help reading a 2GB heap dump" (trivia: you need a ton of memory)
"java stack trace"
"google: enable java debugging symbols"
"Garbage Collection in the Java HotSpot Virtual Machine" 
All of that while breaking the #1 WMF rule: "do not work in pyjamas".
I found out the Java Heap memory was full.
I also took time to look at a Jenkins notice warning about some
mysterious old data format. After some reading, it turns out these are
XML elements in the build history files which point to nonexistent
entry points in Jenkins. That can happen when a plugin is removed.
When Jenkins parses the build history, it records an in-memory entry
for each occurrence. With the thousands of builds we keep, that turns
into a memory killer.
Jenkins offers the possibility to clean the now-invalid elements for us,
but it is terribly slow. I thus resurrected my sed skills
and altered the XML files. That ran from 12:25 UTC till 17:19 UTC.
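
For readers who would rather not do this in sed, here is a minimal Python sketch of the same idea: deleting every occurrence of a stale element from a build file's text. It is not the sed command that was actually run. The function name `strip_elements` and the element name in the usage note are hypothetical, and the regular expression assumes the element is never nested inside another element of the same name.

```python
import re


def strip_elements(xml_text, tag):
    """Remove every <tag>...</tag> and <tag/> element, plus the line it sits on.

    Regex-based like a sed delete, not a real XML parser: it assumes
    elements named `tag` are never nested inside one another.
    """
    name = re.escape(tag)
    pattern = re.compile(
        r"[ \t]*<" + name          # leading indentation and the opening tag name
        + r"(?:\s[^>]*?)?"         # optional attributes
        + r"(?:/>|>.*?</" + name + r">)"  # self-closing, or body up to the closing tag
        + r"[ \t]*\n?",            # trailing whitespace and newline
        re.DOTALL,
    )
    return pattern.sub("", xml_text)
```

With a hypothetical element name such as `hypothetical.RemovedAction`, `strip_elements(text, "hypothetical.RemovedAction")` returns the file content without those elements. Back up the build history and stop Jenkins before rewriting files in place.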
The invalid data gone, I hope Jenkins is not going to fill its memory
again :-] I will monitor that tonight and on Monday then probably call
I am really sorry for the multiple inconveniences since the upgrade on
May 2nd and for the long time it took me to figure out the issue :(
Thanks Chad for the helpful tips regarding Java Heap memory size and
thank you Timo for the Java Melody monitoring system.
The bug report:
Antoine "hashar" Musso