yes ken, you are right, let's stick to the issues at
hand:
(1) by when will you finally decide to invest the 10 minutes and
properly trace the gitblit application? you have the commands in the
ticket:
https://bugzilla.wikimedia.org/show_bug.cgi?id=51769
(2) by when will you adjust your operating guideline, so it is clear
to faidon, ariel and others that 10 minutes of tracing an application
and getting a holistic view is mandatory _before_ restoring the
service, if it goes down so often, and for days every time. the
extra 10 minutes cannot be noticed if it is gone for more than a day.

What information are you hoping to get from a trace that isn't currently known?
I'm not involved with the issue, and don't know specifics, but reading
the bug it sounds like there isn't any information we need from a
trace at the moment. (It sounds like there was a point in the
debugging stage where that would have been useful, but that's in the past.)
If you want people to trace something, you should justify why it will
help you fix the issue/discover what's going on/etc. I would consider
it inappropriate for ops to wait 10 minutes before restarting
something in order to get a stack trace, if we didn't need the info
in the trace.
--bawolff