On Sat, Aug 17, 2013 at 10:48 PM, bawolff <bawolff+wn@gmail.com> wrote:
yes ken, you are right, let's stick to the issues at hand: (1) by when will you finally decide to invest the 10 minutes and properly trace the gitblit application? you have the commands in the ticket: https://bugzilla.wikimedia.org/show_bug.cgi?id=51769
(2) by when will you adjust your operating guideline, so it is clear to faidon, ariel and others that spending 10 minutes tracing an application and getting a holistic view is mandatory _before_ restoring the service, if it goes down this often, and for days every time? the 10 extra minutes will not be noticed when the service is gone for more than a day.
What information are you hoping to get from a trace that isn't currently known?
if a web application dies or stops responding, this can be (1) caused by too many requests for the hardware it runs on. that can be influenced from outside the app by robots.txt, caching, etc., and from inside the app by its links, e.g. using "nofollow". but it can also be (2) caused by the application itself. a java application uses more or fewer operating system resources depending on how it is written. one might find this out by just reading the code, but having a trace helps a lot here. a trace may reveal locking problems in multithreaded code, string operations causing OS calls for every character, excessive object creation and garbage collection, and hundreds of other things. it is not necessary to wait until it stalls again to get the trace. many things can be seen during normal operation as well.
so i hope to get (2). (1) was handled ok in my opinion.
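
to illustrate the kind of information meant by (2), here is a minimal sketch using the standard java.lang.management API. it is not the set of commands from the ticket and it is not taken from gitblit; the class name TraceSketch and its method names are made up for this example. it inspects the JVM it runs in. for an already running server one would more likely take a thread dump from outside, e.g. with the JDK's jstack tool, but the data is of the same kind: threads blocked on locks, deadlocks, and garbage collection activity.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration only; not part of gitblit.
public class TraceSketch {

    // Number of live threads currently blocked waiting for a monitor lock.
    // A consistently high value hints at lock contention.
    static int countBlockedThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int blocked = 0;
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    // Number of threads deadlocked on object monitors (0 if none).
    static int countDeadlockedThreads() {
        long[] ids = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    // Total garbage collections so far, summed over all collectors.
    // getCollectionCount() returns -1 when the count is undefined.
    static long totalGcCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("blocked threads:    " + countBlockedThreads());
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
        System.out.println("gc collections:     " + totalGcCollections());
    }
}
```

sampling numbers like these a few times during normal operation already shows whether threads pile up on a lock or whether the collector runs unusually often, without waiting for the next stall.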
rupert