On Sat, Aug 17, 2013 at 10:19:10PM +0200, rupert THURNER wrote:
(2) by when you will adjust your operating guideline, so it is clear to faidon, ariel and others that 10 minutes tracing of an application and getting a holistic view is mandatory _before_ restoring the service, if it goes down for so often, and for days every time. the 10 minutes more can not be noticed if it is gone for more than a day.
I think you're making several incorrect assumptions and mistakes here.
First, what you propose in (2) is, respectfully, the wrong approach to how we should react to emergencies. git.wm.org as a service has owners and I'm not one of them. I was reacting to an outage, on a Sunday, of a service I know next to nothing about. The service was down, and the first priority was to restore it and make sure it wouldn't die again when I wasn't looking at my screen. An overload caused by Googlebot was deemed to be causing this, and I decided that Google indexing could be temporarily suspended until the situation could be properly assessed, by the right people, with a clear head. I still think it was a fair compromise to make.
Second, you're assuming that the cause of the outage was a software bug and that a stacktrace was going to be helpful. I haven't followed up, but I don't think this is correct here -- and even if it were, it's wrong to assume it always will be; that's what post-mortems are for. Preliminary investigation showed the website being crawled for every single patch by every single author, plus stats going back random amounts of time, for all Wikimedia git projects. All software will break under enough load, and there's nothing stacktraces can help with here (rel=nofollow would help, but a stacktrace wouldn't tell you this).
Third, you're assuming that others are not working with gitblit upstream. Chad is already collaborating with gitblit upstream and has done so before this happened. He's already engaging with them on other issues potentially relevant to this outage[1]. He also has an excellent track record with (at least) his previous collaboration with Gerrit upstream. Finally, Ori contributed a patch that was merged for one of the root causes of gitblit outages[2].
Fourth, you're extrapolating my personal "pushing upstream" attitude from a single incident and a single interaction with me, from there extrapolating the team's and the foundation's attitude, and finally reaching the conclusion that we won't collaborate with upstreams for HTTPS. These are all incorrect extrapolations and wild guesses.
I can tell you for a fact that you couldn't be farther from the truth. Both I and others work closely with a large number of upstreams regularly. I've worked with all kinds of upstreams for years, long before I joined the foundation, and I'm not planning to stop anytime soon -- it's also part of my job description and of the organizational mandate, so it's much more than my personal modus operandi.
Finally, specifically for the cases you mention: for HTTPS, ironically, I sent a couple of patches to httpd-dev last week for bugs that I found while playing with Apache 2.4 for Wikimedia's SSL cluster. One of them came after discussions with Ryan on SSL session security and potential attacks[3]. As for DNS, I worked closely with upstream[4], Brandon, on gdnsd when I was still evaluating it for use by Wikimedia (and I even maintain it in Debian nowadays[5]); Brandon was hired by the foundation a few months later -- it's hard to get better relations with upstream than having them on the team, don't you think?
I hope these address your concerns. If not, I'd be happy to provide more information and take feedback, but please, let's keep it civil, let's keep it technical and let's keep it on-topic and in perspective -- the issue at hand (security & protection from state surveillance) is far too important for us to be discussing the response to a minor gitblit outage in the same thread, IMHO.
Best,
Faidon

1: https://code.google.com/p/gitblit/issues/detail?id=274
2: https://github.com/gitblit/gitblit/commit/da71511be5a4d205e571b548d6330ed7f7...
3: http://mail-archives.apache.org/mod_mbox/httpd-dev/201308.mbox/%3C2013080511...
4: https://github.com/blblack/gdnsd/issues/created_by/paravoid?state=closed
5: http://packages.debian.org/sid/gdnsd