On Fri, Mar 7, 2014 at 2:54 PM, Tyler Romeo <tylerromeo@gmail.com> wrote:
On Fri, Mar 7, 2014 at 5:39 PM, George Herbert <george.herbert@gmail.com> wrote:
With all due respect: hell yes, development comes second to operational stability.
This is not disrespecting development, which is extremely important by any measure. But we're running a top-10 worldwide website, a key worldwide information resource for humanity as a whole. We cannot cripple development to try to maximize stability, but stability has to be the priority. Any large website's teams will have the same attitude.
I've had operational outages reach the top of everyone's news source/feed/newspaper/broadcast. This is an exceptionally unpleasant experience.
If you really think stability is top priority, then you cannot possibly think that the current deployment process is sane.
Every real business process reflects the history, organization, and people involved.
Right now you are placing the responsibility on the developers to make sure the site is stable, because any change they merge might break production, since it is automatically sent out. If anything, that gives the appearance that the operations team doesn't care about stability and would rather wait until things break and then revert them.
It is the responsibility of the operations team to ensure stability. Having to revert something because that's the only way production will be stable is not a proper workflow.
On the contrary, reverting things (from the prod branch) because they destabilized production is normal procedure. Whether that's accomplished by frozen builds that are rolled to prod once they pass acceptable QA tests (more common in commercial services) and rolled back out of prod should upgrade-related instability emerge, or by reverts out of live prod branches in continuous deployment or similar (DevOps-y) environments, depends on the underlying process.
This should not even be a debate. Questionable code shouldn't be in production. How that is accomplished is an implementation detail.