Re: [Wikitech-l] Release process

22 Oct 2010

On 10/21/10 4:04 PM, Aryeh Gregor wrote:
> ...
> On Thu, Oct 21, 2010 at 6:31 PM, Neil Kandalgaonkar <neilk(a)wikimedia.org> wrote:
>> For what it's worth, I'm influenced by my former job at Flickr, where
>> the practice was to deploy several times *per day*, directly from trunk.
>> That may be more extreme than we want, but be aware there are people who
>> are doing it successfully -- it just takes a few extra development
>> practices.
>
> Personally, I think it would be awesome if we could migrate to this
> level of deployment frequency eventually.  I imagine that
> comprehensive automated test suites are a major part of making this
> reliable.

Nope. Automated tests help a lot with this approach but Flickr doesn't 
have much better tests than MediaWiki does.

We *should* have better tests, but I would just say that it is not 
required for us to have a great test suite before doing this.

> ...
> To the extent you can share any details about how stuff
> works at Flickr, what long-term changes are necessary for this to be
> practical?

Flickr engineers have already talked a lot about this in public. See 
references below.

The main insight here is that branching is a bad way for a website to 
manage change. We do not have an install base that's out there in the 
world, like shrink-wrapped software, where we issue patches on CD. For a 
website, we control the entire install base.[1]

What we need are ways of managing change across our server clusters, or 
managing incremental feature and infrastructure upgrades. This leads to 
"branching in code".

Doing things the Flickr way entirely would require:

1 - A "feature flag" system, for "branching in code". The point is to 
start developing a new feature with it being turned off by default for 
most environments and without succumbing to branching and merging 
misery. In other words, day one of a new feature looks like this:

   if ( $wgFeature['MyNewThing'] ) {
     /* ... new code ... */
   } else {
     /* ... old code ... */
   }

Of course if you're fixing bugs there's no need to hide that behind a 
feature flag.

2 - Every developer with commit access is thinking about deployment onto 
a cluster of machines all the time. Committing to the repository means 
you are asserting this will work in production. (This is the hard part 
for us, I think, but maybe not insurmountable).

3 - One can deploy with a single button press (and there is a system 
recording what changes were deployed and why, for ops' convenience).

4 - When there's trouble, new deploys can be blocked centrally, and then 
ops can revert to a previous version with a single button press.

5 - Developers are good about "cleaning up" code that was previously 
protected by feature flags once the behaviour is standard. (HINT: this 
is the part Flickr doesn't talk about in public... but as an open source 
project with more visible dirty laundry, perhaps we can do better.)
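
As a rough illustration of point 1, here is a minimal sketch of how flags 
could default to off while a single environment opts in. Only $wgFeature 
and 'MyNewThing' come from the example above; the $wgFeatureOverrides 
array, the isFeatureEnabled() helper and the environment names are 
hypothetical, and this is not a description of how Flickr or MediaWiki 
actually do it.

```php
<?php
// Default state of every flag. New features start out switched off.
$wgFeature = array(
    'MyNewThing' => false,
);

// Hypothetical per-environment overrides: only the 'test' environment
// sees the new code path for now.
$wgFeatureOverrides = array(
    'test' => array( 'MyNewThing' => true ),
);

/**
 * Hypothetical helper: decide whether a flag is on in a given environment.
 * An override for the environment wins, then the default applies, and
 * unknown flags count as off so a typo can never enable new code.
 */
function isFeatureEnabled( $name, $env, array $defaults, array $overrides ) {
    if ( isset( $overrides[$env][$name] ) ) {
        return (bool)$overrides[$env][$name];
    }
    return isset( $defaults[$name] ) ? (bool)$defaults[$name] : false;
}

// Usage: the same trunk code runs everywhere, only the configuration differs.
if ( isFeatureEnabled( 'MyNewThing', 'production', $wgFeature, $wgFeatureOverrides ) ) {
    /* ... new code ... */
} else {
    /* ... old code ... */
}
```

Once the feature is standard everywhere, the flag, its overrides and the 
old branch can all be deleted together, which is the cleanup that point 5 
asks for.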

Working this way does result in more "oops" moments. But the point is to 
make those easy to recover from, and to have a culture where people aren't 
blamed too much for them, rather than to build a system that tries to 
ensure that deploy branches can be tested to be almost perfect. The real 
problems are always things that nobody anticipated anyway.

NOTES

[1] I am for the purposes of the argument ignoring MediaWiki as a 
deliverable and only thinking about project websites.

REFERENCES

Here's the most concise presentation:
"Always Ship Trunk: Managing Change In Complex Websites" by Paul Hammond
http://www.paulhammond.org/2010/06/trunk/alwaysshiptrunk.pdf

And a longer talk about all this from Paul Hammond and John Allspaw:
"10+ Deploys Per Day: Dev/Ops Cooperation at Flickr"
http://velocityconference.blip.tv/file/2284377/

Blog post about the Feature Flag system by Ross Harmes
"Flipping out"
http://code.flickr.com/blog/2009/12/02/flipping-out/

-- 
Neil Kandalgaonkar <neilk(a)wikimedia.org>
