<quote name="Jon Robson" date="2014-03-07" time="09:30:09 -0800">
Let's also take this into a new thread. There are a lot of different conversations now going on....
My opinion is that fixing this with policy is going to be hard.
Either everyone who commits needs to be mindful of what day/time it is and whether another human has cut the new branch yet (the timing isn't set in stone; it varies by a couple of hours, depending on a lot of factors), OR we cut the branch at some arbitrary offset (e.g., 24 hours ago), OR some human looks at the merges and picks a point.
None of those are ideal/scalable.
What we should do, however, is have a true "deployment pipeline". Briefly defined: A deployment pipeline is a sequence of events that increase your confidence in the quality of any particular build/commit point.
A typical example is: commit -> unit tests -> integration tests -> manual tests -> release
Each step has the ability to fail a build, which means "You shall not pass!" to that commit point. The earlier you get a "You shall not pass!" the better, because developers spend less time waiting to find out whether what they committed is OK.
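To make the "You shall not pass!" idea concrete, here is a minimal sketch of a fail-fast pipeline. It is only an illustration: the names `Stage`, `PipelineResult`, and `run_pipeline` are made up for this example, and the stage checks are stand-ins for whatever a real CI system would run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a check that returns True when the commit passes.
# Both are hypothetical stand-ins for real test jobs.
Stage = Tuple[str, Callable[[str], bool]]


@dataclass
class PipelineResult:
    passed: List[str] = field(default_factory=list)  # stages the commit cleared
    failed_at: Optional[str] = None                  # first failing stage, if any

    @property
    def releasable(self) -> bool:
        # A commit is releasable only if no stage rejected it.
        return self.failed_at is None


def run_pipeline(commit: str, stages: List[Stage]) -> PipelineResult:
    """Run the stages in order and stop at the first failure."""
    result = PipelineResult()
    for name, check in stages:
        if not check(commit):
            # "You shall not pass!": later stages never see this commit.
            result.failed_at = name
            break
        result.passed.append(name)
    return result


if __name__ == "__main__":
    # Cheap checks go first so a bad commit is rejected as early as possible.
    example_stages: List[Stage] = [
        ("unit tests", lambda commit: True),
        ("integration tests", lambda commit: False),
        ("manual tests", lambda commit: True),
    ]
    print(run_pipeline("abc123", example_stages))
```

Ordering the stages from cheapest to most expensive is what keeps the developer's wait short: a commit that fails unit tests never consumes integration or manual testing time.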
What this means for us: The Mobile team is actually a good example. They are doing The Right Thing and have a lot of tests written, including browser tests. They run into problems when, e.g., they write a new feature and associated test and commit it.
Beta Cluster gets that code (feature and test) within 5 minutes.
But, test.wikipedia and en.wikipedia get that feature much later, days later.
However, the test code is run by Jenkins across all environments (beta cluster, test.wikipedia, en.wikipedia etc) all the time. So, the mobile team gets a ton of false positives when their new test runs against, e.g., production, where the feature isn't enabled yet (on purpose).
The QA team is working on this problem now (loosely termed the "versioned test problem").
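One way to picture the versioned test problem is as a mismatch between what a test expects and what an environment actually has deployed. The sketch below is purely illustrative and is not the QA team's actual approach; `BrowserTest`, `runnable_tests`, and the feature labels are hypothetical. It assumes each test declares the feature it exercises and each environment reports which features it currently has.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class BrowserTest:
    name: str
    requires: str  # hypothetical label for the feature this test exercises


def runnable_tests(tests: List[BrowserTest], deployed: Set[str]) -> List[BrowserTest]:
    """Keep only the tests whose required feature exists in the environment."""
    return [test for test in tests if test.requires in deployed]


if __name__ == "__main__":
    tests = [
        BrowserTest("existing feature works", requires="old-feature"),
        BrowserTest("new feature works", requires="new-feature"),
    ]
    # Assumed deployment state: the new feature has only reached Beta Cluster.
    environments: Dict[str, Set[str]] = {
        "beta cluster": {"old-feature", "new-feature"},
        "test.wikipedia": {"old-feature"},
        "en.wikipedia": {"old-feature"},
    }
    for env_name, deployed in environments.items():
        names = [test.name for test in runnable_tests(tests, deployed)]
        print(env_name, "->", names)
```

With this kind of filter, the new test would run only where its feature is deployed, instead of failing against production where the feature is deliberately absent.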
How a pipeline would help:
Really, a pipeline isn't a thing like your indoor plumbing but more of a mindset/way of designing your test infrastructure. But, it means that you keep things self-contained (contrary to the mobile example above) and things progress through the pipeline in a predictable way/pace.
It also means that each code commit spends the exact same amount of time in the various stages as other code commits. Right now some code sits on Beta Cluster for 7 days before hitting production, whereas other code spends 0-5 minutes. That's not good.
Wanna help us on this problem? We're hiring: https://hire.jobvite.com/Jobvite/jobvite.aspx?b=nHZ0zmw6 (2 job openings)
Greg