Hi Erik,
I'm not a fan of removing one of the stages of our current deployments. More inline:
On Fri, Oct 18, 2013 at 3:26 PM, Erik Moeller erik@wikimedia.org wrote:
> Option B: No Monday deploy. This would mean we'd have to improve our testing process to catch issues affecting the non-Wikipedia wikis before they hit production. I personally think getting rid of the Monday deploy could create some _desirable_ pain that would act as a forcing function to improve pre-release test practices, rather than using production wikis to test.
> At the same time, we'd have a full week to work out the kinks we find in testing before they hit any production wiki, and could have a more systematic process of backing out changes if needed prior to deployment.
The Monday deploy is where we catch load-based issues in a way that's not absolutely crushing. The cumulative traffic of those wikis is approximately 10% of our overall traffic, which is large enough to notice load-based problems, but small enough to make the difference between "hmm, we seem to have a load issue" and "oh crap, we just brought down the site".
We also generally discover many more issues by getting the new code in front of more people without foisting it on everyone. It's not great that there are bugs that some people have to suffer through, but it's better than making all people suffer through them. We can change the mix of wikis so that it's not always the same set that's part of the pilot group (and maybe one day in the glorious future be able to do mixed versioning on a per-wiki basis so that people could opt in), but I'd rather not foist everything on everyone at once.
Finally, another advantage of staging things this way is that we get some time to focus on non-Wikipedia sister project bugs before we deploy to Wikipedia. There are often project-specific bugs, and our test infrastructure isn't *nearly* built out enough to catch even the majority of them. If we deploy to all projects at once, we get hit with all of the bugs at once.
> Option C: Shift Monday deploys to Tuesday. This would at least give us an additional work day to fix issues that have occurred in testing before they hit prod. I personally don't think this goes far enough, but might be a useful tweak to make if option B seems too problematic.
I like this option. U.S. holidays (and holidays observed by a significant chunk of key WMF employees) often fall on Monday, which means we often have to reschedule these deploys for Tuesday anyway.
Rob