Hi Erik,
I'm not a fan of removing one of the stages of our current deployments.
More inline:
On Fri, Oct 18, 2013 at 3:26 PM, Erik Moeller <erik(a)wikimedia.org> wrote:
> Option B: No Monday deploy. This would mean we'd have to improve our
> testing process to catch issues affecting the non-Wikipedia wikis before
> they hit production. I personally think getting rid of the Monday deploy
> could create some _desirable_ pain that would act as a forcing function to
> improve pre-release test practices, rather than using production wikis to
> test.
> At the same time, we'd have a full week to work out the kinks we find in
> testing before they hit any production wiki, and could have a more
> systematic process of backing out changes if needed prior to deployment.
The Monday deploy is where we catch load-based issues in a way that's not
absolutely crushing. The cumulative traffic of the non-Wikipedia wikis is
approximately 10% of our overall traffic, which is large enough to notice
load-based problems, but small enough to make the difference between "hmm,
we seem to have a load issue" and "oh crap, we just brought down the site".
We also generally discover many more issues through getting it in front of
more people, but not foisting it on everyone. It's not great that there
are bugs that some people have to suffer through, but it's better than
making all people suffer through them. We can change the mix of wikis so
that it's not always the same set that's part of the pilot group (and maybe
one day in the glorious future be able to do mixed versioning on a per-wiki
basis so that people could opt-in), but I'd rather not foist everything on
everyone at once.
Finally, another advantage of staging things this way is that we get some
time to focus on non-Wikipedia sister project bugs before we deploy to
Wikipedia. There are often project-specific bugs, and our test
infrastructure isn't *nearly* built out enough to catch even the majority
of them. If we deploy to all projects at once, we get hit with all of the
bugs at once.
> Option C: Shift Monday deploys to Tuesday. This would at least give us an
> additional work day to fix issues that have occurred in testing before they
> hit prod. I personally don't think this goes far enough, but might be a
> useful tweak to make if option B seems too problematic.
I like this option. U.S. holidays (and holidays observed by a significant
chunk of key WMF employees) often fall on Monday, which means we often have
to reschedule these deploys for Tuesday anyway.
Rob