On 2013-10-18 9:40 PM, "Rob Lanphier" robla@wikimedia.org wrote:
Hi Erik,
I'm not a fan of removing one of the stages of our current deployments. More inline:
On Fri, Oct 18, 2013 at 3:26 PM, Erik Moeller erik@wikimedia.org wrote:
Option B: No Monday deploy. This would mean we'd have to improve our testing process to catch issues affecting the non-Wikipedia wikis before they hit production. I personally think getting rid of the Monday deploy could create some _desirable_ pain that would act as a forcing function to improve pre-release test practices, rather than using production wikis to test.
At the same time, we'd have a full week to work out the kinks we find in testing before they hit any production wiki, and could have a more systematic process of backing out changes if needed prior to deployment.
The Monday deploy is where we catch load-based issues in a way that's not absolutely crushing. The cumulative traffic of the wikis in that deploy is approximately 10% of our overall traffic, which is large enough to notice load-based problems, but small enough to make the difference between "hmm, we seem to have a load issue" and "oh crap, we just brought down the site".
We also generally discover many more issues through getting it in front of more people, but not foisting it on everyone. It's not great that there are bugs that some people have to suffer through, but it's better than making all people suffer through them. We can change the mix of wikis so that it's not always the same set that's part of the pilot group (and maybe one day in the glorious future be able to do mixed versioning on a per-wiki basis so that people could opt in), but I'd rather not foist everything on everyone at once.
Finally, another advantage of staging things this way is that we get some time to focus on non-Wikipedia sister project bugs before we deploy to Wikipedia. There are often project-specific bugs, and our test infrastructure isn't *nearly* built out enough to catch even the majority of them. If we deploy to all projects at once, we get hit with all of the bugs at once.
Option C: Shift Monday deploys to Tuesday. This would at least give us an additional work day to fix issues that have occurred in testing before they hit prod. I personally don't think this goes far enough, but might be a useful tweak to make if option B seems too problematic.
I like this option. U.S. holidays (and holidays observed by a significant chunk of key WMF employees) often fall on Monday, which means we often have to reschedule these for Tuesday anyway.
Rob
Tuesdays are also nice as that gives a day for bugs filed by a user on a weekend to be found/triaged by someone, and the correct person notified before the next stage of deploy.
As a user I have vague memories of the site going down much more often in the past due to performance issues, which doesn't seem to happen anymore with the split-up deploy.
Our ability to do effective load testing prior to a deploy is essentially zero other than reading code afaik, and I have yet to hear any proposals to change that. I don't think the pain points caused would actually get fixed. (Ok, I guess carefully comparing profiling data of the testwikis before and after deploy can reveal performance issues, but I still think one has to actually test with high load to see the high-load issues.)
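(For what it's worth, a minimal sketch of what that before/after profiling comparison could look like. Everything here is an assumption for illustration: the data is just a hypothetical map of function name to mean milliseconds per call, the numbers are made up, and `find_regressions` is not an existing tool.)

```python
def find_regressions(before, after, threshold=1.5):
    """Return {function: slowdown ratio} for functions whose mean time
    grew by at least `threshold` times between two profiling snapshots."""
    regressions = {}
    for name, old_ms in before.items():
        new_ms = after.get(name)
        # Skip functions missing from the new snapshot or with unusable baselines.
        if new_ms is None or old_ms <= 0:
            continue
        ratio = new_ms / old_ms
        if ratio >= threshold:
            regressions[name] = ratio
    return regressions


# Hypothetical snapshots (mean ms per call), taken before and after a deploy.
before = {"Parser::parse": 40.0, "Title::newFromText": 2.0}
after = {"Parser::parse": 90.0, "Title::newFromText": 2.1}

print(find_regressions(before, after))
```

This only flags per-call slowdowns on a low-traffic wiki. It says nothing about contention or cache behaviour under real load, which is the part I think still needs actual high-load testing.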
-bawolff