Hi there,
tl;dr: I like a modified Option C, but I also propose a very different Option D that I think would be good too, either now or as the step after next.
<quote name="Erik Moeller" date="2013-10-18" time="15:26:16 -0700">
[snip overview of problem, combined with Robla's and you get a good picture of the issues.]
== Some options ==
Option A: Change nothing. I've not heard from enough folks to see if the problems above are widely perceived to _be_ problems. If the consensus is that current practice, for now, is the best possible approach, obviously we should stick with it.
I think this is a non-option, honestly. The current schedule has issues that can be resolved; let's try to resolve them.
Option B: No Monday deploy. This would mean we'd have to improve our testing process to catch issues affecting the non-Wikipedia wikis before they hit production. I personally think getting rid of the Monday deploy could create some _desirable_ pain that would act as a forcing function to improve pre-release test practices, rather than using production wikis to test.
At the same time, we'd have a full week to work out the kinks we find in testing before they hit any production wiki, and could have a more systematic process of backing out changes if needed prior to deployment.
Due to the concerns raised by Robla (and by me, in person), I'm not sure this is the right way to go next. It might be an option later when our cycle is a matter of a day or two, but not now with the week-long cycle.
Option C: Shift Monday deploys to Tuesday. This would at least give us an additional work day to fix issues that have occurred in testing before they hit prod. I personally don't think this goes far enough, but might be a useful tweak to make if option B seems too problematic.
I like this option as a next step, but with a caveat/suggestion: we mix up the wikis in stages 0, 1, and 2. We should also be open to changing the mix more frequently, based on community feedback (I know some are actually willing/wanting to join the fun of being earlier in the cycle...).
Until we have a way to gradually increase the % of users who are using the new wmf *cross wiki*, our only option is doing things per wiki, which gives us two conceptual options: a bare test/production split, or a tiered system like the 3-tier one we have now.
I have two suggestions, a safe one and a less safe one (where 'safe' means 'easy to sell to people'):
1) the safe one: We move Monday's deploy to Tuesday. Let some wikis move into phase 1 from phase 2, and some move from phase 1 to phase 2 (but probably keep phase 0 the same unless some community is as crazy as mw.org's ;) ).
This will give more agency to communities on their placement in the cycle while still giving us a more thorough load test on Tuesday after blatant issues are found on Thur/Fri.
2) the less safe one (Option D): We have a four-tiered system.
tier0 on Mon, tier1 on Tue, tier2 on Wed, tier3 on Thurs, on Friday we rest (er, merge into master for Monday). Ideal breakdown of user load (of total cross cluster) would be something like:
tier0: 5% (5% total)
tier1: 20% (25% total)
tier2: 30% (55% total)
tier3: 45% (100% total)
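To make the arithmetic in that breakdown explicit, here is a minimal Python sketch that derives the running totals from the per-tier shares. The per-tier percentages and deploy days are the ones proposed above; the names `TIER_SHARES`, `TIER_DAYS`, and `cumulative_load` are made up for illustration and don't correspond to any existing tooling.

```python
from itertools import accumulate

# Idealized share of total cross-cluster user load per tier, in percent
# (taken from the breakdown proposed above).
TIER_SHARES = {"tier0": 5, "tier1": 20, "tier2": 30, "tier3": 45}

# Proposed deploy day for each tier under Option D.
TIER_DAYS = {"tier0": "Mon", "tier1": "Tue", "tier2": "Wed", "tier3": "Thu"}


def cumulative_load(shares):
    """Map each tier to the percent of total load on the new release
    once that tier (and every earlier tier) has been deployed."""
    return dict(zip(shares, accumulate(shares.values())))


if __name__ == "__main__":
    for tier, total in cumulative_load(TIER_SHARES).items():
        print(f"{TIER_DAYS[tier]}: {tier} adds {TIER_SHARES[tier]}%, {total}% total")
```

Swapping in a real distribution (once there is one) would show how far each day's step deviates from the idealized 5/25/55/100 progression.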
This gives us: increasing load, with more measurable moments in time. What I mean by that is: With Ori's awesome new work (and planned work), we'll be able to make more sense of performance/load pre/post a deploy. We already look at 500s and similar logs, but those are lumped in with the 'apparent bugs' that are found right after a deploy (along with obvious "this button went missing" things). With only a 3-tier system, where the first tier is so small that it is hard to tell signal from noise in pre/post deploy performance data, we still only get one chance to test load (tier1, non-Wikipedias now) before going everywhere and potentially having downtime.
I argue/theorize that, with 3 deploys before we get to everywhere, we would be better able to spot performance issues.
Now, we probably can't do that idealized load distribution I lay out above. See: http://stats.wikimedia.org/EN/TablesPageViewsMonthlyAllProjectsOriginal.htm for the breakdown per project type. Also (for the Wikipedias' breakdown): http://stats.wikimedia.org/EN/TablesPageViewsMonthlyOriginalCombined.htm
<insert time where Greg goes off to sift through data>
Ok, I'm going to have to sit down with this data on Monday (this current naptime session won't be long enough) and come back with a proposed distribution. Simply: I'll try to hit the above idealized breakdown, but with these restrictions: A) ENWP in tier3 (which is 44% by itself, using Sept'13 data); B) for tiers 1 and 2, get a mix of project types (i.e., include WPs, wikibooks, wiktionaries, etc. in both); and C) tier0 being only testwikis (and mw.org). But leave this open for others to join, if desired.
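For whoever wants to play with candidate distributions in the meantime, restrictions A) through C) can be checked mechanically. The sketch below is only an illustration: the wiki identifiers, the project labels, and every share except ENWP's 44% are placeholders I made up, not real page-view data, and `Wiki`, `check_tiers`, and `tier_shares` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wiki:
    name: str      # illustrative identifier, not necessarily a real db name
    project: str   # e.g. "wikipedia", "wiktionary", "test"
    share: float   # percent of total page views (PLACEHOLDER unless noted)


def check_tiers(tiers):
    """Return a list of violations of restrictions A) to C) for a proposal
    given as {tier_number: [Wiki, ...]}."""
    problems = []
    # A) ENWP has to be in tier3.
    if not any(w.name == "enwiki" for w in tiers[3]):
        problems.append("A: enwiki is not in tier3")
    # B) tiers 1 and 2 should each mix project types.
    for n in (1, 2):
        if len({w.project for w in tiers[n]}) < 2:
            problems.append(f"B: tier{n} has fewer than two project types")
    # C) tier0 is only test wikis and mw.org. The proposal leaves this open
    #    for volunteers, so treat a hit here as a prompt to double-check.
    for w in tiers[0]:
        if w.project != "test" and w.name != "mediawikiwiki":
            problems.append(f"C: {w.name} is neither a test wiki nor mw.org")
    return problems


def tier_shares(tiers):
    """Sum the load share of every wiki in each tier."""
    return {n: sum(w.share for w in wikis) for n, wikis in tiers.items()}


# Hypothetical proposal; only enwiki's 44.0 comes from the Sept'13 figure above.
PROPOSAL = {
    0: [Wiki("testwiki", "test", 0.0), Wiki("mediawikiwiki", "mediawiki", 0.5)],
    1: [Wiki("dewiki", "wikipedia", 8.0), Wiki("enwiktionary", "wiktionary", 1.0)],
    2: [Wiki("frwiki", "wikipedia", 5.0), Wiki("enwikibooks", "wikibooks", 0.5)],
    3: [Wiki("enwiki", "wikipedia", 44.0)],
}

if __name__ == "__main__":
    print(check_tiers(PROPOSAL) or "no violations")
    print(tier_shares(PROPOSAL))
```

The per-tier sums from `tier_shares` can then be compared against the idealized targets to see how close a given mix gets.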
Other benefits of Option D:
* Gets us accustomed to more frequent deploys.
* Will provide some of that beneficial pain Erik mentions (which is something I want as well, but only if intelligently planned pain).
* Is easier to conceptually understand (a growing release each week, with Fridays off). We'd of course have a page per tier with the current list of wikis in that tier (shouldn't change all that often) so people can answer "is X language project on the new release yet?".
* Obvious next step towards continuous from here is 2 day cycles twice a week, which is basically Option B on steroids.
== CONCLUSION! ==
If Option D doesn't sit well with people, let's go with a modified Option C.
Ok, wall of text is sufficiently long...
Greg