Hi there,
tl;dr: I like a modified Option C, but also propose a very different
Option D that I think would also be good, either now or as the next next
step.
<quote name="Erik Moeller" date="2013-10-18" time="15:26:16 -0700">
[snip overview of problem, combined with Robla's and you get a good
picture of the issues.]
== Some options ==
Option A: Change nothing. I've not heard from enough folks to see if the
problems above are widely perceived to _be_ problems. If the consensus is
that current practice, for now, is the best possible approach, obviously we
should stick with it.
I think this is a non-option, honestly. The current schedule has issues
that can be resolved; let's try to resolve them.
Option B: No Monday deploy. This would mean we'd have to improve our
testing process to catch issues affecting the non-Wikipedia wikis before
they hit production. I personally think getting rid of the Monday deploy
could create some _desirable_ pain that would act as a forcing function to
improve pre-release test practices, rather than using production wikis to
test.
At the same time, we'd have a full week to work out the kinks we find in
testing before they hit any production wiki, and could have a more
systematic process of backing out changes if needed prior to deployment.
Due to the concerns raised by Robla (and by me, in person), I'm not
sure this is the right way to go next. It might be an option later when
our cycle is a matter of a day or two, but not now with the week-long
cycle.
Option C: Shift Monday deploys to Tuesday. This would at least give us an
additional work day to fix issues that have occurred in testing before they
hit prod. I personally don't think this goes far enough, but might be a
useful tweak to make if option B seems too problematic.
I like this option as a next step, but with a caveat/suggestion: we mix
up the wikis in stage 0, 1, and 2. And, we should be open to changing
the mix more frequently and based on community feedback (I know some are
actually willing/wanting to join the fun of being earlier in the
cycle...).
Until we have a way to gradually increase the % of users who are using
the new wmf *cross wiki*, our only option is doing things per wiki,
which gives you two conceptual options: a test/production split, and
that's it, or a tiered system like the 3-tier one we have now.
I have two suggestions, a safe one and a less safe one (where 'safe'
means 'easy to sell to people'):
1) the safe one:
We move Monday's deploy to Tuesday. Let some wikis move into phase 1
from phase 2, and some move from phase 1 to phase 2 (but probably keep
phase 0 the same unless some community is as crazy as mw.org's ;) ).
This will give more agency to communities on their placement in the
cycle while still giving us a more thorough load test on Tuesday after
blatant issues are found on Thur/Fri.
2) the less safe one (Option D):
We have a four-tiered system.
tier0 on Mon, tier1 on Tue, tier2 on Wed, tier3 on Thurs, on Friday we
rest (er, merge into master for Monday). Ideal breakdown of user load
(of total cross cluster) would be something like:
tier0:5% (5% total)
tier1:20% (25% total)
tier2:30% (55% total)
tier3:45% (100% total)
This gives us: increasing load, with more measurable moments in time.
What I mean by that is: With Ori's awesome new work (and planned work),
we'll be able to make more sense of performance/load pre/post a deploy.
We already look at 500s and similar logs, but those are lumped in the
'apparent bugs' that are found right after a deploy (along with obvious
"this button went missing" things). With only a 3-tier system, the first
tier is so small that it is hard to tell signal from noise in pre/post
deploy performance data, so we still only get one chance to test load
(tier1, non-Wikipedias now) before going everywhere and potentially
having downtime.
I argue/theorize that with 3 deploys before we get to everywhere, we
would be better able to spot performance issues.
Now, we probably can't do that idealized load distribution I lay out
above. See:
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyAllProjectsOriginal.htm
for the breakdown per project type.
Also (for the Wikipedias' breakdown):
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyOriginalCombined.htm
<insert time where Greg goes off to sift through data>
Ok, I'm going to have to sit down with this data on Monday (this current
naptime session won't be long enough) and come back with a proposed
distribution. Simply: I'll try to hit the above idealized breakdown, but
with these restrictions:
A) ENWP in tier3 (which is 44% by itself, using Sept'13 data);
B) for tiers 1 and 2, get a mix of project types (i.e., include WPs,
wikibooks, wiktionaries, etc. in both); and
C) tier0 being only testwikis (and mw.org). But leave this open for
others to join, if desired.
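
To make the restrictions above a bit more concrete, here is a rough
sketch of how a first-pass distribution could be computed. It is only an
illustration, not the proposed distribution: the wiki names other than
enwiki, testwiki, and mediawikiwiki are made up, every share except
ENWP's 44% is a placeholder (the real numbers would come from the stats
pages linked above), and the greedy "fill the biggest gap" rule is just
one assumed way to approximate the idealized breakdown. Restriction B is
only reported here, not enforced.

```python
# Idealized share of total cross-cluster load per tier, in percent
# (taken from the tier0..tier3 breakdown in this email).
TARGET = {"tier0": 5, "tier1": 20, "tier2": 30, "tier3": 45}

# HYPOTHETICAL data: wiki -> (project type, % of total page views).
# Only enwiki's 44% comes from the email (Sept '13 data); every other
# name and number is a placeholder chosen so the shares sum to 100.
SHARES = {
    "testwiki": ("test", 0.5),
    "mediawikiwiki": ("mediawiki", 0.5),
    "enwiki": ("wikipedia", 44.0),
    "wikipedia_a": ("wikipedia", 20.0),
    "wikipedia_b": ("wikipedia", 15.0),
    "wikipedia_c": ("wikipedia", 10.0),
    "wiktionary_a": ("wiktionary", 4.0),
    "wikibooks_a": ("wikibooks", 3.0),
    "wikisource_a": ("wikisource", 3.0),
}

# Restriction A (ENWP in tier3) and restriction C (tier0 is only the
# test wikis and mw.org) expressed as fixed placements.
PINNED = {"enwiki": "tier3", "testwiki": "tier0", "mediawikiwiki": "tier0"}


def assign_tiers(shares, pinned, target=TARGET):
    """Greedy sketch: place pinned wikis first, then put each remaining
    wiki (largest first) into the non-tier0 tier that is currently
    furthest below its target share."""
    tiers = {t: [] for t in target}
    load = {t: 0.0 for t in target}
    for wiki, tier in pinned.items():
        tiers[tier].append(wiki)
        load[tier] += shares[wiki][1]
    open_tiers = [t for t in target if t != "tier0"]  # tier0 stays opt-in
    free = sorted(
        (w for w in shares if w not in pinned),
        key=lambda w: shares[w][1],
        reverse=True,
    )
    for wiki in free:
        tier = max(open_tiers, key=lambda t: target[t] - load[t])
        tiers[tier].append(wiki)
        load[tier] += shares[wiki][1]
    return tiers, load


def project_mix(tiers, shares):
    """Report which project types ended up in each tier (restriction B)."""
    return {t: sorted({shares[w][0] for w in ws}) for t, ws in tiers.items()}


if __name__ == "__main__":
    tiers, load = assign_tiers(SHARES, PINNED)
    mix = project_mix(tiers, SHARES)
    running = 0.0
    for t in TARGET:
        running += load[t]
        print(f"{t}: {load[t]:.1f}% ({running:.1f}% total) types={mix[t]}")
```

With real per-wiki numbers plugged in, the per-tier report would show how
far off the idealized breakdown we land and whether tiers 1 and 2 actually
got a mix of project types; if not, the placement would need hand-tuning.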
Other benefits of Option D:
* gets us accustomed to more frequent deploys.
* will provide some of that beneficial pain Erik mentions (which
  is something I want as well, but only if it is intelligently planned pain)
* Is easier to conceptually understand (a growing release each week,
with Fridays off). We'd of course have a page per tier with the
current list of wikis in that tier (shouldn't change all that often)
so people can answer "is X language project on the new release yet?".
* Obvious next step towards continuous from here is 2 day cycles twice a
week, which is basically Option B on steroids.
== CONCLUSION! ==
If Option D doesn't sit well with people, let's go with a modified
Option C.
Ok, wall of text is sufficiently long...
Greg
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |