Hi everyone,
Here's where we are with the 1.18 code review and deployment.
Short version:
* Het Deploy almost done
* 482 revs for code review. September 16 completion?
* Question: do we have to wait until code review is done before
deploying to test2?
* Question: could Roan, Tim, Brion and others in Haifa get together
and decide a reasonable target date for deployment? (probably late
September sometime)
Long version:
On the code review side, many WMF employees have been traveling to
conferences (first OSCON, then Wikimania), which has put a big hole in
our capacity. It shows on the graph:
http://toolserver.org/~robla/crstats/crstats.118all.html
I've run some very rough numbers over the past month to get a sense of
how fast we're reviewing. Here are some different ways of looking at
it:
* Worst 7 day performance over the past month: 30 revs/week (4/day)
* Best 7 day performance: 320 revs/week (46/day)
* Median 7 day performance from 6/29 to today: 114 revs/week (15/day)
* Past 30 days: 367 revs (12/day)
As of midnight UTC on August 2, we had 482 revisions to review.
Assuming we stay on a slow pace through Wikimania (4/day), then pick
up to 12/day starting August 8, that means we'll get through the queue
around September 16. If we do a little better (15/day), that's
September 8. If we stay at 4/day, it gets grim (December 1). And,
just for completeness, if we miraculously pick up to 46/day, that's
August 19. Friday, September 16 seems reasonably aggressive while
being realistic for a finish date on that activity.
Note that doesn't include revisions marked "fixme". For that list,
you can look here:
http://www.mediawiki.org/wiki/MediaWiki_roadmap/1.18/Revision_report#Fixmes
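
For anyone who wants to re-run the finish-date estimates above with
different review rates, here is a rough Python sketch of the arithmetic.
It is not the script used to produce the numbers in this mail. The
function name and the rounding (partial days rounded up) are
assumptions, and the year 2011 is inferred from September 16 falling on
a Friday. With those assumptions it matches the September 16,
September 8 and December 1 estimates, and lands one day earlier than
the August 19 figure for the 46/day case.

```python
from datetime import date, timedelta
from math import ceil


def finish_date(backlog, start, slow_rate, pickup, rate):
    """Estimate when the review queue empties.

    Reviews run at `slow_rate` revs/day from `start` until `pickup`,
    then at `rate` revs/day. Partial days are rounded up (assumption).
    """
    slow_days = (pickup - start).days
    remaining = backlog - slow_rate * slow_days
    if remaining <= 0:
        # Queue empties before the pace ever picks up.
        return start + timedelta(days=ceil(backlog / slow_rate))
    return pickup + timedelta(days=ceil(remaining / rate))


START = date(2011, 8, 2)    # 482 revs as of midnight UTC, August 2
PICKUP = date(2011, 8, 8)   # pace assumed to pick up after Wikimania

for rate in (4, 12, 15, 46):
    print(rate, "revs/day ->", finish_date(482, START, 4, PICKUP, rate))
```

The model is deliberately crude: it ignores new revisions entering the
queue as well as the fixme list.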
On the deployment side, Aaron Schulz has taken over work on
Heterogeneous Deploy:
http://www.mediawiki.org/wiki/Heterogeneous_deployment
The big development tasks (that we know of) are done. The work Aaron
has been doing is deploying bits and pieces at a time, then fixing the
bits that break, then deploying more. We had hoped we could get a
test instance of 1.18 deployed before Wikimania, but we hit a few
snags in finishing up. There are a few bits of deployment lore that
Aaron still needs to learn about in order to finish things off, but
assuming he's able to do that and get knowledgeable review of some of
the riskier code, he should be able to have the Het Deploy
infrastructure in place by the end of next week.
The cool thing about Het Deploy is that, once completed, we should be
able to deploy MediaWiki 1.18 to a much more production-like test wiki
than we've been able to use in the past (e.g.
prototype.wikimedia.org), and have it in test for much longer than our
current "test.wikimedia.org" site. We'll be able to deploy to both
"test.wikimedia.org" and/or "test2.wikimedia.org" prior to releasing
to the rest of the cluster. Even better, we'll be able to work with a
few pioneering wikis out there to deploy 1.18 to those sites early,
then gradually roll it out to a wider audience after we've worked out
the kinks with the initial deploys.
There is still some reasonable hesitation about deploying 1.18 even to
test or test2 prior to fully reviewing the code. I haven't talked to
all of the major production cluster guardians about this, but there is
a school of thought that suggests that perhaps we can get away with a
lighter skimming prior to deploying everywhere. If it is safe enough
to get something out to test2 prior to full review being done, I'd
love to do it. Question for those with deployment access: what are
your thoughts?
I spoke with Roan before he left San Francisco, and (I think) he
agreed to wrangle folks in Haifa to decide on a date, in part based on
the data that I just presented. So, I'm hoping he'll get a chance to
do that while he's there. We should plan to have 1.18 running
reasonably well on test2 for at least 3-4 days before we start
deploying to actual production wikis, and we should hold off on
deploying to the big wikis (enwiki, dewiki, etc.) for another 3-4 days.
So, we should probably build at least a week, and probably two, into
the schedule for that. At the risk of biasing everyone, here's my
suggestion:
* Monday, September 19: deploy to test2
* Thursday, September 22: deploy to a few lower traffic pilot wikis
* Monday, September 26: deploy everywhere
That's pretty aggressive given that we're almost certainly going to
find bugs during the test2 deployment. I'd feel a lot more confident
about a full September 26 rollout if we can get something pushed to
test2 sooner than September 19.
Thoughts?
Rob