[Foundation-l] New two-part schedule for 1.17 deployment

Rob Lanphier robla at wikimedia.org
Thu Feb 10 22:59:42 UTC 2011


Hi everyone,

In case you missed it on the techblog, here's an update on the revised
deployment plan for 1.17, part 1 of which starts in 7 hours:
http://techblog.wikimedia.org/2011/02/1-17deployment-attempt2/

Also copied below.

Rob
------

As covered on this blog this week, we had a few problems with our
initial deployment of 1.17 to the Wikimedia cluster of servers.  We’ve
investigated the problems, and believe we have fixed many of the
issues.  Some of the unsolved issues are complicated enough that the
only timely and reasonable way to investigate them is to deploy and
react, so we’ve come up with a plan that lets us do it in a safe way
by deploying on just a few wikis at a time (as opposed to all at once,
as we tried earlier).

We’re scheduling two deployment windows:

First window – Friday, February 11, between 6:00 UTC and 12:00 UTC
(starting 10pm PST Thursday, February 10 in San Francisco).  This
first wave will go to a limited set of wikis (see below).
Second window – Wednesday, February 16, between 6:00 UTC and 12:00 UTC.
Full deployment (tentative).

Repeating what is new about 1.17:  There are many, many little fixes
and improvements (see the draft release notes for an exhaustive list),
as well as one larger improvement: Resource Loader.  Read more in the
previous 1.17 deployment announcement.


First window
This first deployment window will be to a limited set of wikis:

http://simple.wikipedia.org/ (simplewiki)
http://simple.wiktionary.org/ (simplewiktionary)
http://usability.wikimedia.org/ (usabilitywiki)
http://strategy.wikimedia.org/ (strategywiki)
http://meta.wikimedia.org/ (metawiki)
http://eo.wikipedia.org/ (eowiki)
http://en.wikiquote.org/ (enwikiquote)
http://en.wikinews.org/ (enwikinews)
http://en.wikibooks.org/ (enwikibooks)
http://beta.wikiversity.org (betawikiversity)
http://nl.wikipedia.org (nlwiki)

Note that the point of this first round of wikis being switched over
is to be able to observe the problem or problems without overloading
the site and bringing it down.  This deployment should be small enough
in scope that even if there are moderate performance problems, no one
should notice without watching our monitoring tools.  We may not roll
out to every wiki listed above during the first wave, but we plan to
roll out to enough of them that we can gather enough debugging
information to make the second wave (full deployment) go smoothly.

Second window
We will continue to roll this out to the rest of the wikis during this
window.  Depending on our confidence level, we may deploy to all of
the remaining wikis, or only to a portion of them.  If necessary, we
will schedule another window to finish the deployment.

Technical details
Here’s some more technical detail: one problem with the original
Tuesday deploy was that the cache miss rate went up quite
substantially.  We believe the cause was a misconfiguration of the
$wgCacheEpoch variable, which caused more
aggressive culling of our cache than the servers could handle.  We
have made adjustments, and so this shouldn’t be a problem during our
next deployment attempt.
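
To illustrate why a cache epoch setting can drive up the miss rate,
here is a minimal sketch in Python.  It is a simplified model, not
MediaWiki's actual code: the function names, the sample cache entries,
and the two epoch values are all made up for illustration.  The only
idea taken from the text is that entries cached before the epoch are
treated as stale, so moving the epoch forward discards more of the
cache at once.

```python
from datetime import datetime, timezone


def is_fresh(cached_at, cache_epoch):
    """Simplified rule: an entry is usable only if cached at or after the epoch."""
    return cached_at >= cache_epoch


def miss_rate(cached_times, cache_epoch):
    """Fraction of cached entries that the given epoch would invalidate."""
    if not cached_times:
        return 0.0
    misses = sum(1 for t in cached_times if not is_fresh(t, cache_epoch))
    return misses / len(cached_times)


# Hypothetical cache: one entry cached at noon UTC on each of Feb 1-10.
entries = [datetime(2011, 2, day, 12, 0, tzinfo=timezone.utc)
           for day in range(1, 11)]

# Hypothetical epochs: an old one that predates every entry, and a
# newer one that falls in the middle of the cached range.
old_epoch = datetime(2011, 1, 1, tzinfo=timezone.utc)
new_epoch = datetime(2011, 2, 8, tzinfo=timezone.utc)

print(miss_rate(entries, old_epoch))  # nothing invalidated
print(miss_rate(entries, new_epoch))  # entries from Feb 1-7 invalidated
```

In this toy model, every invalidated entry has to be regenerated by
the backend servers, which is the kind of extra load described above.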

The $wgCacheEpoch problem explains some of the problems we had, but
not all of them.  Since we don’t have a clear explanation for all of
the problems, we plan to modify the way we deploy this software so
that we aren’t rolling this out to every wiki simultaneously.  As our
software is currently built, this isn’t easy to do in a general way,
but it turns out this release is suited to an incremental deployment.
(Note: we also plan to develop a more general capacity to roll out
incrementally for future releases).
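
As a rough illustration of the "deploy and react" approach, the
following Python sketch upgrades wikis one at a time and stops as soon
as a health check fails.  This is not our deployment tooling: the
roll_out function and the is_healthy callback are hypothetical, and a
real check would look at monitoring data rather than a simple count.
The wiki names are the first-window list from this post.

```python
# First-window wikis, in the order listed in this post.
FIRST_WAVE = [
    "simplewiki", "simplewiktionary", "usabilitywiki", "strategywiki",
    "metawiki", "eowiki", "enwikiquote", "enwikinews", "enwikibooks",
    "betawikiversity", "nlwiki",
]


def roll_out(candidates, is_healthy):
    """Upgrade wikis one by one; stop adding more once the check fails.

    is_healthy is a hypothetical callback that receives the list of
    wikis upgraded so far and returns True if the site looks fine.
    """
    upgraded = []
    for wiki in candidates:
        upgraded.append(wiki)
        if not is_healthy(upgraded):
            break  # pause here and investigate before continuing
    return upgraded


def version_for(wiki, upgraded):
    """Which software a wiki runs in this toy model."""
    return "1.17" if wiki in upgraded else "previous release"


# Stand-in health check: pretend problems appear with the third wiki.
done = roll_out(FIRST_WAVE, lambda upgraded: len(upgraded) < 3)
print(done)
print(version_for("nlwiki", done))
```

The design point is that a failed check leaves most wikis on the
previous release, so a problem can be observed without affecting the
whole site.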

Thank you for your patience!  We hope that this time around we can
deploy this in a way that you won’t notice anything other than the
improvements.



