Hi everyone,
In case you missed it on the techblog, here's an update on the revised deployment plan for 1.17, part 1 of which starts in 7 hours: http://techblog.wikimedia.org/2011/02/1-17deployment-attempt2/
Also copied below.
Rob
------
As covered on this blog this week, we had a few problems with our initial deployment of 1.17 to the Wikimedia cluster of servers. We’ve investigated the problems and believe we have fixed many of them. Some of the unsolved issues are complicated enough that the only timely and reasonable way to investigate them is to deploy and react. So we’ve come up with a plan that lets us do this safely by deploying to just a few wikis at a time, rather than to all of them at once as we tried earlier.
We’re scheduling two deployment windows:
- First window: Friday, February 11, 6:00 UTC – 12:00 UTC (10pm PST Thursday, February 10 in San Francisco). This first wave will be to a limited set of wikis (see below).
- Second window: Wednesday, February 16, 6:00 UTC – 12:00 UTC. Full deployment (tentative).

Repeating what is new about 1.17: there are many, many little fixes and improvements (see the draft release notes for an exhaustive list), as well as one larger improvement: Resource Loader. Read more in the previous 1.17 deployment announcement.
First window
This first deployment window will be to a limited set of wikis:
- http://simple.wikipedia.org/ (simplewiki)
- http://simple.wiktionary.org/ (simplewiktionary)
- http://usability.wikimedia.org/ (usabilitywiki)
- http://strategy.wikimedia.org/ (strategywiki)
- http://meta.wikimedia.org/ (metawiki)
- http://eo.wikipedia.org/ (eowiki)
- http://en.wikiquote.org/ (enwikiquote)
- http://en.wikinews.org/ (enwikinews)
- http://en.wikibooks.org/ (enwikibooks)
- http://beta.wikiversity.org (betawikiversity)
- http://nl.wikipedia.org (nlwiki)

Note that the point of switching over this first round of wikis is to let us observe the problem or problems without overloading the site and bringing it down. This deployment should be small enough in scope that even if there are moderate performance problems, no one should notice without watching our monitoring tools. We may not roll out to every wiki listed above during the first wave, but we plan to roll out to enough of them to gather the debugging information we need to make the second wave (full deployment) go smoothly.
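Conceptually, a wave-based rollout comes down to deciding, per wiki, which code version serves it. The Python sketch below is purely illustrative and is not Wikimedia's deployment tooling: the function `version_for`, the version labels and the chosen wave-1 subset are hypothetical. Only the first-wave database names come from the list above.

```python
# Hypothetical sketch of a per-wiki version lookup for a staged rollout.
# NOT Wikimedia's actual deployment mechanism; labels are placeholders.

OLD_VERSION = "pre-1.17"  # placeholder for whatever the cluster already runs
NEW_VERSION = "1.17"

# First-wave candidates, by database name, from the announcement.
FIRST_WAVE = {
    "simplewiki", "simplewiktionary", "usabilitywiki", "strategywiki",
    "metawiki", "eowiki", "enwikiquote", "enwikinews", "enwikibooks",
    "betawikiversity", "nlwiki",
}


def version_for(dbname: str, upgraded: set) -> str:
    """Return the version a wiki should run, given the set already switched."""
    return NEW_VERSION if dbname in upgraded else OLD_VERSION


# Not every first-wave wiki has to be switched; pick a (hypothetical) subset.
wave1 = {"simplewiki", "metawiki", "nlwiki"}

for wiki in ("nlwiki", "eowiki", "enwiki"):
    print(wiki, version_for(wiki, wave1))
```

Keeping the "upgraded" set as plain data means widening a wave (or rolling one wiki back) is a one-line change rather than a code change.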
Second window
We will continue to roll this out to the rest of the wikis during this window. Depending on our confidence level, we may deploy to all of the remaining wikis or only to a portion of them. If necessary, we will schedule another window to finish the deployment.
Technical details
Here’s some more technical detail: one problem with the original Tuesday deployment was that the cache miss rate went up substantially. We believe the cause was the configuration of the $wgCacheEpoch variable, which made our cache get culled more aggressively than the servers could handle. We have made adjustments, so this shouldn’t be a problem during our next deployment attempt.
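For readers unfamiliar with it, $wgCacheEpoch is a timestamp: cached pages older than it are treated as invalid. MediaWiki’s own documentation suggests setting it to the current time to invalidate all prior cached pages. The Python sketch below is a simplified model of that rule, not MediaWiki’s actual code. The function names, the (cached_at, page_touched) entry shape and the sample timestamps are all hypothetical. It illustrates how moving the epoch forward turns many previously valid entries into misses at once.

```python
# Simplified model of epoch-based cache invalidation, loosely modelled on
# MediaWiki's $wgCacheEpoch. Not MediaWiki code; names are hypothetical.


def is_stale(cached_at: str, page_touched: str, cache_epoch: str) -> bool:
    """True if a cached render must be regenerated.

    Timestamps are 14-digit YYYYMMDDHHMMSS strings, which sort
    chronologically when compared as plain strings.
    """
    # Invalid if it predates the page's last change or the site-wide epoch.
    return cached_at < max(page_touched, cache_epoch)


def miss_rate(entries, cache_epoch: str) -> float:
    """Fraction of (cached_at, page_touched) entries treated as misses."""
    stale = sum(is_stale(c, t, cache_epoch) for c, t in entries)
    return stale / len(entries)


# Hypothetical cache contents: (cached_at, page_touched) pairs.
entries = [
    ("20110201120000", "20110101000000"),
    ("20110205120000", "20110101000000"),
    ("20110209120000", "20110101000000"),
    ("20110209120000", "20110210000000"),  # page edited after it was cached
]

print(miss_rate(entries, "20030516000000"))  # old epoch: 0.25
print(miss_rate(entries, "20110208000000"))  # epoch moved forward: 0.75
```

With an old epoch only the edited page misses; moving the epoch past most cache timestamps invalidates everything rendered before it, which on a large cluster means a sudden burst of re-rendering work.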
The $wgCacheEpoch issue explains some of the problems we had, but not all of them. Since we don’t have a clear explanation for all of the problems, we plan to change the way we deploy this software so that we aren’t rolling it out to every wiki simultaneously. As our software is currently built, this isn’t easy to do in a general way, but it turns out this release is suited to an incremental deployment. (Note: we also plan to build a more general capability for incremental rollouts in future releases.)
Thank you for your patience! We hope that this time around we can deploy in a way where you won’t notice anything other than the improvements.