Hi everyone
I've just posted postmortem notes on the MediaWiki 1.17 release here: http://www.mediawiki.org/wiki/MediaWiki_1.17/Release_postmortem
...and since I expect there will be some editing/futzing with that page, I've included the full wikitext below. Also, I wouldn't be surprised if this generates some discussion on this list.
(start of wikitext):
We released [[MediaWiki 1.17]] on June 22. In the interests of doing better next time, a small group of us (Tim, Chad, Sam, Sumana, and RobLa) got together to brainstorm what went right and what we need to look at. [[User:RobLa-WMF|RobLa]] then wrote up this summary of that discussion. Any first person references are probably me (RobLa), and any references to "we" are probably the group above. See the history for this page for the raw notes.
Note: this is specifically about the MediaWiki 1.17.0 release, rather than the 1.17 deployment.
== Timeline ==
Here is the timeline, derived from SVN commit logs:
* 2010-07-28 - MediaWiki 1.16.0 released
* 2010-12-07 - REL1_17 branched. This is the branch that MediaWiki 1.17.0 was based on.
* 2011-02-03 - 1.17wmf1 branched
* 2011-05-05 - MediaWiki 1.17.0beta1 tagged
* 2011-06-14 - MediaWiki 1.17.0rc1 released
* 2011-06-22 - MediaWiki 1.17.0 released
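For anyone who wants to check the intervals behind the timing discussion, here is a minimal sketch that computes the number of days from each milestone to the final release. The dates are copied from the timeline above; the dictionary labels and the `days_between` helper are names made up for this illustration, not part of any MediaWiki tooling.

```python
from datetime import date

# Dates copied from the timeline; the keys are labels chosen for this sketch.
MILESTONES = {
    "1.16.0 released": date(2010, 7, 28),
    "REL1_17 branched": date(2010, 12, 7),
    "1.17wmf1 branched": date(2011, 2, 3),
    "1.17.0beta1 tagged": date(2011, 5, 5),
    "1.17.0rc1 released": date(2011, 6, 14),
    "1.17.0 released": date(2011, 6, 22),
}


def days_between(start, end):
    """Return the number of days from milestone `start` to milestone `end`."""
    return (MILESTONES[end] - MILESTONES[start]).days


if __name__ == "__main__":
    final = "1.17.0 released"
    for label in MILESTONES:
        if label != final:
            print(f"{label} -> {final}: {days_between(label, final)} days")
```

By this count the 1.16.0 to 1.17.0 cycle was 329 days, which is consistent with the release cycle coming in just under one year.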
== How it went ==
We started by brainstorming "what went well" and "what to look at". In the initial brainstorming, the original group had many more items in the "what to look at" section than in the "what went well" section. I then set about organizing things, and settled upon four categories: substance, polish, timing, and process. What became clear was that we felt pretty good about the substance and polish of the release (where positives and negatives balanced out pretty well), but the timing and process categories had the most that we needed to look at.
=== Substance and polish ===
As for the substance, it went very well. We had three large features (ResourceLoader, category sorting and the new installer) that complicated this release. As of this writing, it looks like these features are in pretty good shape, and we can be pretty proud of releasing them in the state that they're in. We fixed a lot of bugs (207 noted in the [[Release notes/1.17|release notes]]), and made many smaller improvements to the codebase. Everyone was right to be very eager to get this release out.
Things of substance that didn't go so well: our PostgreSQL support suffered until quite late in the process, and our command line installer is incomplete in some frustrating ways. On PostgreSQL: the developers who fixed the last of the bugs aren't people that use PostgreSQL on a day-to-day basis. The folks that normally develop our PostgreSQL support had other engagements, and we don't have a very deep list of people to fall back on. We need to work out a plan for engaging PostgreSQL users as developers in this area, or it will be very difficult to continue support for this DB. The command line interface to the installer just needs a little more time to mature; there are many ways of solving this problem without delaying a release, but I won't get overly prescriptive in this writeup.
The polish of 1.17 was superb. The release notes were well-written, and there hasn't been an urgent need for a rapid 1.17.1 release. We'll do one anyway, since there were a couple of niggly bugs that can be fixed easily enough.
=== Timing ===
As noted, the biggest area for improvement is around the timing and release process. It wasn't all bad; we did (just barely) manage to keep the release cycle under one year. Still, that's much longer than our aspiration of quarterly releases, or even the previous historic norm of 2-3 releases per year. Moreover, it has been a long time since branching 1.17, so we already have seven months' worth of work backed up for future releases. 1.18 was branched in early May, so in addition to the five months of changes we have backed up for that release, we already have two more months of changes backed up for 1.19.
The biggest thing that delayed this release (and the 1.17 deployment in March) was the code review backlog. That topic has been covered in many earlier threads, but a brief recap: after the 1.16 release, we fell way behind on code review, relying solely on Tim up until that point. We added more reviewers in October, which helped us get the backlog down to a reasonable level by December. We branched, finished off the 1.17-specific review, and deployed. Further minor review work was needed prior to the 1.17 release. With more Wikimedia Foundation developers spending 20% of their time on review, we're optimistic we'll be able to finish off the backlog and stay on top of the review process.
As we drew closer to the 1.17 release, we issued 1.17 beta 1. This beta unintentionally lasted several weeks as we tried to finish off the last of the release blockers. In particular, a security bug we worked on during this time created an awkward situation, since we had to iterate multiple times to fully plug the hole. The good news, though, is that the period was long enough for us to get some good end-user testing and bug reporting prior to the final release.
=== Process ===
Process is where we need the most work. The actual logistics of putting up the tarball and other bits are working well (these haven't changed in years), but everything leading up to that point could use a lot of streamlining.
The first issue is purely one of scoping. Right now, we're not terribly deliberate about what goes in and what is out. Part of the problem we have here is that opinions vary as to what a reasonable release interval is. The range of opinion seems to be anywhere from "multiple times a day" to "every six months". It's difficult to plan this without getting consensus on this point, and it's difficult to get consensus without first proving that we can get on top of the code review backlog and stay on top of it. If we go with a longer cycle, we can consider adopting a process similar to GNOME<ref>Example of GNOME release timeline: http://live.gnome.org/ThreePointOne</ref> or Ubuntu or another project that has a good track record of sticking with regular releases. The most interesting practices there involve having clear deadlines for proposing new features, deadlines for features being done or pulled, and other date-risk mitigation strategies.
As with the code review process last year, this year, we're probably too reliant on Tim to not only drive but execute many steps. One way we can speed up the process is to document it, making it clear where we are in the process, and more importantly, how people can help. "Help" can mean explicitly doing the work, but it can also be simply "don't do things that delay the release further", or "stop others from delaying the release". We have a wonderful [[Release checklist]], but that list was too focused on the last steps before the release. Many steps before the actual publication of the tarball were missing, so they've been added into that document. More work can be done there. Additionally, we will probably experiment with other team members (e.g. Chad) performing at least alpha or beta releases.
During this release, we tagged many things "1.17" for backporting from trunk. This process was useful, as long as people remembered to untag once they'd merged. There was some confusion at various times about who was responsible for doing this work. It switched sometimes between Roan, Chad, Tim and others. Additionally, pretty much everyone felt empowered to tag things for backporting, but there probably wasn't enough discipline in trimming that list back before actually making the change. Some unreviewed changes were backported (or directly applied) to the release branch, causing confusion and delay. We have a policy about backporting<ref>http://www.mediawiki.org/wiki/Commit_access_requests#Guidelines_for_applying... - bullet points 4 & 5</ref>, but that policy wasn't followed very closely.
The process of finding release notes that weren't added and then backporting them was work that could have been done by people other than Tim, but Tim ended up doing most of this. This is work that needs to happen sooner in the process in a more distributed fashion. Additionally, one way to avoid this extra work is to keep backporting to a minimum in the first place.
This gets to the larger issue of communication and momentum at the end of this process. With timezone differences, it's not sustainable to have daily scrums all of the time, but having scrums during the last couple of weeks or so in the process may help keep things moving to the end.
== Recommendations ==
This section is intentionally left unfinished. The goal of this was to establish and document what happened. To the extent anything is incorrect or misleading above, corrections are encouraged. Recommendations for new things to try based on lessons learned from this release should be included below:
* ''your recommendation here''
...and possibly discussed on the talk page (suggestions above may be ruthlessly edited; talk page is better for attribution and preservation).
== References ==
<references/>
Rob Lanphier wrote:
As noted, the biggest area for improvement is around the timing and release process. It wasn't all bad; we did (just barely) manage to keep the release cycle under one year. Still, that's much longer than our aspiration of quarterly releases, or even the previous historic norm of 2-3 releases per year. Moreover, it has been a long time since branching 1.17, so we already have seven months worth of work backed up for future releases. 1.18 was branched in early May, so in addition to the five months of changes we have backed up for that release, we already have two more months of changes backed up for 1.19.
[...]
The first issue is purely one of scoping. Right now, we're not terribly deliberate about what goes in and what is out. Part of the problem we have here is that opinions vary as to what a reasonable release interval is. The range of opinion seems to be anywhere from "multiple times a day" to "every six months". It's difficult to plan this without getting consensus on this point, and it's difficult to get consensus without first proving that we can get on top of the code review backlog and stay on top of it. If we go with a longer cycle, we can consider adopting a process similar to GNOME<ref>Example of GNOME release timeline: http://live.gnome.org/ThreePointOne</ref> or Ubuntu or other project that has a good track record for sticking with a regular releases. The most interesting practices there involve having clear deadlines for proposing new features, deadlines for features being done or pulled, and other date-risk mitigation strategies.
Thank you for writing all of this up. It looks like it probably took quite a bit of time, and I appreciate it.
I pulled out two paragraphs that seem to be the nuggets. Without having this thread devolve into another chase-your-tail thread, I'd say that the main issue is that the release manager for 1.17 has a much more conservative approach, and when looking at it from that lens, 1.17 was right on time.
Tim has outlined on this mailing list why he believes that more infrequent releases are better, and his arguments are not necessarily invalid, I just don't think they have any consensus behind them. I think Wikimedia and other MediaWiki users would like a faster release process. But that's _completely irrelevant_ when it's one person doing the work and putting together the final release.
That, in a nutshell, seems to be the point of contention. The release (and deployment!) timelines are perfectly aligned with a conservative approach, but a lot of others (Brion, Neil, Chad, Roan, and in some ways Erik, among others) have recommended a less conservative approach ("perfect is the enemy of the done") that I believe would keep end-users and developers much happier.
There's been a recent change-up in Wikimedia staffing, so I don't know who will be managing the 1.18 release, but if it's the same person, my bet is that it's going to take the same amount of time. In my view, a few people (one?) see the longer release/deployment period as a feature, while the majority of people see it as a bug. :-)
MZMcBride
On 7 July 2011 20:55, MZMcBride z@mzmcbride.com wrote:
Tim has outlined on this mailing list why he believes that more infrequent releases are better, and his arguments are not necessarily invalid, I just don't think they have any consensus behind them. I think Wikimedia and other MediaWiki users would like a faster release process. But that's _completely irrelevant_ when it's one person doing the work and putting together the final release.
Are we talking about WMF deployments or tarballs here? Speaking as a tarball user, 2 releases a year, maybe 3, is *just fine*.
- d.
On Thu, Jul 7, 2011 at 10:02 PM, David Gerard dgerard@gmail.com wrote:
Are we talking about WMF deployments or tarballs here? Speaking as a tarball user, 2 releases a year, maybe 3, is *just fine*.
I think 3 releases per year is fine. However, I think we should deploy to WMF sites much more often than that. That's basically been my position throughout this debate.
Roan
On Thu, Jul 7, 2011 at 10:18 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
On Thu, Jul 7, 2011 at 10:02 PM, David Gerard dgerard@gmail.com wrote:
Are we talking about WMF deployments or tarballs here? Speaking as a tarball user, 2 releases a year, maybe 3, is *just fine*.
I think 3 releases per year is fine. However, I think we should deploy to WMF sites much more often than that. That's basically been my position throughout this debate.
+1
David Gerard wrote:
On 7 July 2011 20:55, MZMcBride z@mzmcbride.com wrote:
Tim has outlined on this mailing list why he believes that more infrequent releases are better, and his arguments are not necessarily invalid, I just don't think they have any consensus behind them. I think Wikimedia and other MediaWiki users would like a faster release process. But that's _completely irrelevant_ when it's one person doing the work and putting together the final release.
Are we talking about WMF deployments or tarballs here? Speaking as a tarball user, 2 releases a year, maybe 3, is *just fine*.
As far as I'm aware, tarball releases and Wikimedia deployments have largely shifted to being at approximately the same (slower) pace, but they're not synchronized. But you're absolutely right that there's no need for that to be the case.
I'm muddying the waters a bit by discussing both releases and deployments at once, and for that I apologize. That said, they are obviously interconnected. Ideally you want code (Wikimedia deployments) that has been run in the wild for a while in order to catch issues that would never be caught in development. That makes for a better tarball release.
In this case, you also largely have the same person filling both roles (currently? I don't know). That is, Tim was the 1.17 release manager and he was the point-person doing the 1.17 deployment, as far as I remember, at least. As I said in my previous post, there have been some shifts in job titles (cf. Erik's e-mail a few weeks ago), which I think correlate to some shifts in job responsibilities, but that's still unclear to me.
For what it's worth, I agree that two or three tarball releases per year would be fine; that just means getting Wikimedia deployments off of the same schedule.
MZMcBride
On 07/07/11 03:42, Rob Lanphier wrote:
http://live.gnome.org/ThreePointOne <snip> having clear deadlines for proposing new features, deadlines for features being done or pulled, and other date-risk mitigation strategies.
Having a roadmap like GNOME's is the approach I am advocating.
Another way I could consider is having a stable branch and merging in only stable/reviewed patches. After each merge you can either:
- hold for more patches
- release on live site
- tag a release (beta, RC...)
This path is probably as predictable as the first one. Its drawback is that new features might get less attention.
Anyway, both ways are *very* far away from our wiki-way of handling /trunk/ (which is messy).
On 07/07/11 03:42, Rob Lanphier wrote:
we're probably too reliant on Tim to not only drive but execute many steps.
<snip>
Additionally, we will probably experiment with other team members (e.g. Chad) performing at least alpha or beta releases.
This is actually a great way to train new people. Let Chad take the release management cycle; make sure Tim is around, though, or next release he will have to be trained by Chad :-b