Hi everyone,
I've just posted postmortem notes on the MediaWiki 1.17 release here:
http://www.mediawiki.org/wiki/MediaWiki_1.17/Release_postmortem
...and since I expect there will be some editing/futzing with that
page, I've included the full wikitext below. Also, I wouldn't be
surprised if this generates some discussion on this list.
(start of wikitext):
We released [[MediaWiki 1.17]] on June 22. In the interests of doing
better next time, a small group of us (Tim, Chad, Sam, Sumana, and
RobLa) got together to brainstorm what went right and what we need to
look at. [[User:RobLa-WMF|RobLa]] then summarized that discussion
in this writeup. Any first-person references are probably me
(RobLa), and any references to "we" are probably to the group above.
See the history for this page for the raw notes.
Note: this is specifically about the MediaWiki 1.17.0 release, rather
than the 1.17 deployment.
== Timeline ==
Here is the timeline, derived from SVN commit logs:
* 2010-07-28 - MediaWiki 1.16.0 released
* 2010-12-07 - REL1_17 branched. This is the branch that MediaWiki
1.17.0 was based on.
* 2011-02-03 - 1.17wmf1 branched
* 2011-05-05 - MediaWiki 1.17.0beta1 tagged
* 2011-06-14 - MediaWiki 1.17.0rc1 released
* 2011-06-22 - MediaWiki 1.17.0 released
== How it went ==
We started by brainstorming "what went well" and "what to look at".
In the initial brainstorming, the original group had many more items
in the "what to look at" section than in the "what went well" section. I
then set about organizing things, and settled upon four categories:
substance, polish, timing, and process. What became clear was that we
felt pretty good about the substance and polish of the release (where
positives and negatives balanced out pretty well), but the timing and
process categories had the most that we needed to look at.
=== Substance and polish ===
As for the substance, it went very well. We had three large features
(ResourceLoader, category sorting and the new installer) that
complicated this release. As of this writing, it looks like these
features are in pretty good shape, and we can be pretty proud of
releasing them in the state that they're in. We fixed a lot of bugs
(207 noted in the [[Release notes/1.17|release notes]]), and made many
smaller improvements to the codebase. Everyone was right to be very
eager to get this release out.
Things of substance that didn't go so well: our PostgreSQL support
suffered until quite late in the process, and our command line
installer is incomplete in some frustrating ways. On PostgreSQL: the
developers who fixed the last of the bugs aren't people who use
PostgreSQL on a day-to-day basis. The folks who normally develop our
PostgreSQL support had other engagements, and we don't have a very
deep list of people to fall back on. We need to work out a plan for
engaging PostgreSQL users as developers in this area, or it will be
very difficult to continue support for this DB. The command line
interface to the installer just needs a little more time to mature;
there are many ways of solving this problem without delaying a
release, but I won't get overly prescriptive in this writeup.
The polish of 1.17 was superb. The release notes were well-written,
and there hasn't been an urgent need for a rapid 1.17.1 release.
We'll do one anyway, since there were a couple of niggly bugs that can
be fixed easily enough.
=== Timing ===
As noted, the biggest area for improvement is around the timing and
release process. It wasn't all bad; we did (just barely) manage to
keep the release cycle under one year. Still, that's much longer than
our aspiration of quarterly releases, or even the previous historic
norm of 2-3 releases per year. Moreover, it has been a long time
since branching 1.17, so we already have seven months' worth of work
backed up for future releases. 1.18 was branched in early May, so in
addition to the five months of changes we have backed up for that
release, we already have two more months of changes backed up for
1.19.
The biggest thing that delayed this release (and the 1.17 deployment
in March) was the code review backlog. That topic has been covered in
many earlier threads, but a brief recap: after the 1.16 release, we
fell way behind on code review, having relied solely on Tim up until
that point. We added more reviewers in October, which helped us get the
backlog down to a reasonable level by December. We branched, finished
off the 1.17-specific review, and deployed. Further minor review work
was needed prior to the 1.17 release. With more Wikimedia Foundation
developers spending 20% of their time on review, we're optimistic
we'll be able to finish off the backlog and stay on top of the review
process.
As we drew closer to the 1.17 release, we issued 1.17 beta 1. This
beta unintentionally lasted several weeks as we tried to finish off
the last of the release blockers. In particular, a security bug we
worked on during this time created an awkward situation, since we had
to iterate multiple times to fully plug the hole. The good news,
though, is that the period was long enough for us to get some good
end-user testing and bug reporting prior to the final release.
=== Process ===
Process is where we need the most work. The actual logistics of
putting up the tarball and other bits are working well (these haven't
changed in years), but everything leading up to that point could use a
lot of streamlining.
The first issue is purely one of scoping. Right now, we're not
terribly deliberate about what goes in and what is out. Part of the
problem we have here is that opinions vary as to what a reasonable
release interval is. The range of opinion seems to be anywhere from
"multiple times a day" to "every six months". It's difficult to plan
this without getting consensus on this point, and it's difficult to
get consensus without first proving that we can get on top of the code
review backlog and stay on top of it. If we go with a longer cycle,
we can consider adopting a process similar to that of GNOME<ref>Example of
GNOME release timeline:
http://live.gnome.org/ThreePointOne</ref>, Ubuntu, or another project
that has a good track record of sticking to regular releases. The most
interesting practices there involve
having clear deadlines for proposing new features, deadlines for
features being done or pulled, and other date-risk mitigation
strategies.
As with the code review process last year, we're probably still
too reliant on Tim to not only drive but also execute many steps. One way
we can speed up the process is to document it, making it clear where
we are in the process, and more importantly, how people can help.
"Help" can mean explicitly doing the work, but it can also be simply
"don't do things that delay the release further", or "stop others from
delaying the release". We have a wonderful [[Release checklist]], but
that list was too focused on the last steps before the release. Many
steps before the actual publication of the tarball were missing, so
they've been added into that document. More work can be done there.
Additionally, we will probably experiment with other team members
(e.g. Chad) performing at least alpha or beta releases.
During this release, we tagged many things "1.17" for backporting from
trunk to the release branch. This process was useful, as long as people
remembered to untag once they'd merged. There was some confusion at
various times about who was responsible for doing this work. It
switched sometimes between
Roan, Chad, Tim and others. Additionally, pretty much everyone felt
empowered to tag things for backporting, but there probably wasn't
enough discipline in trimming that list back before actually making
the change. Some unreviewed changes were backported (or directly
applied) to the release branch, causing confusion and delay. We have
a policy about backporting
<ref>http://www.mediawiki.org/wiki/Commit_access_requests#Guidelines_for_applying_patches
- bullet points 4 & 5</ref>, but that policy wasn't followed very
closely.
The process of finding release notes that weren't added and then
backporting them was work that could have been done by people other
than Tim, but Tim ended up doing most of this. This is work that
needs to happen sooner in the process in a more distributed fashion.
Additionally, one way to avoid this extra work is to keep backporting
to a minimum in the first place.
This gets to the larger issue of communication and momentum at the end
of this process. With timezone differences, it's not sustainable to
have daily scrums all of the time, but having scrums during the last
couple of weeks or so of the process may help keep things moving to
the end.
== Recommendations ==
This section is intentionally left unfinished. The goal of this was
to establish and document what happened. To the extent anything is
incorrect or misleading above, corrections are encouraged.
Recommendations for new things to try based on lessons learned from
this release should be included below:
* ''your recommendation here''
...and possibly discussed on the talk page (suggestions above may be
ruthlessly edited; talk page is better for attribution and
preservation).
== References ==
<references/>