On Mon, Sep 20, 2010 at 10:09 PM, Rob Lanphier robla@wikimedia.org wrote:
> This seems like a fine line of reasoning, though not one that I had thought was set in stone. Earlier MediaWiki releases benefited from deployment being pretty close to trunk, but presumably that drifted because it became progressively harder for us to use our production environment as the de facto MediaWiki testbed.
The reason why that drifted is because our review system was already overloaded before Brion left, and completely collapsed after that, because we failed to decentralize review properly. Today the practice is roughly that most employees get their code reviewed and deployed quickly by other employees or even themselves; volunteers (and maybe some employees) get their code reviewed by (generally) Tim whenever he has time, which he doesn't have enough of, so their code never gets deployed, or only once in a blue moon. This is a terrible situation, and we need to fix it so that all committed code is being reviewed and deployed on a regular basis before we even consider a release, IMO.
> I'm not sure what you mean by this. October 15 would be the branch point, not the release date. Are you saying that we have to release to production one month before even branching off of trunk?
Yes. There's such a huge deployment backlog that even after careful review, there's going to be a flurry of new problems that are quickly discovered and will have to be fixed. I don't think it makes sense to try backporting the inevitable flood of fixes to a separate branch. Instead, we should wait until deployment and trunk are relatively in sync again (we are aiming for that, right?) and then wait a while for things to stabilize before branching.
On Tue, Sep 21, 2010 at 12:26 PM, Rob Lanphier robla@robla.net wrote:
> Doesn't this kinda depend on what our priorities are and what the priorities of people running MediaWiki are? There are many demands placed by Wikipedia that most websites don't have. In the rest of the software world, high traffic websites are the *last* ones to upgrade, not the first. Don't we want to get the benefit of other people using the software more heavily before we put it on Wikipedia?
No, because other people are in a much worse position to track down bugs. MediaWiki developers are mostly heavy Wikimedia users, and Wikimedia users are much more likely to know about Bugzilla and know where they can complain about problems. Moreover, Wikimedia employs (practically?) all paid MediaWiki developers. If a third-party site has a bunch of serious problems, its sysadmins will probably throw up their hands and revert to an earlier version; if Wikimedia has a problem, it's likely that it can be fixed in minutes by its employees.
Incremental deployment is a much better overall development strategy. Back in the days when we had scaps every week or two, as soon as a user reported a problem, we'd sometimes all say "Oh, I remember the commit that must have caused that." I remember one time when a user reported a problem in #wikimedia-tech, and Brion and I got a commit conflict because we had committed the exact same fix at the same time, within (I think) something like two minutes of the report -- both of us remembered the commit that touched that code (me because I had committed it, he because he had reviewed it), and the problem was obvious given the bug report. This was standard practice; whoever did the scap would make sure to hang around in #wikimedia-tech for a few hours to fix any problems, and savvy users who watched that channel would see the scap and know to report any regressions there immediately. There'd be only a handful, so all of them could be fixed quickly.
Even if we didn't remember the exact commit, we'd have very few changes to look at in the log for the relevant files before we found the issue. At worst, we could almost certainly just revert the problematic commits with no conflicts. When you have months of old code being deployed at once, you're going to have tons of problems crop up all at once, instead of a few at a time, and they'll be harder to fix -- you won't remember what could have caused them, you'll have to look over more commits to find the problem, and when you do, you probably can't easily revert them.
Trying to use third-party sites to test the code before we deploy it isn't feasible. First of all, few of them will test it and fewer still will report bugs, and that will only get worse if we release less-tested code. Second of all, Wikimedia will run into problems that other sites won't, and then all the problems I discuss above are inevitable.
I think the correct course of action is to revamp our review structure so that we can return to the status quo ante of keeping deployment roughly in sync with trunk. We should aim to have every commit reviewed for deployment less than a week after it's committed -- perhaps immediately reverted if it's badly flawed, but still reviewed.
Indeed, contrary to what you suggest, high-traffic websites are usually the first and only users to deploy the software *that they develop*. Most such software is in fact secret, so no one else can use it even if they wanted to. When it is open-source, the vendor's site is usually the first to upgrade, in my experience. vbulletin.com/forums/ runs alphas of vBulletin before they're released to customers, for example. I'd be interested to know if sites like drupal.org or phpbb.org use anything but cutting-edge versions of their own software -- I'd bet most of them deploy betas or release candidates, at the very least.
> I realize that this isn't how it's traditionally been done, but then again, I think our tradition has drifted. Once upon a time, trunk was very regularly deployed in production. Providing releases was merely an alternative to telling MediaWiki admins "just check out trunk; that's what we're using". Now that we're a lot more cautious about what we put into production, we should question whether we still need to be even more cautious about what we release as MediaWiki.
I wouldn't say we're more cautious about what we put into production. I'd say it's more like some people get their stuff put into production, and others don't. As far as I can tell, the difference is mostly whether they're paid by Wikimedia. What employees have their code waiting in trunk for months without deployment? What volunteers have their code put into production on any kind of regular basis? I expect a few of the former exist, but a minority of employees; and I don't think the latter category exists at all. Correct me if I'm wrong, please -- I never followed the deployment branch closely. It includes none or almost none of my changes, so I never saw a reason to.
On Tue, Sep 21, 2010 at 1:48 PM, Guillaume Paumier gpaumier@wikimedia.org wrote:
> I can see a number of reasons to have a stable trunk (also used by Wikimedia websites) that contains reviewed & tested code, along with a development branch that /can/ be broken:
> - Developers wouldn't be afraid to commit unfinished work to the
> development branch for fear of breaking trunk.
> - Tarballs for non-Wikimedia MediaWiki users would be more stable.
> - Updates to Wikimedia sites would happen more often.
> - Getting to a release would be easier, since it would be the result of
> many incremental changes already merged into the stable trunk.
> - Wikimedia users would probably not mind encountering small bugs &
> quirks as the price of more regular code updates.
> That said, I guess there are obvious drawbacks I'm not seeing.
The problem isn't the policy for committing to various places. The problem is review and deployment procedures.