2011/4/14 Mark Hershberger mhershberger@wikimedia.org:
Sorry, I should have been clearer. Yes, branch now(ish) and then aim for a 1.18 release on July 15th. My idea is that setting a date for the release to be soon and early would provide the motivation to the people involved in code review to keep it up-to-date.
The point I was trying to make was that July is by no means "soon and early" in my book. It's three months away, which is way too long. Setting a date is nice, but if we can get a release out before the set date, that's a good thing, and I think we can (and /should/) get 1.18 out way faster.
Roan Kattouw (Catrope)
On 15/04/11 04:22, Roan Kattouw wrote:
2011/4/14 Mark Hershberger mhershberger@wikimedia.org:
Sorry, I should have been clearer. Yes, branch now(ish) and then aim for a 1.18 release on July 15th. My idea is that setting a date for the release to be soon and early would provide the motivation to the people involved in code review to keep it up-to-date.
The point I was trying to make was that July is by no means "soon and early" in my book. It's three months away, which is way too long. Setting a date is nice, but if we can get a release out before the set date, that's a good thing, and I think we can (and /should/) get 1.18 out way faster.
My preference is for 2 to 3 major releases per year. We branched 1.17 in December and we're looking at doing a release in April. So a 4 month cycle would imply branching 1.18 in April and releasing in August.
I don't think having 4 or 5 major releases per year would serve anyone particularly well. A slower release cadence means:
* Less hassle for non-Wikimedia users, since upgrades between major releases require more work. Extensions break, patches break, DB upgrades need to be done.
* Less branches to backport to. This reduces the amount of work that needs to be done to backport security fixes and other bug fixes. We drop support for branches based on time elapsed, not number of versions released.
* Less branches to test against. If you're writing an extension that is meant to work on multiple MediaWiki versions, it will be easier if there are less versions that you need to test against, and potentially write special-case code for.
* It's easier to do major projects in trunk. When you merge work in to trunk from a development branch, it's necessary to stabilise the code before the next release. This can take a long time for a major project. Both the new installer and the resource loader benefited from a long release cycle in this way.
* More opportunity for whole-project review. When a project begins and ends in a single release cycle, reviewers can wait for the project to reach a state where the original developer is happy with it before they start reviewing and giving comments. This means that the reviewer doesn't have to spend so much time looking at intermediate commits.
-- Tim Starling
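The backporting point in the message above can be made concrete with a little arithmetic. This is a minimal sketch, assuming a 12-month support window per branch; the thread only says support is dropped based on elapsed time and does not state the window, so that figure and the function name are illustrative.

```python
def supported_branches(releases_per_year, support_window_months=12):
    """Count release branches that are in support at the same time.

    Assumes evenly spaced releases and a fixed, time-based support window.
    The 12-month default is an assumption, not a figure from the thread.
    """
    # Integer ceiling of (releases_per_year * window / 12).
    return (releases_per_year * support_window_months + 11) // 12


for cadence in (2, 3, 4, 5):
    print(cadence, "releases/year ->", supported_branches(cadence), "branches to backport to")
```

Under these assumptions the backport workload grows in step with the release cadence, which is the trade-off the bullet describes.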
On 15 April 2011 06:53, Tim Starling tstarling@wikimedia.org wrote:
- Less hassle for non-Wikimedia users, since upgrades between major releases require more work. Extensions break, patches break, DB upgrades need to be done.
People upgrade seldom. If we have one release per year it is likely that the code they upgrade to is already so old nobody remembers how it works.
- Less branches to backport to. This reduces the amount of work that needs to be done to backport security fixes and other bug fixes. We drop support for branches based on time elapsed, not number of versions released.
I agree with this one, although I'm not the one who feels the pain here.
- Less branches to test against. If you're writing an extension that is meant to work on multiple MediaWiki versions, it will be easier if there are less versions that you need to test against, and potentially write special-case code for.
On the other hand, with releases few and far between, I need to write a lot of compatibility code in the Translate extension just to support the latest stable release and trunk at the same time. Maintaining separate branches of my extension for different releases sounds like a lot of effort, not to mention supporting them.
But all of this is moot, since you're proposing 3 releases per year and I'm complaining about having only one or two releases per year. Three releases would be enough for me.
-Niklas
2011/4/15 Tim Starling tstarling@wikimedia.org:
My preference is for 2 to 3 major releases per year. We branched 1.17 in December and we're looking at doing a release in April. So a 4 month cycle would imply branching 1.18 in April and releasing in August.
I don't think having 4 or 5 major releases per year would serve anyone particularly well. A slower release cadence means:
I can get on board with having 3 releases per year, but I'll reiterate that 3 months, let alone 4, between branching and releasing is too long. Yes, 1.17 took 4 months to stabilize, but it was 10 months' worth of code, so that's a 1:2.5 ratio. Interpolating that suggests that a release with 4 months' worth of code can be prepared in less than 2 months, and I think that once code review is organized properly such that large backlogs don't happen anymore (we had a very large backlog for 1.17 and I think we'll have a comparable one, considering the difference in elapsed time, for 1.18, but I'd really like to have this organized properly for 1.19 or 1.20), we can do better than that.
Instead, you're proposing a 1:1 workflow where, at any given point in time, we always have a release branch that's being stabilized, which means we have to perpetually maintain three branches (trunk, deployment, release) instead of two, and are always in the process of preparing a release. I don't like that idea, and I think it's unnecessary.
Roan Kattouw (Catrope)
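The stabilization estimate in the message above is a straight proportional scaling of the 1.17 numbers. A minimal sketch of that arithmetic, assuming stabilization time grows linearly with the amount of unreleased code (the function name is illustrative):

```python
def stabilization_months(code_months, stabilize=4, accumulated=10):
    """Scale the 1.17 experience (4 months to stabilize 10 months of code)
    to a release holding `code_months` worth of code, assuming linearity."""
    return code_months * stabilize / accumulated


estimate = stabilization_months(4)
print(f"{estimate:.1f} months")  # 1.6 months, i.e. "less than 2 months"
```

The linear assumption is the weak point: a large review backlog could make the real figure worse, and better-organized code review could make it better.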
On 15/04/11 19:26, Roan Kattouw wrote:
2011/4/15 Tim Starling tstarling@wikimedia.org:
My preference is for 2 to 3 major releases per year. We branched 1.17 in December and we're looking at doing a release in April. So a 4 month cycle would imply branching 1.18 in April and releasing in August.
I don't think having 4 or 5 major releases per year would serve anyone particularly well. A slower release cadence means:
I can get on board with having 3 releases per year, but I'll reiterate that 3 months, let alone 4, between branching and releasing is too long. Yes, 1.17 took 4 months to stabilize, but it was 10 months' worth of code, so that's a 1:2.5 ratio. Interpolating that suggests that a release with 4 months' worth of code can be prepared in less than 2 months, and I think that once code review is organized properly such that large backlogs don't happen anymore (we had a very large backlog for 1.17 and I think we'll have a comparable one, considering the difference in elapsed time, for 1.18, but I'd really like to have this organized properly for 1.19 or 1.20), we can do better than that.
Instead, you're proposing a 1:1 workflow where, at any given point in time, we always have a release branch that's being stabilized, which means we have to perpetually maintain three branches (trunk, deployment, release) instead of two, and are always in the process of preparing a release. I don't like that idea, and I think it's unnecessary.
That's a fair point. I didn't mean to propose a 1:1 workflow, I meant to just make a point about release schedules.
I know that different developers have different ideas about branch point schedules and how they should relate to release schedules. I don't have a strong view at this stage.
-- Tim Starling
2011/4/15 Tim Starling tstarling@wikimedia.org:
That's a fair point. I didn't mean to propose a 1:1 workflow, I meant to just make a point about release schedules.
OK. If your main point was to say that we should branch 1.18 in April and 1.19 in August, I'm cool with that. Releasing too infrequently is bad, as we saw with 1.17, but you make solid points to support the notion that releasing too frequently introduces problems of its own, and that we should find middle ground. Speaking in terms of release cycle length, I think that 4-6 months (2-3 releases/yr) is a bit long and 3-4 months (3-4 releases/yr) is better, but I'm sure we can work out a number. Your point that release cycle length should be consciously and carefully decided on is a very good one, and I'm sorry I hijacked it with my release latency argument.
Roan Kattouw (Catrope)
On Fri, Apr 15, 2011 at 2:26 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
2011/4/15 Tim Starling tstarling@wikimedia.org:
My preference is for 2 to 3 major releases per year. We branched 1.17 in December and we're looking at doing a release in April. So a 4 month cycle would imply branching 1.18 in April and releasing in August.
I don't think having 4 or 5 major releases per year would serve anyone particularly well. A slower release cadence means:
I can get on board with having 3 releases per year, but I'll reiterate that 3 months, let alone 4, between branching and releasing is too long.
I'd be happy with about two weeks: push 'beta' tarballs in the first week, 'release candidates' in the second week.
In the meantime, we should be running 1.18 on live servers, with a maximum of a week lag from trunk, and preferably much less. Ongoing work on trunk should always be keeping stability in mind, and code review should concentrate on ensuring that code is being actively tested and used.
I know we had some delays due to wanting to finish the security fixes, but I'm extremely concerned that trunk hasn't been maintained this way since the initial 1.17 push.
Unexercised code is dangerous code that will break when you least expect it; we need to get code into use fast, where it won't sit idle until we push it live with a thousand other things we've forgotten about.
-- brion
2011/4/15 Brion Vibber brion@pobox.com:
I'd be happy with about two weeks: push 'beta' tarballs in the first week, 'release candidates' in the second week.
In the meantime, we should be running 1.18 on live servers, with a maximum of a week lag from trunk, and preferably much less. Ongoing work on trunk should always be keeping stability in mind, and code review should concentrate on ensuring that code is being actively tested and used.
Amen to this, the rest of your post, and your previous post (release 1.17 ASAP). You're formulating my opinions better than I could; cheesy-sounding but true :P
Roan Kattouw (Catrope)
On 04/15/2011 12:07 PM, Brion Vibber wrote:
Unexercised code is dangerous code that will break when you least expect it; we need to get code into use fast, where it won't sit idle until we push it live with a thousand other things we've forgotten about.
translatewiki.net deserves major props for running a real-world wiki on trunk. It's hard to count all the bugs that get caught that way. Maybe once the heterogeneous deployment situation gets figured out we could do something similar with a particular project...
peace, --michael
Hoi, It makes sense for translatewiki.net to run on trunk. This way we are exposed to the latest messages and get as much localisation done as possible before code actually hits production servers. Running another project on trunk only makes sense when running trunk has added value for that project.
What you can do is adopt translatewiki.net as your barometer for code quality and help it run as smoothly as possible. Thanks, GerardM
On 15 April 2011 19:36, Michael Dale mdale@wikimedia.org wrote:
On 04/15/2011 12:07 PM, Brion Vibber wrote:
Unexercised code is dangerous code that will break when you least expect it; we need to get code into use fast, where it won't sit idle until we push it live with a thousand other things we've forgotten about.
translatewiki.net deserves major props for running a real-world wiki on trunk. It's hard to count all the bugs that get caught that way. Maybe once the heterogeneous deployment situation gets figured out we could do something similar with a particular project...
peace, --michael
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Fri, Apr 15, 2011 at 12:10 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, It makes sense for translatewiki.net to run on trunk. This way we are exposed to the latest messages and get as much localisation done as possible before code actually hits production servers. Running another project on trunk only makes sense when running trunk has added value for that project.
What you can do is adopt translatewiki.net as your barometer for code quality and help it run as smoothly as possible.
translatewiki.net is a great help, but don't forget that it doesn't run all the same extensions as are used in Wikimedia production sites. Regressions affecting things like CentralAuth can and do strike with very little warning; we've had several in the last few weeks that are only being caught because I have it set up on my workstation's dev instance and I see the breakages while I'm testing unrelated things.
It's important to actually be exercising the same code and the same configurations that are running in production. And when some bugs still don't get caught during that testing, it helps *a lot* to have only a minimal change set to look at since your last deployment. Changes can be rolled back more easily, and the problems found and fixed and redeployed more easily.
-- brion
On 16/04/11 03:07, Brion Vibber wrote:
In the meantime, we should be running 1.18 on live servers, with a maximum of a week lag from trunk, and preferably much less. Ongoing work on trunk should always be keeping stability in mind, and code review should concentrate on ensuring that code is being actively tested and used.
Yeah, I've heard this before. It didn't work the first time around, and I don't think it can work now. We can't use Wikipedia as a testing site for alpha-quality code anymore.
I think we should have a cycle of:
* Development branch merges and other major work in trunk.
* Review and stabilisation over the course of a couple of months, alongside general development work.
* Branch point.
* A period of backports and review to ensure the stability of the new branch.
* Testing, for 1-2 weeks.
* Deployment.
This is what we did for 1.17, and it worked well, leading to a 1.17 deployment which caused a minimum of disruption.
Unexercised code is dangerous code that will break when you least expect it; we need to get code into use fast, where it won't sit idle until we push it live with a thousand other things we've forgotten about.
This certainly wasn't my experience with the 1.17 deployment. We had a great deal of review and testing of the 1.17 branch, and many bugs were fixed without having to get Wikipedians to tell us about them.
translatewiki.net is a great help, but don't forget that it doesn't run all the same extensions as are used in Wikimedia production sites.
No it doesn't, that's why we set up public test wikis which did have a similar set of extensions: first a set of wikis separate from the main cluster on prototype.wikimedia.org, and then a test wiki which was part of the cluster. Then we did a staged deployment, deploying 1.17 to several wikis at a time.
CT and Robla were very supportive of this deployment strategy, and setting up permanent systems for deploying different versions to different wikis is now a high priority project.
We had a significant amount of manpower dedicated to testing the software on prototype.wikimedia.org, both Wikimedia staff and experts contracted via Calcey QA.
It's not the same site as it was when you first proposed this policy.
-- Tim Starling
On Fri, Apr 15, 2011 at 5:06 PM, Tim Starling tstarling@wikimedia.org wrote:
This is what we did for 1.17, and it worked well, leading to a 1.17 deployment which caused a minimum of disruption.
Unexercised code is dangerous code that will break when you least expect it; we need to get code into use fast, where it won't sit idle until we push it live with a thousand other things we've forgotten about.
This certainly wasn't my experience with the 1.17 deployment. We had a great deal of review and testing of the 1.17 branch, and many bugs were fixed without having to get Wikipedians to tell us about them.
*nod*
I think I've oversimplified with the 'deploy more often' part of things; lemme try to reorganize my arguments into something hopefully more cogent.
**tl;dr summary: More frequent testing of smaller pieces of changed code among small, but real sets of users should be a useful component of getting things tested and deployed faster and more safely. Judicious testing and deployment should help support a safer, but still much more aggressive, overall update frequency.**
It's certainly a fact that many bugs were found and fixed before deployment -- the organization around testing and bugfixing was in many ways FAR FAR superior to any deployment we've done before, and I don't mean to take away from that.
But it's also true that there were other bugs buried in code that had been changed 8-9 months previously, making it harder to track them down -- and much more difficult to revert them if a fix wasn't obvious. I certainly experienced that during the 1.17 deployment, and received the same impression from other developers at the time.
There was also a production outage for a time due to the choice not to initially do a staged rollout. This lesson has been learned, so should not be an issue in future deployments.
translatewiki.net is a great help, but don't forget that it doesn't run all the same extensions as are used in Wikimedia production sites.
No it doesn't, that's why we set up public test wikis which did have a similar set of extensions: first a set of wikis separate from the main cluster on prototype.wikimedia.org, and then a test wiki which was part of the cluster.
Indeed, that is a very useful component of an ongoing development+deployment strategy. But lack of real traffic and real usage makes this only a limited part of testing. I also experienced that some of the prototype sites were broken for days or weeks (CentralAuth configuration problems was my impression?), which prevented me from being able to confirm some bugs reported against prototype sites at the time.
One thing that can help with this is to run more actual, but lower traffic, sites on the prototype infrastructure so people are really dogfooding them: a broken prototype site should be something requiring an immediate fix.
For instance, we programmers probably use www.mediawiki.org a lot more aggressively than regular people do, *and* we have access to the code and some have access to the server infrastructure. It might be an ideal candidate for receiving more frequent updates from trunk.
Then we did a staged deployment, deploying 1.17 to several wikis at a time.
This was one of my recommendations for the 1.17 deployment, so yes that's exactly the sort of thing I'm advocating.
It was initially rejected because the old heterogeneous deploy scripts were out of date, and there was worry that they wouldn't get updated in time and might just break things worse. They then got reimplemented in a hurry when it turned out that yes, indeed, 1.17 broke when simply applied to the entire cluster at once -- reimplementing them was definitely the right choice and it significantly smoothed out the deployment once it happened.
It's not the same site as it was when you first proposed this policy.
It's a bigger site with more users, increasing the danger that small changes will cause unexpected breakages. I believe that smaller change sets that get more directly tested will help to reduce that danger.
Major sites like Google and Facebook are much more aggressive about A/B testing and progressive rollouts than we've ever been -- not in place of all other forms of testing and review, but definitely in addition. We have relatively limited resources, but we're not just three guys with an rsync script anymore... I think we can do better with what we've got.
I think this is a situation that will benefit from more aggressive testing, including more live & A/B testing: fine-grained rollouts mean fine-grained testing and fine-grained debugging. Not always perfect, but if problems get exposed and fixed quicker, in a relatively small audience but still big enough to drive real usage behavior, I think that's a win.
I do agree that just slapping trunk onto *.wikipedia.org every couple days isn't a great idea at this stage, but I think we can find an intermediate level that gets code into real, live usage on an ongoing rolling basis. Some things that may help:
* Full heterogeneous deployment system so real but lower-traffic sites can be regularly run on more aggressive update schedules than high-traffic sites
* Targeting specific experimental code to specific sites (production prototypes?)
* Being able to better separate fixed backend and more experimental frontend code for a/b testing
* Cleaner separation of modules: we shouldn't have to update CentralAuth to update ProofreadPage on Wikisource.
One issue we see at present is that since we version and deploy core and extensions together, it's tough to get a semi-experimental extension into limited deployment with regular updates. Let's make sure that's clean and easy to do too; right now it's very easy to deploy experimental JavaScript into a gadget or site JS, but an extension may just sit idle in SVN for years, unusable in production even if it's limited, modular code because no one wants to deploy it. If there's interest it may get a prototype site, but if they only get used by the testing crew or when we ask someone to go and make some fake edits on them, they're not going to have all their bugs exercised.
Being able to do more self-directed prototype sites with the upcoming virtualization infrastructure should help with that, and for certain front-end things it should be possible to use JS whatsits to hook some of that code into live sites for opt-in or a/b testing -- further reducing dangers by removing the server-side variations and providing an instant switch-back to the old code.
I don't advocate just blindly updating the whole stack all the time; I advocate aiming for smaller pieces that can be run and tested more easily and more safely in more flexible ways.
As a power user willing to risk my neck to make things better, I want to be able to opt in to the "Wikipedia beta" and actually get an experimental new feature *on Wikipedia or Commons* a lot more often. As a developer, I want to be able to get things into other people's hands so they can test them for me and give me feedback.
This is one of the reasons I'm excited about the future of Gadgets -- the JS+CSS side has always been the free-for-all where experimental tools can actually be created and tested and used in a real environment, while MediaWiki's PHP side has remained difficult to update in pieces. It's easier to deploy those things, and should get even easier and more powerful with time.
We should consider what we can do to make the PHP side smoother and easier as well, though obviously we are much more limited for security and functional safety reasons.
-- brion
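The heterogeneous deployment idea in the message above (lower-traffic sites on a faster update schedule than high-traffic ones) can be sketched as a staged version map. This is only an illustration: the stage contents, wiki identifiers, and function name are hypothetical and do not describe the actual Wikimedia deploy scripts.

```python
# Hypothetical rollout plan: earlier stages hold lower-traffic wikis.
STAGES = [
    ["testwiki", "mediawikiwiki"],
    ["wikisource", "commonswiki"],
    ["enwiki"],
]


def version_for(wiki, stages_completed, new="1.18", old="1.17"):
    """Return the version a wiki should run after `stages_completed` stages.

    Wikis in a completed stage get the new version; every other wiki,
    including any wiki not listed in the plan, stays on the old one.
    """
    for stage in STAGES[:stages_completed]:
        if wiki in stage:
            return new
    return old


for n in range(len(STAGES) + 1):
    print(n, {w: version_for(w, n) for stage in STAGES for w in stage})
```

Rolling back a stage is then a matter of lowering `stages_completed`, which keeps the change set under suspicion small.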
On Fri, 15-04-2011, at 18:41 -0700, Brion Vibber wrote:
One issue we see at present is that since we version and deploy core and extensions together, it's tough to get a semi-experimental extension into limited deployment with regular updates. Let's make sure that's clean and easy to do too; right now it's very easy to deploy experimental JavaScript into a gadget or site JS, but an extension may just sit idle in SVN for years, unusable in production even if it's limited, modular code because no one wants to deploy it. If there's interest it may get a prototype site, but if they only get used by the testing crew or when we ask someone to go and make some fake edits on them, they're not going to have all their bugs exercised.
Being able to do more self-directed prototype sites with the upcoming virtualization infrastructure should help with that, and for certain front-end things it should be possible to use JS whatsits to hook some of that code into live sites for opt-in or a/b testing -- further reducing dangers by removing the server-side variations and providing an instant switch-back to the old code.
I don't advocate just blindly updating the whole stack all the time; I advocate aiming for smaller pieces that can be run and tested more easily and more safely in more flexible ways.
As a power user willing to risk my neck to make things better, I want to be able to opt in to the "Wikipedia beta" and actually get an experimental new feature *on Wikipedia or Commons* a lot more often. As a developer, I want to be able to get things into other people's hands so they can test them for me and give me feedback.
The ability to easily test a feature, an extension or an update on a small percentage of users, based on opt-in, project/language or simple random percentage, is something that many shops have and that we should prioritize adding to our deployment toolkit. It's unrealistic to think that we will uncover all of the issues, even the serious ones, ourselves running on a test environment, or even on the cluster. Our users exercise this code in ways that aren't even on our radar, which is a good thing; let's make use of it.
FWIW I also support having a much more aggressive testing, deployment and release schedule, for many of the reasons already described by others.
Ariel
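The selection rules named in the message above (opt-in, project/language, or a simple random percentage) could be combined roughly as follows. This is a minimal sketch under stated assumptions: the function, its parameters, and the hashing scheme are all hypothetical, and nothing here reflects an existing MediaWiki feature.

```python
import hashlib


def in_rollout(user_id, feature, percent, opted_in=False, wiki=None, enabled_wikis=()):
    """Decide whether a user sees an experimental feature.

    Order of checks: explicit opt-in, then per-wiki enablement, then a
    stable pseudo-random bucket derived from the feature and user id.
    """
    if opted_in:
        return True
    if wiki is not None and wiki in enabled_wikis:
        return True
    # Hashing keeps a user in the same bucket on every request, so the
    # feature does not flicker on and off between page views.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


print(in_rollout("user42", "new-editor", percent=5))
print(in_rollout("user42", "new-editor", percent=0, opted_in=True))
```

Setting `percent` back to 0 acts as the instant switch-back for everyone who has not opted in.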
Assuming that there are no destructive bugs in the reviewed code, we could have en.alpha.wikipedia.org urls.
Our tests also need to be improved, so that we don't keep hitting the same boulders.
On Fri, Apr 15, 2011 at 9:41 PM, Brion Vibber brion@pobox.com wrote:
But it's also true that there were other bugs buried in code that had been changed 8-9 months previously, making it harder to track them down -- and much more difficult to revert them if a fix wasn't obvious. I certainly experienced that during the 1.17 deployment, and received the same impression from other developers at the time.
For a concrete example, see http://www.mediawiki.org/wiki/Special:Code/MediaWiki/83544 and follow-up commits. I made that commit shortly after 1.17 deployment, working with Roan to resolve a bug in my categorylinks rewrite. It turned out that I got confused by my own variable naming and completely broke non-uppercase collations in the process of fixing the bug that was visible on Wikimedia. That required effort by translatewiki.net to track down the bug again more than a month later. I'm pretty sure I wouldn't have made that mistake if I had been writing the fix two weeks later instead of six months later.