On Fri, Apr 15, 2011 at 5:06 PM, Tim Starling <tstarling@wikimedia.org> wrote:
> This is what we did for 1.17, and it worked well, leading to a 1.17 deployment which caused a minimum of disruption.
>
>> Unexercised code is dangerous code that will break when you least expect it; we need to get code into use fast, where it won't sit idle until we push it live with a thousand other things we've forgotten about.
>
> This certainly wasn't my experience with the 1.17 deployment. We had a great deal of review and testing of the 1.17 branch, and many bugs were fixed without having to get Wikipedians to tell us about them.
*nod*
I think I've oversimplified with the 'deploy more often' part of things; lemme try to reorganize my arguments into something hopefully more cogent.
**tl;dr summary: More frequent testing of smaller pieces of changed code among small but real sets of users should be a useful component of getting things tested and deployed faster and more safely. Judicious testing and deployment should help support a safer, but still much more aggressive, overall update frequency.**
It's certainly a fact that many bugs were found and fixed before deployment -- the organization around testing and bugfixing was in many ways FAR FAR superior to any deployment we've done before, and I don't mean to take away from that.
But it's also true that there were other bugs buried in code that had been changed 8-9 months previously, making it harder to track them down -- and much more difficult to revert them if a fix wasn't obvious. I certainly experienced that during the 1.17 deployment, and received the same impression from other developers at the time.
There was also a production outage for a time, caused by the choice not to do a staged rollout from the start. That lesson has been learned, so it should not be an issue in future deployments.
>> translatewiki.net is a great help, but don't forget that it doesn't run all the same extensions as are used in Wikimedia production sites.
>
> No, it doesn't; that's why we set up public test wikis which did have a similar set of extensions: first a set of wikis separate from the main cluster on prototype.wikimedia.org, and then a test wiki which was part of the cluster.
Indeed, that is a very useful component of an ongoing development+deployment strategy. But the lack of real traffic and real usage makes this only a limited part of testing. I also found that some of the prototype sites were broken for days or weeks at a time (CentralAuth configuration problems, as I recall), which prevented me from confirming some bugs reported against the prototype sites.
One thing that can help with this is to run more real, but lower-traffic, sites on the prototype infrastructure so people are actually dogfooding them: a broken prototype site should then be something requiring an immediate fix.
For instance, we programmers probably use www.mediawiki.org a lot more aggressively than regular people do, *and* we have access to the code, and some of us have access to the server infrastructure. It might be an ideal candidate for receiving more frequent updates from trunk.
> Then we did a staged deployment, deploying 1.17 to several wikis at a time.
This was one of my recommendations for the 1.17 deployment, so yes that's exactly the sort of thing I'm advocating.
It was initially rejected because the old heterogeneous deploy scripts were out of date, and there was worry that they wouldn't be ready in time and might just break things worse. They then got reimplemented in a hurry when it turned out that yes, indeed, 1.17 broke when simply applied to the entire cluster at once -- reimplementing them was definitely the right choice, and it significantly smoothed out the deployment once it happened.
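To make the staged approach concrete, here's roughly the shape I have in mind -- just a sketch, with hypothetical wiki groupings and a made-up updateWikiVersion() helper, not the actual deploy scripts:

```php
<?php
// Sketch of a staged rollout: phases of wikis, lowest-risk first.
// Wiki groupings and updateWikiVersion() are hypothetical.
$deploymentPhases = array(
	array( 'testwiki', 'test2wiki' ),       // test wikis on the real cluster
	array( 'simplewiki', 'mediawikiwiki' ), // small wikis with real traffic
	array( 'commonswiki', 'enwiktionary' ), // mid-size projects
	array( 'enwiki' ),                      // the biggest wiki, last
);

function deployPhase( array $wikis, $version ) {
	foreach ( $wikis as $wiki ) {
		// Point this wiki at the new branch; watch the error logs before
		// moving on, and revert just this group if something breaks.
		updateWikiVersion( $wiki, $version );
	}
}
```

The point being that a breakage caught in an early phase never touches the biggest wikis at all.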
> It's not the same site as it was when you first proposed this policy.
It's a bigger site with more users, increasing the danger that small changes will cause unexpected breakages. I believe that smaller change sets that get more directly tested will help to reduce that danger.
Major sites like Google and Facebook are much more aggressive about A/B testing and progressive rollouts than we've ever been -- not in place of all other forms of testing and review, but definitely in addition. We have relatively limited resources, but we're not just three guys with an rsync script anymore... I think we can do better with what we've got.
I think this is a situation that will benefit from more aggressive testing, including more live & A/B testing: fine-grained rollouts mean fine-grained testing and fine-grained debugging. Not always perfect, but if problems get exposed and fixed more quickly, in an audience that's relatively small but still big enough to drive real usage behavior, I think that's a win.
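For the fine-grained rollout part, deterministic bucketing is enough to get started -- a sketch only, keying off the user ID, with made-up function and experiment names:

```php
<?php
// Sketch of deterministic A/B bucketing: hash a stable identifier so the
// same user always lands in the same bucket; ramp the test up by raising
// the percentage. Names are illustrative.
function inTestBucket( $userId, $experiment, $percentInTest ) {
	// crc32() is cheap and stable; mod 100 maps the hash to a percentile.
	$bucket = abs( crc32( $experiment . ':' . $userId ) ) % 100;
	return $bucket < $percentInTest;
}

// e.g. route 5% of logged-in users through an experimental code path:
if ( inTestBucket( $wgUser->getId(), 'new-upload-form', 5 ) ) {
	// experimental path
} else {
	// current stable path
}
```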
I do agree that just slapping trunk onto *.wikipedia.org every couple of days isn't a great idea at this stage, but I think we can find an intermediate level that gets code into real, live usage on an ongoing rolling basis. Some things that may help:
* Full heterogeneous deployment system so real but lower-traffic sites can be regularly run on more aggressive update schedules than high-traffic sites (see the sketch after this list)
* Targeting specific experimental code to specific sites (production prototypes?)
* Being able to better separate fixed backend and more experimental frontend code for a/b testing
* Cleaner separation of modules: we shouldn't have to update CentralAuth to update ProofreadPage on Wikisource.
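On the first point, the heart of a heterogeneous deployment system can be as small as a per-wiki version map -- again just a sketch, with invented branch labels:

```php
<?php
// Sketch of a per-wiki version map: low-traffic sites ride close to
// trunk, big sites stay on the stable branch. The web entry point would
// include the matching MediaWiki checkout based on this lookup.
$wikiVersions = array(
	'default'       => 'php-1.17',  // stable branch for everything else
	'testwiki'      => 'php-trunk', // updated from trunk every few days
	'mediawikiwiki' => 'php-trunk', // developer dogfooding
);

function getVersionForWiki( $dbname, array $map ) {
	return isset( $map[$dbname] ) ? $map[$dbname] : $map['default'];
}
```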
One issue we see at present is that since we version and deploy core and extensions together, it's tough to get a semi-experimental extension into limited deployment with regular updates. Let's make sure that's clean and easy to do; right now it's very easy to deploy experimental JavaScript into a gadget or site JS, but an extension may just sit idle in SVN for years, unusable in production even if it's limited, modular code, because no one wants to deploy it. If there's interest it may get a prototype site, but if prototype sites only get used by the testing crew, or when we ask someone to go and make some fake edits on them, they're not going to have all their bugs exercised.
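The gating itself is cheap; in a shared-settings setup it's basically one conditional per extension (a sketch -- the wiki list is invented, $wgDBname and $IP are the usual globals):

```php
<?php
// Sketch: enable a semi-experimental extension on a handful of wikis
// without touching the rest of the cluster.
$proofreadPageWikis = array( 'enwikisource', 'frwikisource' );

if ( in_array( $wgDBname, $proofreadPageWikis ) ) {
	require_once( "$IP/extensions/ProofreadPage/ProofreadPage.php" );
}
```

The hard part isn't the conditional, it's having a deployment process where updating that one extension doesn't drag everything else along with it.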
Being able to do more self-directed prototype sites with the upcoming virtualization infrastructure should help with that, and for certain front-end things it should be possible to use JS whatsits to hook some of that code into live sites for opt-in or a/b testing -- further reducing dangers by removing the server-side variations and providing an instant switch-back to the old code.
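For the front-end opt-in piece, the hook can be as simple as conditionally loading a ResourceLoader module -- a sketch; the preference and module names here are invented:

```php
<?php
// Sketch: serve an experimental front-end module only to users who have
// opted in via a hypothetical 'beta-features' preference. Switching back
// is instant -- just stop adding the module.
function wfAddBetaModule( $out, $skin ) {
	global $wgUser;
	if ( $wgUser->getOption( 'beta-features' ) ) {
		$out->addModules( 'ext.someExperiment' ); // hypothetical module
	}
	return true;
}
$wgHooks['BeforePageDisplay'][] = 'wfAddBetaModule';
```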
I don't advocate just blindly updating the whole stack all the time; I advocate aiming for smaller pieces that can be run and tested more easily and more safely in more flexible ways.
As a power user willing to risk my neck to make things better, I want to be able to opt in to the "Wikipedia beta" and actually get an experimental new feature *on Wikipedia or Commons* a lot more often. As a developer, I want to be able to get things into other peoples' hands so they can test them for me and give me feedback.
This is one of the reasons I'm excited about the future of Gadgets -- the JS+CSS side has always been the free-for-all where experimental tools can actually be created and tested and used in a real environment, while MediaWiki's PHP side has remained difficult to update in pieces. It's easier to deploy those things, and should get even easier and more powerful with time.
We should consider what we can do to make the PHP side smoother and easier as well, though obviously we are much more limited for security and functional safety reasons.
-- brion