On Fri, Apr 15, 2011 at 5:06 PM, Tim Starling <tstarling(a)wikimedia.org> wrote:
This is what we did for 1.17, and it worked well, leading to a 1.17
deployment which caused a minimum of disruption.
Unexercised code is dangerous code that will
break when you least expect
we need to get code into use fast, where it
won't sit idle until we push
live with a thousand other things we've
This certainly wasn't my experience with the 1.17 deployment. We had a
great deal of review and testing of the 1.17 branch, and many bugs
were fixed without having to get Wikipedians to tell us about them.
I think I've oversimplified with the 'deploy more often' part of things;
lemme try to reorganize my arguments into something hopefully more cogent.
**tl;dr summary: More frequent testing of smaller pieces of changed code
among small, but real sets of users should be a useful component of getting
things tested and deployed faster and more safely. Judicious testing and
deployment should help support a safer, but still much more aggressive,
overall update frequency.**
It's certainly a fact that many bugs were found and fixed before deployment
-- the organization around testing and bugfixing was in many ways FAR FAR
superior to any deployment we've done before, and I don't mean to take away
But it's also true that there were other bugs buried in code that had been
changed 8-9 months previously, making it harder to track them down -- and
much more difficult to revert them if a fix wasn't obvious. I certainly
experienced that during the 1.17 deployment, and received the same
impression from other developers at the time.
There was also a production outage for a time due to the choice not to
initially do a staged rollout. This lesson has been learned, so should not
be an issue in future deployments.
is a great help, but don't
forget that it doesn't run
the same extensions as are used in Wikimedia
No it doesn't, that's why we set up public test wikis which did have a
similar set of extensions: first a set of wikis separate from the main
cluster on prototype.wikimedia.org, and then a test wiki which was
part of the cluster.
Indeed, that is a very useful component of an ongoing development+deployment
strategy. But lack of real traffic and real usage makes this only a limited
part of testing. I also experienced that some of the prototype sites were
broken for days or weeks (CentralAuth configuration problems was my
impression?), which prevented me from being able to confirm some bugs
reported against prototype sites at the time.
One thing that can help with this is to run more actual, but lower traffic,
sites on the prototype infrastructure so people are really dogfooding them:
a broken prototype site should be something requiring an immediate fix.
For instance, us programmers probably use www.mediawiki.org a lot more
aggressively than regular people do, *and* we have access to the code and
some have access to the server infrastructure. It might be an ideal
candidate for receiving more frequent updates from trunk.
Then we did a staged deployment, deploying 1.17
to several wikis at a time.
This was one of my recommendations for the 1.17 deployment, so yes that's
exactly the sort of thing I'm advocating.
It was initially rejected because the old heterogeneous deploy scripts were
out of date, and there was concern that they wouldn't get done in time and
might just break things worse. They then got reimplemented in a hurry when it
turned out that yes, indeed, 1.17 broke when simply applied to the entire
cluster at once -- reimplementing it was definitely the right choice and it
significantly smoothed out the deployment once it happened.
It's not the same site as it was when you first proposed this policy.
It's a bigger site with more users, increasing the danger that small changes
will cause unexpected breakages. I believe that smaller change sets that get
more directly tested will help to reduce that danger.
Major sites like Google and Facebook are much more aggressive about A/B
testing and progressive rollouts than we've ever been -- not in place of all
other forms of testing and review, but definitely in addition. We have
relatively limited resources, but we're not just three guys with an rsync
script anymore... I think we can do better with what we've got.
I think this is a situation that will benefit from more aggressive testing,
including more live & A/B testing: fine-grained rollouts mean fine-grained
testing and fine-grained debugging. Not always perfect, but if problems get
exposed and fixed quicker, in a relatively small audience but still big
enough to drive real usage behavior, I think that's a win.
I do agree that just slapping trunk onto *.wikipedia.org every couple days
isn't a great idea at this stage, but I think we can find an intermediate
level that gets code into real, live usage on an ongoing rolling basis. Some
things that may help:
* Full heterogeneous deployment system so real but lower-traffic sites can be
regularly run on more aggressive update schedules than high-traffic sites
* Targeting specific experimental code to specific sites (production
* Being able to better separate fixed backend and more experimental frontend
code for a/b testing
* Cleaner separation of modules: we shouldn't have to update CentralAuth to
update ProofreadPage on Wikisource.
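
To make the heterogeneous deployment idea a bit more concrete, here's a rough
sketch of mapping wikis to code versions by rollout tier. Everything in it is
an assumption for illustration: the tier names, the wiki IDs assigned to each
tier, and the `version_map` helper are hypothetical and are not the actual
deploy scripts.

```python
# Hypothetical sketch only: tier names, wiki assignments and this helper are
# illustrative assumptions, not the real Wikimedia deployment tooling.

# Tiers are ordered from "most willing to run new code" to "most conservative".
ROLLOUT_TIERS = [
    ("dogfood", ["mediawikiwiki", "testwiki"]),
    ("low-traffic", ["enwikisource"]),
    ("high-traffic", ["enwiki", "dewiki"]),
]


def version_map(tiers, stable, candidate, tiers_on_candidate):
    """Return {wiki: version}.

    The first `tiers_on_candidate` tiers run the candidate version;
    every remaining tier stays on the stable version.
    """
    mapping = {}
    for index, (_tier_name, wikis) in enumerate(tiers):
        version = candidate if index < tiers_on_candidate else stable
        for wiki in wikis:
            mapping[wiki] = version
    return mapping


# Stage 1: only the dogfood tier gets the candidate code.
stage_one = version_map(ROLLOUT_TIERS, "1.17", "trunk", tiers_on_candidate=1)
```

Raising `tiers_on_candidate` one step at a time widens the rollout, and
dropping it back to 0 puts every wiki on the stable version again.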
One issue we see at present is that since we version and deploy core and
extensions together, it's tough to get a semi-experimental extension into
limited deployment with regular updates. Let's make sure that's clean and
into a gadget or site JS, but an extension may just sit idle in SVN for
years, unusable in production even if it's limited, modular code because no
one wants to deploy it. If there's interest it may get a prototype site, but
if they only get used by the testing crew or when we ask someone to go and
make some fake edits on them, they're not going to have all their bugs
Being able to do more self-directed prototype sites with the upcoming
virtualization infrastructure should help with that, and for certain
front-end things it should be possible to use JS whatsits to hook some of
that code into live sites for opt-in or a/b testing -- further reducing
dangers by removing the server-side variations and providing an instant
switch-back to the old code.
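
As a rough illustration of the opt-in / a/b idea with an instant switch-back,
here's a minimal sketch of deterministic bucketing. The function name, its
parameters and the hashing scheme are all assumptions made up for this
example; nothing like this is claimed to exist in MediaWiki today.

```python
import hashlib


def in_experiment(user_name, experiment, percent, opted_in=False, enabled=True):
    """Decide whether a user should get the experimental front-end code.

    Hypothetical helper: names and scheme are illustrative assumptions.
    """
    # Instant switch-back: turning the experiment off overrides everything.
    if not enabled:
        return False
    # Power users who explicitly opted in always get the new code.
    if opted_in:
        return True
    # Otherwise hash experiment+user into a stable bucket in 0..99, so the
    # same user always lands in the same group for a given experiment.
    digest = hashlib.sha256(f"{experiment}:{user_name}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent
```

Because the bucket depends only on the experiment name and the user name, a
user's assignment stays stable between page views, and setting `enabled` to
False sends everyone back to the old code at once.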
I don't advocate just blindly updating the whole stack all the time; I
advocate aiming for smaller pieces that can be run and tested more easily
and more safely in more flexible ways.
As a power user willing to risk my neck to make things better, I want to be
able to opt in to the "Wikipedia beta" and actually get an experimental new
feature *on Wikipedia or Commons* a lot more often. As a developer, I want
to be able to get things into other people's hands so they can test them for
me and give me feedback.
This is one of the reasons I'm excited about the future of Gadgets -- the
JS+CSS side has always been the free-for-all where experimental tools can
actually be created and tested and used in a real environment, while
MediaWiki's PHP side has remained difficult to update in pieces. It's easier
to deploy those things, and should get even easier and more powerful with
We should consider what we can do to make the PHP side smoother and easier
as well, though obviously we are much more limited for security and
functional safety reasons.