Nick Jenkins wrote:
I really like that idea. I would use such a beta instance and report problems as I find them (which is the purpose of a beta-test).
But from what Tim was saying, I think (and someone please correct me if I'm wrong here) that the idea was to have a test wiki which would be updated as one of the last steps before rolling a software update out onto the cluster. The beta site would be updated, and then there would be a quick smoke test by the person doing the roll out ("does the front page look normal?", "do a few other pages look normal?"), and if nothing abnormal was observed, then it would be rolled out onto the cluster.
That's what test.wikipedia.org is now; but of course it doesn't contain any real content so it's boring. :)
The issue we've got here is with unexpected interactions with customized CSS and JavaScript on each of several dozen large, active wikis (and hundreds more smaller ones). That's a lot tougher to smoke-test because the custom styles and the funny layouts aren't *on* the test site.
What's being suggested is having a test site (or sites) that shows the actual content from the main sites.
There are a couple of possible ways to handle this:
1) Have a read-write copy that periodically repopulates its dataset from a live site. Probably pretty safe.
2) Have a read-only configuration pulling live data from the live database on the main server. Hopefully safe as long as there are no information-leakage bugs, but there's less to test.
3) Have a read-write configuration using live data with alternate new code. Potentially very unsafe.
For instance we could have copies of, say, English and German Wikipedia that refresh the current-version data each week.
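As a rough sketch of what option 1 might look like, a weekly refresh could boil down to dumping each live wiki database and loading it into the test copy. The host names and database names below are placeholders for illustration, not our actual cluster layout:

```python
# Hypothetical hosts -- placeholders, not the real server names.
LIVE_DB_HOST = "db-live.example.org"
TEST_DB_HOST = "db-test.example.org"

def build_refresh_command(wiki_db):
    """Build a shell pipeline that dumps the live wiki's database
    and loads it straight into the test copy of the same database."""
    dump = "mysqldump -h %s %s" % (LIVE_DB_HOST, wiki_db)
    load = "mysql -h %s %s" % (TEST_DB_HOST, wiki_db)
    return "%s | %s" % (dump, load)

# Usage from a weekly cron job might look something like:
#   import subprocess
#   for wiki in ("enwiki", "dewiki"):
#       subprocess.check_call(build_refresh_command(wiki), shell=True)
```

A real refresh would also need to copy uploaded files and skip private tables, but the shape is the same: a periodic one-way copy from live to test.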
The question then is frequency of code updates.
One reason we don't automatically update code is security and data safety: sometimes new code includes XSS or SQL injection vectors, unsafe code that could corrupt data, or simply inefficient code that could pound the databases too hard. Holding changes for at least some review before they go live provides some additional safety (though certainly some such problems don't get caught during that time).
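To make the SQL-injection concern concrete, here's a generic illustration (plain Python and sqlite3, not MediaWiki code): interpolating user input into a query string lets a crafted page title match every row, while a parameterized query treats the same input as literal data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (title TEXT, secret TEXT)")
conn.execute("INSERT INTO page VALUES ('Main_Page', 'hunter2')")

# A crafted "title" that changes the meaning of the query.
malicious = "x' OR '1'='1"

# Unsafe: string interpolation. The WHERE clause becomes
# title = 'x' OR '1'='1', which is true for every row.
unsafe = conn.execute(
    "SELECT secret FROM page WHERE title = '%s'" % malicious
).fetchall()

# Safe: a parameterized query binds the input as plain data,
# so it only matches a page literally titled "x' OR '1'='1".
safe = conn.execute(
    "SELECT secret FROM page WHERE title = ?", (malicious,)
).fetchall()

print(unsafe)  # leaks the row
print(safe)    # empty: no such literal title
```

The same class of bug in newly committed code is exactly what a review pass before deployment is meant to catch.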
Understandably we may be a bit reluctant to relax this rule if the code's running on live data, or even on alternate data on the same machines.
If we had a long development cycle before deployment (something we've tried to do before), then a beta period with duplicate sites would make a lot of sense. I've seen other big sites like Slashdot do this; users are asked to hit the copy site running the new software for a few days and work out problems, then the test data vanishes when the real upgrade happens.
test.wikipedia.org was originally populated with a subset of English Wikipedia data to do exactly that.
We never really got the testing we needed when we tried that, though; most problem reports didn't come until after the big upgrade happened -- and then we had to spend the next month chasing down bug after bug after bug.
That sort of thing is why we abandoned that development model and moved to continuous integration, with smaller changes going live pretty quickly and being tuned.
Unfortunately these silly style-type issues are disproportionately disruptive because of their visibility.
-- brion vibber (brion @ pobox.com)