What is suggested is having one or more test sites which show the actual content from the main sites.
There are a couple of possible ways to handle this:
1) Have a read-write copy that periodically repopulates its dataset
   from a live site. Probably pretty safe.
2) Have a read-only configuration pulling live data from the live
   database on the main server. Hopefully safe, provided there are no
   information-leakage bugs, but there is less to test.
3) Have a read-write configuration using live data with alternate new
   code. Potentially very unsafe.
For instance, we could have copies of, say, the English and German Wikipedias that refresh their current-version data each week.
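As a very rough sketch of what that weekly refresh might look like (the dump URL and the directory paths below are placeholders I've made up; importDump.php and rebuildall.php are MediaWiki's standard maintenance scripts), a cron job along these lines would do it:

#!/usr/bin/env python3
"""Rough sketch of a weekly data refresh for a test copy of a wiki.

Placeholders: the dump URL and the /srv/testwiki path are made up;
adjust for whichever wikis get test copies.  Intended to be run
once a week from cron, per test wiki.
"""
import bz2
import shutil
import subprocess
import urllib.request

DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
            "enwiki-latest-pages-meta-current.xml.bz2")    # placeholder
DUMP_BZ2 = "/tmp/enwiki-current.xml.bz2"                   # placeholder
DUMP_XML = "/tmp/enwiki-current.xml"                       # placeholder
TESTWIKI_DIR = "/srv/testwiki"                             # placeholder

def refresh():
    # 1. Fetch the latest current-versions dump.
    urllib.request.urlretrieve(DUMP_URL, DUMP_BZ2)

    # 2. Decompress it.
    with bz2.open(DUMP_BZ2, "rb") as src, open(DUMP_XML, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # 3. Import the pages, then rebuild links tables and other
    #    derived data.
    subprocess.check_call(["php", "maintenance/importDump.php", DUMP_XML],
                          cwd=TESTWIKI_DIR)
    subprocess.check_call(["php", "maintenance/rebuildall.php"],
                          cwd=TESTWIKI_DIR)

if __name__ == "__main__":
    refresh()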
The question then is frequency of code updates.
You could have a system like the Debian folks have, with a progression from Unstable -> Testing -> Stable for any piece of software (while retaining the current continuous integration approach, to prevent the huge gaps between stable releases that have occurred in Debian).
For the first line of defence, how about Option 2), with automated rollout of the latest SVN whenever there have been no commits in the last 2 hours? And *maybe* error_reporting set to E_ALL (just for this read-only test site), with errors either echoed to the browser or onto #mediawiki, so that problems are easy to spot and hopefully easy to fix ("given enough eyeballs, all bugs are shallow").
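Just to make the "no commits in the last 2 hours" rule concrete, here is a minimal sketch of the kind of cron job that could drive it. The repository URL, the checkout path, and the assumption that updating the test site is just an "svn update" are all placeholders; the error_reporting change itself would just be a couple of lines in that site's LocalSettings.php, kept out of the main cluster configuration.

#!/usr/bin/env python3
"""Sketch of automated rollout for the read-only test site.

Run from cron every few minutes: if the most recent commit to trunk
is more than 2 hours old, bring the test site's working copy up to
HEAD.  The repo URL and checkout path are placeholders.
"""
import calendar
import subprocess
import time
import xml.etree.ElementTree as ET

REPO_URL = "http://svn.example.org/mediawiki/trunk/phase3"   # placeholder
TESTSITE_CHECKOUT = "/srv/test-readonly/phase3"              # placeholder
QUIET_PERIOD = 2 * 60 * 60                                   # two hours

def seconds_since_last_commit():
    # Ask SVN for the most recent log entry, in XML for easy parsing.
    log_xml = subprocess.check_output(
        ["svn", "log", "--xml", "--limit", "1", REPO_URL])
    date_str = ET.fromstring(log_xml).find("./logentry/date").text
    # SVN dates look like 2006-09-18T12:34:56.789012Z (UTC).
    commit_ts = calendar.timegm(time.strptime(date_str[:19],
                                              "%Y-%m-%dT%H:%M:%S"))
    return time.time() - commit_ts

if __name__ == "__main__":
    if seconds_since_last_commit() > QUIET_PERIOD:
        # Quiet for 2 hours: roll the test site forward to HEAD.
        subprocess.check_call(["svn", "update"], cwd=TESTSITE_CHECKOUT)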
That would have caught the original style issue. It also provides a safety valve, so that anything clearly malicious or dangerous can be caught and reverted within the 2 hours; once set up it should require minimal or no manual intervention; it's relatively safe; and by printing out warnings it makes any errors more obvious before review and scap.
Then, optionally, there could be Timwi's proposed beta-user site, in read-write mode, with live data. Getting the software onto this site would require a review, just as it does currently. Once the software had been used a bit there, it could be rolled out onto the cluster. This has the benefit that any major problems impact a smaller group of people, and the people it does impact have self-selected to be beta testers. Beta sites could be restricted to, say, the English and German Wikipedias, to keep things manageable.
Essentially, I think the flow of software at the moment looks something like this:
+-----+              +---------+               +----------+
|     |    review    | test.wp |    * copy     | cluster, |
| SVN | --  and  --> | r/w but | ---  from --> | r/w real |
|     |    scap      | no data |      NFS      |   data   |
+-----+              +---------+               +----------+
   ^                                                 |
   |                                                /
    --- fix created <----- probs found <------------
What if it were something like this:
                      Unstable              Testing/Beta                  Stable
+-----+              +---------+            +-----------+              +----------+
|     |   * 2 hrs    | read-   |   review   | Guinea    |  % no big    | cluster, |
| SVN | -- w/ no --> | only WP | --  &  --> | Pig r/w   | -- probs --> | r/w real |
|     |   change     | mirror  |    scap    | real data |    found     |   data   |
+-----+              +---------+            +-----------+              +----------+
   ^                                              |                         |
   |                                              V                        /
    --- fix created <-- probs found <--------------------------------------
* = no or very limited manual intervention required.
% = The trick here is to find a way to get the "no big probs found" rollout step done without creating a lot of extra work, so as to make it practical. The code has already been reviewed at this point, so the only question is "have the beta testers reported any new regressions?" If the answer is yes, you block until the answer is no; if the answer is no, you roll out to the cluster. There also needs to be enough time for problems to be found (e.g. 1 or 2 days), and it has to be clear to the beta testers how to report problems (e.g. do they log bugs, mail wikitech-l, post at the village pump (technical), or something else?).
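For what it's worth, the gate itself could be as simple as the sketch below. The two-day soak period is just the example figure above, and the regression query is deliberately a stub, because how testers report problems is exactly the open question:

#!/usr/bin/env python3
"""Sketch of the '% no big probs found' promotion gate.

Before the Guinea Pig code goes to the cluster, check that (a) it
has soaked on the beta site long enough, and (b) the beta testers
haven't reported any still-open regressions against it.
"""
import sys
import time

MIN_SOAK = 2 * 24 * 60 * 60   # e.g. the two days suggested above

def open_beta_regressions():
    """Return a list of open regression reports against the beta code.

    Stub -- replace with whatever reporting channel gets agreed on
    (bug tracker keyword, wikitech-l, village pump technical, ...).
    """
    return []

def ok_to_promote(beta_scap_time):
    soaked_long_enough = (time.time() - beta_scap_time) >= MIN_SOAK
    return soaked_long_enough and not open_beta_regressions()

if __name__ == "__main__":
    # beta_scap_time: when the code was scapped to the Guinea Pig
    # site, passed in as a Unix timestamp for this sketch.
    beta_scap_time = float(sys.argv[1])
    sys.exit(0 if ok_to_promote(beta_scap_time) else 1)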
> That sort of thing is why we abandoned that development model and moved to continuous integration, with smaller changes going live pretty quickly and being tuned.
Continuous integration works; no reason to stop using it. The above just lets problems be found sooner (by adding a smoke test step), and with an impact on fewer people (by adding a beta-tester step).
All the best, Nick.