What is suggested is having a test site (or sites) which shows the actual
content from the main sites.
There are a couple of possible ways to handle this:
1) Have a read-write copy that periodically repopulates its dataset from
a live site. Probably pretty safe.
2) Have a read-only configuration pulling live data from the live
database on the main server. Hopefully safe if there are no
information-leakage bugs, but there's less to test.
3) Have a read-write configuration using live data with alternate new
code. Potentially very unsafe.
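For option 2, MediaWiki's built-in read-only switch could do part of the
work; a minimal LocalSettings.php sketch for the test mirror (the message
text here is just an illustration):

```php
## Read-only test mirror: refuse all writes to the database.
## $wgReadOnly is a standard MediaWiki setting; when set to a string,
## edits are blocked and the string is shown as the reason.
$wgReadOnly = 'This is a read-only test mirror; please edit on the live site.';
```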
For instance we could have copies of, say, English and German Wikipedia
that refresh the current-version data each week.
The question then is the frequency of code updates.
You could have a system like the Debian folks have, with a progression
from Unstable -> Testing -> Stable for any software (except retaining
the current continuous-integration approach, to prevent the huge gaps
between stable releases that have occurred in Debian).
For the first line of defence, how about option 2, with automated rollout
of the latest SVN whenever there have been no commits in the last 2 hours?
And *maybe* error_reporting set to E_ALL (just for this read-only test
site), with errors either echoed to the browser or echoed onto #mediawiki,
so that problems are easy to spot, and hopefully easy to fix, as "given
enough eyeballs, all bugs are shallow".
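A sketch of that quiet-period gate, assuming a cron job that knows the
last commit time (the function name and the 2-hour window are
illustrative, not an existing tool):

```python
from datetime import datetime, timedelta

QUIET_PERIOD = timedelta(hours=2)  # roll out only after 2 commit-free hours

def ready_to_deploy(last_commit_time, now, quiet_period=QUIET_PERIOD):
    """True when SVN has been quiet long enough that the latest
    revision can be pushed to the read-only test mirror."""
    return now - last_commit_time >= quiet_period
```

A cron job could call this every few minutes and run `svn up` on the
mirror whenever it returns True.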
That would have caught the original style issue. It also provides a
safety valve, so that anything clearly malicious or dangerous can be
caught and reverted within the 2 hours; once set up, it hopefully
requires minimal or no manual intervention; it's relatively safe; and by
printing out warnings it makes any errors more obvious before review and
scap.
Then, optionally, there could be Timwi's proposed beta-user site, in
read-write mode with live data. Getting the software onto this site would
require a review, just as now. Then, once the software has been used a
bit there, it could be rolled out onto the cluster. This has the benefit
that any major problems impact a smaller group of people, and the people
they do impact have self-selected to be beta testers. Beta sites could be
restricted to, say, the English and German Wikipedias, to keep things
manageable.
Essentially the flow of software at the moment I think looks something like this:
+-----+              +---------+              +----------+
|     |   review     | test.wp |    * copy    | cluster, |
| SVN | --  and  --> | r/w but | --  from --> | r/w real |
|     |   scap       | no data |    NFS       |   data   |
+-----+              +---------+              +----------+
   ^                                               |
   |                                              /
    --- fix created <----- probs found <---------
What if it were something like this:
 Unstable                Testing/Beta                             Stable
+-----+              +---------+            +-----------+              +----------+
|     |   * 2 hrs    | read-   |   review   | Guinea    |   % no big   | cluster, |
| SVN | -- w/ no --> | only WP | --  &  --> | Pig r/w   | -- probs --> | r/w real |
|     |   change     | mirror  |   scap     | real data |    found     |   data   |
+-----+              +---------+            +-----------+              +----------+
   ^                      |                      |
   |                      V                     /
    --- fix created <-- probs found <----------
* = no or very limited manual intervention required.
% = The trick here is to find a way to get the "no big probs found"
rollout step done without creating a lot of extra work, so as to make it
practical. The code has already been reviewed at this point, so the only
question is "have the beta testers reported any new regressions?" If the
answer is yes, you block until the answer is no; if the answer is no, you
roll out to the cluster. There also needs to be enough time for problems
to be found (e.g. 1 or 2 days), and it has to be clear to the beta
testers how to report problems (e.g. do they log bugs, mail wikitech-l,
post at the village pump (technical), or something else?).
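The "% no big probs found" gate could then be a soak timer plus an
open-regression check; a minimal sketch (the 2-day soak and all the names
here are assumptions, not an existing tool):

```python
from datetime import datetime, timedelta

SOAK_PERIOD = timedelta(days=2)  # time allowed for beta testers to find problems

def ready_for_cluster(deployed_to_beta_at, now, open_regressions,
                      soak_period=SOAK_PERIOD):
    """True once the beta soak period has elapsed with no regressions
    still open against this revision; otherwise block the rollout."""
    soaked = now - deployed_to_beta_at >= soak_period
    return soaked and not open_regressions
```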
> That sort of thing is why we abandoned that development model and moved
> to continuous integration, with smaller changes going live pretty
> quickly and being tuned.
Continuous integration works; there's no reason to stop using it. The
above just lets problems be found sooner (by adding a smoke-test step),
and with an impact on fewer people (by adding a beta-tester step).
All the best,
Nick.