Nick Jenkins wrote:
> I really like that idea. I would use such a beta instance and report
> problems as I find them (which is the purpose of a beta test).
>
> But from what Tim was saying, I think (and someone please correct me
> if I'm wrong here) that the idea was to have a test wiki which would
> be updated as one of the last steps before rolling a software update
> out onto the cluster. The beta site would be updated, then there would
> be a quick smoke test by the person doing the rollout ("does the front
> page look normal?", "do a few other pages look normal?"), and if
> nothing abnormal was observed, it would be rolled out onto the cluster.
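A check like that is easy enough to script. A minimal sketch in Python — the URL and the marker strings are made up for illustration, not an actual checklist:

```python
# Sketch of automating the "does the front page look normal?" smoke test:
# fetch a handful of pages and check each for marker strings you'd expect
# in a healthy render. URLs and markers below are illustrative only.
from urllib.request import urlopen

SMOKE_PAGES = {
    "https://test.wikipedia.org/wiki/Main_Page": ["</html>", "mw-content-text"],
}

def looks_normal(html, markers):
    """A page 'looks normal' if every expected marker appears in the HTML."""
    return all(marker in html for marker in markers)

def smoke_test(pages=SMOKE_PAGES):
    """Return the pages that failed the check; an empty list means roll out."""
    failures = []
    for url, markers in pages.items():
        html = urlopen(url).read().decode("utf-8", "replace")
        if not looks_normal(html, markers):
            failures.append(url)
    return failures
```

Of course a truncated page or a PHP error dump would still have to be caught by eyeball, which is why the human doing the rollout clicks around too.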
That's what test.wikipedia.org is now; but of course it doesn't contain
any real content, so it's boring. :)
The issue we've got here is with unexpected interactions with customized
CSS and JavaScript on each of several dozen large, active wikis (and
hundreds more smaller ones). That's a lot tougher to smoke-test because
the custom styles and the funny layouts aren't *on* the test site.
What's being suggested is having a test site (or sites) that shows the
actual content from the main sites.
There are a couple of possible ways to handle this:
1) Have a read-write copy that periodically repopulates its dataset from
a live site. Probably pretty safe.
2) Have a read-only configuration pulling live data from the live
database on the main server. Hopefully safe as long as there are no
information-leakage bugs, but there's less to test.
3) Have a read-write configuration using live data with alternate new
code. Potentially very unsafe.
For instance we could have copies of, say, English and German Wikipedia
that refresh the current-version data each week.
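One way to do that weekly refresh, sketched under the assumption that the test copies run stock MediaWiki with its standard maintenance scripts — the wiki names, dump paths, and install location here are all invented:

```python
# Hypothetical weekly refresh for option 1: stream a fresh current-versions
# dump into each test copy, then rebuild the derived tables. Wiki names,
# dump paths, and the MediaWiki install path are assumptions, not real config.
import subprocess

DUMPS = {
    "testwiki_en": "/dumps/enwiki-latest-pages-meta-current.xml.bz2",
    "testwiki_de": "/dumps/dewiki-latest-pages-meta-current.xml.bz2",
}

def refresh_commands(wiki, dump, mw="/srv/mediawiki"):
    """Build the maintenance commands for one refresh run."""
    return [
        ["php", f"{mw}/maintenance/importDump.php", "--wiki", wiki, dump],
        ["php", f"{mw}/maintenance/rebuildrecentchanges.php", "--wiki", wiki],
    ]

def refresh_all(dumps=DUMPS):
    """Run the refresh for every configured test copy (e.g. from weekly cron)."""
    for wiki, dump in dumps.items():
        for cmd in refresh_commands(wiki, dump):
            subprocess.run(cmd, check=True)
```

A full current-versions import for a large wiki takes a long time, which is part of why a weekly rather than daily cadence is the realistic starting point.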
The question then is frequency of code updates.
One reason we don't automatically update code is security and data
safety: sometimes new code includes XSS or SQL injection vectors, unsafe
code that could corrupt data, or simply inefficient code that could
pound the databases too hard. Waiting for at least some review before
taking changes live provides some additional safety (though certainly
some such problems don't get caught during that time).
Understandably we may be a bit reluctant to relax this rule if the
code's running on live data, or even on alternate data on the same machines.
If we had a long development cycle before deployment (something we've
tried to do before), then a beta period with duplicate sites would make
a lot of sense. I've seen other big sites like Slashdot do this; users
are asked to hit the copy site running the new software for a few days
and work out problems, then the test data vanishes when the real upgrade
happens.
test.wikipedia.org was originally populated with a subset of English
Wikipedia data to do exactly that.
We never really got the testing we needed when we tried that, though;
most problem reports didn't come until after the big upgrade happened --
and then we had to spend the next month chasing down bug after bug after
bug.
That sort of thing is why we abandoned that development model and moved
to continuous integration, with smaller changes going live pretty
quickly and being tuned.
Unfortunately these silly style-type issues are disproportionately
disruptive because of their visibility.
-- brion vibber (brion @ pobox.com)