<quote name="Risker" date="2015-05-28" time="09:53:31 -0400">
This is strictly a question from an uninvolved observer. Does this schedule provide for sufficient time and real-time/hands-on testing before changes hit the big projects?
Yes. We still have the Beta Cluster (a production-like environment), which runs all code merged into master within 10 minutes of the merge.
An IRC discussion I was following last evening suggested to me that the first deploy (to test wikis and mw.org) probably did not get enough hands-on testing/utilization to surface many issues that would be significant on production wikis. That leaves only 24 hours on the smaller non-Wikipedia wikis, with the hope that any problems will pop up before the version is applied to dewiki, frwiki and enwiki.
Honestly, that's the wrong perspective to take on yesterday's incident[0]. The issue is one that is hard to identify at low traffic levels (one that only really manifests at Wikipedia-scale, with Wikipedia-scale caching). There will always be issues like this, unfortunately. The better way to mitigate them is to change how we bucket requests to the new or old version of the software in production.
Currently we bucket by domain name/project site. This doesn't give us much flexibility in testing new versions at scales that can surface issues but are not "everyone". We would need to be able to deploy new versions based on a percentage of overall requests (i.e., 5% of all users to the new version, then 10%, then everyone).
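To make the idea concrete, here is a minimal sketch of percentage-based bucketing, not how our infrastructure is actually configured: it assumes each request carries some stable identifier (a session or user ID; the names below are hypothetical), hashes it, and routes requests that fall below the rollout threshold to the new version.

    import hashlib

    def bucket_for(request_id: str, rollout_percent: float) -> str:
        """Deterministically route a request to the new or old version.

        Hashing a stable identifier keeps each user in the same bucket,
        so users already on the new version stay there as the rollout
        grows from 5% to 10% to everyone, rather than flipping back
        and forth on every request.
        """
        digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
        # Map the hash onto [0, 100) and compare against the threshold.
        slot = int(digest, 16) % 10000 / 100.0
        return "new" if slot < rollout_percent else "old"

    # Example: roll the new version out to 5% of users.
    print(bucket_for("session-abc123", 5.0))   # "new" or "old", stable per ID
    print(bucket_for("session-abc123", 10.0))  # same ID keeps its assignment

Because the 5% slice is a subset of the 10% slice, raising the percentage only adds users to the new version; nobody gets bounced back to the old one mid-rollout.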
Best,
Greg
[0] https://wikitech.wikimedia.org/wiki/Incident_documentation/20150527-Cookie