tl;dr: The new scap version is live in production. It has canary deploys.
Scap v.3.2.2-1 was deployed to production today. There are some new internal improvements as well as some that are user-facing.
The improvements you'll probably notice are:
* Tab completion works for scap subcommands(!)
* Canary checks for MediaWiki deployments
Canary deployments:
1. Sync your change(s) to the api and appserver canary hosts
2. Wait (20 seconds) for traffic to hit those hosts
3. If there isn't a large increase in the error rate on those hosts (10x), release your changes to the remainder of the fleet.
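As a rough illustration of the three steps above, here is a minimal Python sketch. It is not scap's actual code: the function names, the injected callables, the handling of a zero baseline, and the reading of "10x" as "more than ten times the pre-deploy rate" are all assumptions made for the example.

```python
from typing import Callable


def canary_ok(errors_before: float, errors_after: float,
              threshold: float = 10.0) -> bool:
    """Return True unless the error rate grew by more than `threshold` times."""
    # Assumption: floor the baseline at 1 so a quiet host with zero errors
    # does not turn a single stray error into an "infinite" increase.
    baseline = max(errors_before, 1.0)
    return errors_after <= baseline * threshold


def deploy(sync_canaries: Callable[[], None],
           wait: Callable[[float], None],
           error_rate: Callable[[], float],
           sync_fleet: Callable[[], None],
           force: bool = False,
           delay: float = 20.0) -> bool:
    """Hypothetical canary flow: sync canaries, wait, check, then sync the fleet."""
    before = error_rate()      # error rate on the canaries before the change
    sync_canaries()            # step 1: push to the api/appserver canaries
    if not force:              # assumption: force skips the wait and the check
        wait(delay)            # step 2: let normal traffic reach the canaries
        if not canary_ok(before, error_rate()):
            return False       # step 3 failed: do not touch the rest of the fleet
    sync_fleet()               # step 3 passed (or was forced): release everywhere
    return True
```

In real use, `wait` could be `time.sleep` and `error_rate` a query against the log store; both are passed in here so the flow can be exercised without any infrastructure.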
If, for whatever reason, you find yourself in a position where you don't care about the error-rate change on canary nodes, use the --force flag, i.e.:
scap sync-file --force README 'Important README update'
Please report any problems in a phab ticket tagged with "Scap3" or in #wikimedia-releng in IRC on freenode.
<3, Your Hometown Release Engineers
Hi,
On 08/04/2016 01:41 PM, Tyler Cipriani wrote:
> Canary deployments:
> - Sync your change(s) to the api and appserver canary hosts
> - Wait (20 seconds) for traffic to hit those hosts
> - If there isn't a large increase in the error rate on those hosts (10x), release your changes to the remainder of the fleet.
Does scap/whatever make any requests against those hosts? Or is it just depending upon normal traffic to those hosts to possibly cause errors?
-- Legoktm
On 16-08-07 13:50:33, Legoktm wrote:
> Does scap/whatever make any requests against those hosts? Or is it just depending upon normal traffic to those hosts to possibly cause errors?
The script that scap uses to query logstash is logstash_checker.py[0]. No requests are generated as part of the deployment process; scap relies wholly on normal traffic to spot errors.
There was some discussion on a couple of Phabricator tickets[1][2] about a pre-canary check step that would still be nice to implement.
While the canary check script was a good step, I still feel that a pre-canary deploy sanity check that consists of requests to known end-points on unpooled servers would be a boon to the prevention of catastrophic deploys.
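To make the idea concrete, here is a rough Python sketch of what such a pre-canary check could look like. Nothing like this exists in scap as far as this thread says: the function names, the host and path arguments, and the rule that anything other than HTTP 200 counts as a failure are all assumptions for illustration.

```python
import http.client
from typing import Callable, Iterable, List, Tuple


def http_status(host: str, path: str, timeout: float = 5.0) -> int:
    """Send one plain-HTTP GET to `host` and return the response status code."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def pre_canary_check(hosts: Iterable[str],
                     paths: Iterable[str],
                     fetch_status: Callable[[str, str], int] = http_status,
                     ) -> List[Tuple[str, str, str]]:
    """Request every path on every host; return a list of (host, path, reason)."""
    paths = list(paths)  # reused for each host, so materialize it once
    failures = []
    for host in hosts:
        for path in paths:
            try:
                status = fetch_status(host, path)
            except (OSError, http.client.HTTPException) as exc:
                # Connection refused, timeout, malformed response, and so on.
                failures.append((host, path, str(exc)))
                continue
            if status != 200:  # assumption: only a 200 counts as healthy
                failures.append((host, path, f"HTTP {status}"))
    return failures
```

An empty result would mean every known end-point answered, and the deploy could then move on to the canary step; any entry in the list would be a reason to stop before real traffic sees the change.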
-- Tyler
[0]. https://github.com/wikimedia/operations-puppet/blob/production/modules/service/files/logstash_checker.py
[1]. https://phabricator.wikimedia.org/T136839
[2]. https://phabricator.wikimedia.org/T121597
wikitech-l@lists.wikimedia.org