[Engineering] [Ops] Changes to SWAT deployment policies, effective Monday April 30th

Tyler Cipriani tcipriani at wikimedia.org
Sat Apr 28 00:19:41 UTC 2018


On 18-04-27 21:00:35, Gergo Tisza wrote:
>On Fri, Apr 27, 2018 at 7:05 PM, Niharika Kohli <nkohli at wikimedia.org>
>wrote:
>
>> Also, I think dropping the limit to 4 patches per window is extreme,
>> especially if we are asking people to start splitting their patches now.
>> Very often we can +2 multiple patches in one go if they don't affect each
>> other, or sync out changes together if they happen to the same file. I've
>> deployed 8 patches in a window often, with people asking if they can add
>> more yet. Due to timezone limitations, most people can only attend one of
>> the SWAT windows and if they can't get it out in that window, they have to
>> wait a whole day or more to get it out.
>>
>
>FWIW, here are some stats on the patch counts for the first three months of
>2018 (might contain errors, tried not to spend too much time on it):
>
>EU mid: 8, 6, 2, 10, 1, 6, 1, 2, 1, 2, 2, 6, 2, 9, 6, 10, 2, 1, 1, 8, 7, 5,
>5, 5, 2, 7, 6, 8, 4, 7, 4, 4, 4, 1, 1, 4, 3, 3, 2, 4, 4, 5, 5, 1
>(average: 4.25, max of 2-week rolling average: 5.7)
>Morning: 7, 1, 1, 5, 1, 3, 4, 1, 6, 3, 4, 3, 3, 5, 6, 4, 3, 5, 8, 4, 5, 4,
>3, 0, 3, 1, 5, 9, 2, 1, 3
>(average: 3.6, max of 2-week rolling average: 5)
>Evening: 8, 1, 1, 1, 1, 3, 3, 2, 1, 3, 4, 2, 1, 1, 2, 1, 1, 5, 1, 6, 1, 3,
>3, 1, 2, 3, 1, 0, 9, 10, 5, 4, 1, 6, 2, 1, 2, 4, 3
>(average: 2.8, max of 2-week rolling average: 4.75)
>

These stats are really cool and they made me want to dig a little more. There 
have been a few times where having data about the actual syncs that make up a 
given SWAT window would be nice[0] (this being another one of those times).
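
For anyone who wants to double-check the quoted averages, here is a rough
Python sketch. The lists are copied from the quoted mail; the variable names
and the PROPOSED_LIMIT constant are just mine for illustration. The 2-week
rolling averages can't be reproduced this way since the lists carry no dates.

```python
# Patch counts per SWAT window for Jan-Mar 2018, copied from the quoted stats.
counts = {
    "EU mid": [8, 6, 2, 10, 1, 6, 1, 2, 1, 2, 2, 6, 2, 9, 6, 10, 2, 1, 1, 8, 7, 5,
               5, 5, 2, 7, 6, 8, 4, 7, 4, 4, 4, 1, 1, 4, 3, 3, 2, 4, 4, 5, 5, 1],
    "Morning": [7, 1, 1, 5, 1, 3, 4, 1, 6, 3, 4, 3, 3, 5, 6, 4, 3, 5, 8, 4, 5, 4,
                3, 0, 3, 1, 5, 9, 2, 1, 3],
    "Evening": [8, 1, 1, 1, 1, 3, 3, 2, 1, 3, 4, 2, 1, 1, 2, 1, 1, 5, 1, 6, 1, 3,
                3, 1, 2, 3, 1, 0, 9, 10, 5, 4, 1, 6, 2, 1, 2, 4, 3],
}

# The per-window limit under discussion in this thread.
PROPOSED_LIMIT = 4

# Plain arithmetic mean of patches per window.
averages = {name: sum(v) / len(v) for name, v in counts.items()}

# How many windows listed more patches than the proposed limit.
over_limit = {
    name: sum(1 for n in v if n > PROPOSED_LIMIT) for name, v in counts.items()
}

for name, values in counts.items():
    print(
        f"{name}: {len(values)} windows, "
        f"average {averages[name]:.2f}, "
        f"{over_limit[name]} windows over {PROPOSED_LIMIT} patches"
    )
```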

As of now, getting information about the number of syncs that make up a 
given SWAT window -- or seeing how long a SWAT window actually takes -- 
requires some digging in the SAL (and even then it can be hard to figure 
out what happened if a window has patches but no syncs, or just one sync, 
etc.). Anyway, I spent some time digging in the SAL[1] to correlate SWAT 
windows on Wikitech with actual syncs and deployments.

One thing I found is that the number of patches on Wikitech isn't necessarily 
the number of patches that make it out in a given window -- which makes sense -- 
sometimes we run out of time in the window, people don't show up, or something 
breaks and we have to stop. (Columns below: patches listed on Wikitech, actual 
window length in h:mm, patches deployed/listed.)

	2018-01-02 Europe:  8 patches  1:05 6/8
	2018-01-03 Evening: 8 patches  1:01 8/8
	2018-01-08 Europe:  8 patches  1:03 8/8
	2018-01-29 Europe:  9 patches  0:58 4/9
	2018-02-06 Europe:  10 patches 1:02 7/10
	2018-02-13 Europe:  8 patches  1:01 5/8
	2018-02-28 Europe:  8 patches  1:16 7/8

The other thing I found is that there was no SWAT window between 2018-01-02 and 
2018-03-09 with more than 6 patches that both stayed within the allotted 1-hour 
limit and deployed all of its patches (although we were very close a couple of 
times).

	2018-01-02 Europe:  8 patches  1:05 6/8
	2018-01-03 Evening: 8 patches  1:01
	2018-01-03 Morning: 7 patches  1:19
	2018-01-08 Europe:  8 patches  1:03
	2018-01-29 Europe:  9 patches  0:58 4/9
	2018-02-06 Europe:  10 patches 1:02 7/10
	2018-02-13 Europe:  8 patches  1:01 5/8
	2018-02-14 Europe:  7 patches  1:31 5/7
	2018-02-26 Europe:  7 patches  1:32
	2018-02-28 Europe:  8 patches  1:16 7/8
	2018-03-05 Europe:  7 patches  0:57 5/7
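
A quick way to check that claim against the rows above, as a Python sketch.
Two things here are my assumptions, not data: a row with no deployed/listed
column is treated as "everything went out" (the first table confirms that for
01-03 Evening and 01-08, but the other two such rows are a guess), and the
5-minute grace period used for "very close" is arbitrary.

```python
import re

# Rows copied from the table above.
ROWS = """\
2018-01-02 Europe:  8 patches  1:05 6/8
2018-01-03 Evening: 8 patches  1:01
2018-01-03 Morning: 7 patches  1:19
2018-01-08 Europe:  8 patches  1:03
2018-01-29 Europe:  9 patches  0:58 4/9
2018-02-06 Europe:  10 patches 1:02 7/10
2018-02-13 Europe:  8 patches  1:01 5/8
2018-02-14 Europe:  7 patches  1:31 5/7
2018-02-26 Europe:  7 patches  1:32
2018-02-28 Europe:  8 patches  1:16 7/8
2018-03-05 Europe:  7 patches  0:57 5/7
"""

ROW_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<window>\w+):\s+(?P<listed>\d+) patches\s+"
    r"(?P<h>\d+):(?P<m>\d{2})(?:\s+(?P<deployed>\d+)/\d+)?"
)


def parse(line):
    """Turn one table row into a dict; raises ValueError on a malformed row."""
    m = ROW_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unparseable row: {line!r}")
    listed = int(m["listed"])
    # ASSUMPTION: no deployed/listed column means all listed patches deployed.
    deployed = int(m["deployed"]) if m["deployed"] else listed
    return {
        "date": m["date"],
        "window": m["window"],
        "listed": listed,
        "deployed": deployed,
        "minutes": int(m["h"]) * 60 + int(m["m"]),
    }


windows = [parse(line) for line in ROWS.splitlines() if line.strip()]

LIMIT_MINUTES = 60   # the allotted window length
GRACE_MINUTES = 5    # hypothetical slack for "very close"


def complete(w):
    return w["deployed"] == w["listed"]


# Windows with > 6 patches that finished on time with everything deployed.
on_time_and_complete = [
    w for w in windows
    if w["listed"] > 6 and w["minutes"] <= LIMIT_MINUTES and complete(w)
]

# Windows that deployed everything but ran slightly over.
near_misses = [
    w for w in windows
    if w["listed"] > 6 and complete(w)
    and LIMIT_MINUTES < w["minutes"] <= LIMIT_MINUTES + GRACE_MINUTES
]

print("on time and complete:", [w["date"] for w in on_time_and_complete])
print("near misses:", [(w["date"], w["window"]) for w in near_misses])
```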

Looking at this info, maybe 6 is the magic number?

FWIW, I feel like I struggle to get out 8 patches in an hour (depending on the 
patches).

That said, requiring more patches per change while also allowing fewer patches
per window may not be the best course of action. As Chad said elsewhere in the
thread, maybe we should focus on "changes" per window, where a "change" is the
equivalent of what a patch is currently.

-- Tyler

[0]. <https://phabricator.wikimedia.org/T193311>
[1]. <https://gist.github.com/thcipriani/2d19626eb49f80cd33a949a318b830df>


