(Late reply: I was out the week this was sent, then another week of vacation happened)

On Wed, Jun 23, 2021 at 2:59 AM Jaime Crespo <jcrespo@wikimedia.org> wrote:
* How often are issues surfaced in the group0 -> group1 vs group1 -> group2, are there any stats to back the need for a change there?

The closest number I have to issues per group is the count of new "blocker" tasks filed in Phabricator per group.

Progressive rollout to each group gives us more confidence in the code being deployed, so for each group we should see progressively fewer blockers.

🚂📈 Group vs blocker discovery over the past 153 trains:
  • Before group0: 233
  • Group0: 180
  • Group1: 230
  • Group2: 91
"Before group0" means there's already a blocker on the train task before we've rolled the train out to any wiki in wikiprod (as with 1.37.0-wmf.14 today, which isn't deployed anywhere yet but already has a blocker on its train task, https://phabricator.wikimedia.org/T281155 ).

If we want each group to surface progressively fewer blockers, then the data shows that group0 is too small and/or group1 is too big. There are other considerations, though: deployers already have a lot more work to do on group0 day than on group1 day, so making group0 bigger (and more useful for developers) would make deployers' lives harder.
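To make the arithmetic behind that reading explicit, here is a small Python sketch that uses only the four counts listed above. It assumes "progressively fewer" means each stage should surface strictly fewer blockers than the stage before it; the variable names are made up for illustration, and raw counts say nothing about group size on their own.

```python
# Blocker counts per stage over the past 153 trains, copied from the list above.
blockers = {
    "before group0": 233,
    "group0": 180,
    "group1": 230,
    "group2": 91,
}
TRAINS = 153  # number of trains the counts cover

total = sum(blockers.values())
for stage, count in blockers.items():
    share = count / total          # fraction of all blockers found at this stage
    per_train = count / TRAINS     # average blockers found at this stage per train
    print(f"{stage:>13}: {count:3d} blockers, {share:5.1%} of total, {per_train:.2f} per train")

# Stages in rollout order; find adjacent pairs where the later stage
# surfaced as many or more blockers than the earlier one.
order = ["before group0", "group0", "group1", "group2"]
violations = [
    (earlier, later)
    for earlier, later in zip(order, order[1:])
    if blockers[later] >= blockers[earlier]
]
print("Stages breaking the 'progressively fewer' trend:", violations)
```

The only pair that breaks the trend is group0 -> group1, which is where the "group0 too small and/or group1 too big" reading comes from.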

* Without changing the actual deploying days or the frequency, would there be any benefit of shifting the deploy over multiple weeks? (random example Tu: group1->group2, (new branch) We: group0, Th: group0-> group1) or would that make things worse?

I wonder what impact this change would have on blocker reports. For instance, does group2 surface relatively few blockers partly because there's less of the week left after it's deployed?

* You mention commons. I am guessing Commons, and Wikidata, to some extent- are both large sites with a lot of visibility but also very different from the core features that are similar to most other wikis, but the test version of those on group0 may not be enough to catch all issues. Is there something that could be improved specifically for those sites?

This is a subset of a question I've been asking folks: why does the train give us confidence? What does the train give us that a testing environment like beta or a local environment can't? I think some of the magic of the train is the amount of traffic, but if that were all, artificial traffic should suffice. I think the other aspect of the train is Hyrum's law[0]: with enough traffic, every observable property of a system gets hammered, even properties that were never built intentionally.

* Can we do something to improve the speed from "a user notices an issue with the site" to "the right team/owner is aware of it and acts on it"?

Or can we do something to improve how many issues users notice? :)

Thanks for all of these great questions.
– Tyler

[0]: <https://www.hyrumslaw.com/>