(Late reply: I was out the week this was sent, then another week of …)
On Wed, Jun 23, 2021 at 2:59 AM Jaime Crespo <jcrespo(a)wikimedia.org> wrote:
* How often are issues surfaced in the group0 -> group1 vs group1 ->
group2 transitions, and are there any stats to back the need for a change there?
The closest number I have to issues per group is the count of new "blocker"
tasks filed in Phabricator per group.
Progressive rollout to each group gives us more confidence in the code
being deployed, so for each group we should see progressively fewer blockers.
🚂📈 *Group vs blocker discovery over the past 153 trains*:
- *Before group0*: 233
- *Group0*: 180
- *Group1*: 230
- *Group2*: 91
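For reference, per-phase tallies like the ones above could be reproduced from an export of blocker tasks along these lines. The record layout, field names, and task IDs below are hypothetical, not actual Phabricator data:

```python
from collections import Counter

# Hypothetical export: one record per train-blocker task, tagged with the
# rollout phase during which it was filed. Real data would come from a
# Phabricator query; these records are purely illustrative.
blockers = [
    {"task": "T1001", "phase": "before-group0"},
    {"task": "T1002", "phase": "group0"},
    {"task": "T1003", "phase": "group1"},
    {"task": "T1004", "phase": "group1"},
    {"task": "T1005", "phase": "group2"},
]

def blockers_per_phase(records):
    """Count blocker tasks by the rollout phase in which they were filed."""
    return Counter(r["phase"] for r in records)

counts = blockers_per_phase(blockers)
print(dict(counts))
```

With real data, comparing these counts across phases is what shows whether each successive group actually surfaces fewer blockers.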
"Before group0" means that before we've rolled out the train to any wiki in
wikiprod, there's already a blocker on the train task (just like today:
1.37.0-wmf.14 is not deployed anywhere, but there's a blocker on the train
task).
If we want each group to surface progressively fewer blockers, then the
data shows that group0 is too small and/or group1 is too big.
There are other considerations. Deployers have a lot of work to do on
group0 day vs group1 day, so making group0 bigger/more useful for
developers makes deployers' lives harder.
* Without changing the actual deploying days or the frequency, would there
be any benefit to shifting the deploy over multiple weeks? (Random example:
Tu: group1 -> group2, We: (new branch) group0, Th: group0 -> group1.) Or
would that make things worse?
I wonder what impact this change would have on blocker reports. For
instance, is it a function of the time left in the week that group2
surfaces relatively few blockers?
* You mention commons. I am guessing Commons, and Wikidata to some extent,
are both large sites with a lot of visibility but also very different from
the core features that most other wikis share, so the test versions of
those on group0 may not be enough to catch all issues. Is there something
that could be improved specifically for those sites?
This is a subset of a question I've been asking folks: why does the train
give us confidence? What does a train give us that a testing environment
like beta or a local environment can't give us? I think some of the magic
of train is the amount of traffic, but if that were the case then
artificial traffic should suffice. I think the other aspect of the train is
Hyrum's law: all observable properties of a system are hammered with
traffic, even observable properties that were not built intentionally.
* Can we do something to improve the speed from "a user notices an issue
with the site" to "the right team/owner is aware of it and acts on it"?
Or can we do something to improve how many issues users notice? :)
Thanks for all of these great questions.