Hi all,
Some disclaimers before I start my thread:
1) I am a big believer in Git and DVCS and I think this is the right decision.
2) I am a big believer in Gerrit and code review and I think this is the right decision.
3) I might be unaware of, or inaccurate about, certain things; apologies in advance.
4) A BIG thank you to all the folks involved in preparing this migration (evaluation, migration and training), in particular Chad, Sumanah and Roan (but I am sure more people are involved and I am just blissfully unaware).
My main worry is that we are not spending enough time getting all engineers (both internal and in the community) up to speed with the coming migration to Git and Gerrit, and that we are going to blame the tools (Gerrit and/or Git) instead of the complex interaction between three changes. We are making three fundamental changes in one shot:
1) Migrating from a centralized source control system to a decentralized one (SVN -> Git)
2) Introducing a new dedicated code-review tool (Gerrit)
3) Introducing a gated-trunk model
My concern is not about the UI of Gerrit. I know it's popular within WMF to say that its UI sucks, but I don't think that's the case, and even if it were an issue it would only be a minor one. People have already suggested that we might consider other code-review systems; from a quick Google search, we seem to be the only community considering migrating from Gerrit to Phabricator. I think this is beside the point: the real challenge is moving to a gated-trunk model, regardless of the chosen code-review tool. I doubt that other code-review tools that support a gated-trunk model on top of Git are much easier than Gerrit. The complexity comes from the gated-trunk model, not from the tool.
The gated-trunk model means that, when you clone or pull from master, it might be the case that files relevant to you have been changed, but that those changes are still waiting to be merged (the pull request backlog, AKA the code-review backlog). In the always-commit world, with no gatekeeping between developers and master, this never happens; your local copy can always be fully synchronized with trunk ("master"). Even if a commit is reverted, your local working copy will still have it, and you can still commit any changes you based on the reverted commit. Obviously people get annoyed when you keep checking in reverted code, but it won't break anything.
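To make the "waiting to be merged" point concrete, here is a minimal local sketch of how Gerrit keeps pending changes out of master. It simulates the server with a bare repository on disk; all names, file contents and the change number (refs/changes/45/12345/1) are made up, but the ref layout is the one Gerrit uses.

```shell
# Sketch: a pending change lives in a Gerrit-style refs/changes/ ref,
# not on master, so a fresh clone never sees it.
set -e
tmp=$(mktemp -d) && cd "$tmp"

git init -q --bare server.git          # stand-in for the Gerrit server
git clone -q server.git alice && cd alice
git config user.email alice@example.org && git config user.name Alice
git checkout -qb master
echo v1 > core.php && git add core.php && git commit -qm "initial"
git push -q origin master

# Alice uploads a change for review: it goes to refs/changes/..., not master
echo v2 > core.php && git commit -qam "pending change"
git push -q origin HEAD:refs/changes/45/12345/1

# Bob clones: master still has v1; the pending v2 is invisible to him
cd .. && git clone -q server.git bob && cd bob
git checkout -q master
cat core.php                           # v1
# He can fetch the pending patchset explicitly if he knows its ref:
git fetch -q origin refs/changes/45/12345/1
git show FETCH_HEAD:core.php           # v2
```

Until a reviewer merges the change, every clone and pull of master behaves as if it did not exist, which is exactly the situation described above.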
In an ideal world, our code-review backlog would be zero commits at any time of day; if that were the case, 'master' would always be up to date and you would have the same situation as with the 'always-commit' model. However, we know that the code-review backlog is a fact of life, and it's the intersection of Git, Gerrit and that backlog that is going to be painful.
Suppose I clone master, but there are 10 commits waiting to be reviewed that touch files relevant to me. I am happily coding in my own local branch and after a while I am ready to commit. Meanwhile, those 10 commits have been reviewed and merged, and now when I want to merge my branch back to master I get merge conflicts. I discover these conflicts either when my branch is merged back to master or when I pull midway to update my local branch.
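The scenario above can be reproduced locally in a throwaway repository (no Gerrit involved; branch and file names are made up). One reviewed commit landing on master while I work on the same file is enough to force a hand-resolved conflict at rebase time:

```shell
# Sketch: master moves while I work on a branch; rebasing surfaces the conflict.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.org && git config user.name Dev
git checkout -qb master
echo "original" > shared.txt && git add shared.txt && git commit -qm "initial"

git checkout -qb my-feature
echo "my version" > shared.txt && git commit -qam "my change"

# Meanwhile, a reviewed commit touching the same file lands on master
git checkout -q master
echo "their version" > shared.txt && git commit -qam "reviewed change"

# Rebasing my branch now stops with a conflict I must resolve by hand
git checkout -q my-feature
git rebase master || true          # stops: conflict in shared.txt
git status --short                 # UU shared.txt
echo "merged version" > shared.txt # resolve by hand
git add shared.txt && GIT_EDITOR=true git rebase --continue
```

With 10 commits in the backlog instead of one, this resolve-and-continue loop is what every developer pays for the review queue.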
To be a productive engineer after the migration, it will *not* be sufficient to have mastered only the basic commands: git clone, git pull, git push, git add and git commit.
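As one illustration of what lies beyond the basics, here is a local sketch of two of the commands the Gerrit workflow makes routine: amending a commit so a change stays a single reviewable unit (a new patchset), and cherry-picking an approved fix onto another branch. Repository, branch and file names are hypothetical.

```shell
# Sketch: --amend for review iterations, cherry-pick for backports.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.org && git config user.name Dev
git checkout -qb master
echo base > app.txt && git add app.txt && git commit -qm "initial"

git checkout -qb fix
echo "fix v1" > app.txt && git commit -qam "bug fix"
# Reviewer asks for changes: amend instead of stacking a second commit,
# so the change remains one commit (one new patchset in Gerrit)
echo "fix v2" > app.txt && git commit -qa --amend -m "bug fix (v2)"
# Against a real Gerrit remote you would now re-upload with something like:
#   git push origin HEAD:refs/for/master

# Backporting the approved fix to a release branch with cherry-pick
git checkout -qb release-1.19 master
git cherry-pick -x fix
git log --oneline
```

Neither command appears anywhere in the clone/pull/commit basics, yet both are everyday operations under a gated trunk.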
Two overall recommendations:
1) The Git/Gerrit combination means that you will have to understand git rebase, git commit --amend, git bisect and git cherry-pick. This is advanced Git usage, and it will make the learning curve steeper. I think we need to spend more time on training. I have been looking for good tutorials about Git and Gerrit in practice and haven't been able to find any, but maybe other people have better Google-fu (I think we are looking for advanced tutorials that cover not just cloning and pulling but also merging, bisecting and cherry-picking).
2) We need to come up with a smarter way of determining how to approach the code-review backlog. Three overall strategies come to mind:
a) random: just pick a commit
b) time-based: pick either the oldest or the newest commit
c) 'impact' of the commit
Options a) and b) require no extra work but are less suited to a gated-trunk model. Option c) could be something where we construct a graph of the codebase, determine the most central files (hubs), and sort commits by their centrality in this graph. The graph would only need to be reconstructed after major refactoring, or every month or so. Obviously this requires a bit of coding, and I have no formal proof that it will actually reduce the pain, but I am hopeful. If constructing a graph is too cumbersome, then we could sort by the number of files a commit touches as a proxy. If we cannot come up with a c) strategy, then the only real option is to make sure that the queue is as short as possible.
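The file-count proxy can be sketched in a few lines of shell against any repository. This is only an illustration in a throwaway repo with made-up commits; against Gerrit you would run the same counting over the pending changes rather than over git log.

```shell
# Sketch: rank commits by how many files they touch, widest first.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email dev@example.org && git config user.name Dev
git checkout -qb master
echo x > a.txt && git add a.txt && git commit -qm "initial"
# Two "pending" commits: one small, one wide-reaching
echo y > a.txt && git commit -qam "small change"
echo z > b.txt && echo z > c.txt && echo z > d.txt
git add . && git commit -qm "wide change"

# Files touched per commit, sorted widest first: review those first
git log --format='%H %s' master |
while read -r hash subject; do
  n=$(git show --pretty=format: --name-only "$hash" | grep -c .)
  echo "$n $subject"
done | sort -rn
```

The "wide change" commit sorts to the top with 3 files touched. Whether file count is a good stand-in for graph centrality is exactly the open question above, but it costs almost nothing to try.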
Best, Diederik