Hi Diederik,
Let me first thank you for taking what must've been a long time to clearly articulate your thoughts on Git/Gerrit. Having been a guinea pig for several weeks now, I think your input is highly valuable. Replies inline.
On Tue, Mar 6, 2012 at 2:20 PM, Diederik van Liere dvanliere@gmail.com wrote:
Hi all,
Some disclaimers before I start my thread:
- I am a big believer in Git and dvcs and I think this is the right decision
- I am a big believer in Gerrit and code-review and I think this is
the right decision
We have a convert \o/
- I might be wholly unaware / inaccurate of certain things, apologies
in advance.
Totally ok and totally understandable. Part of my job with this migration is to educate. I didn't understand a whole lot about git before this process started (beyond a simple clone/fetch/commit/push), but I've learned lots and I'm willing to share what I've learned.
- A BIIGG thankyou to all the folks involved in preparing this
migration (evaluation, migration and training): in particular Chad, Sumanah and Roan (but I am sure more people are involved and I am just blissfully unaware).
Let me pile on the thanks. Sumana's been a tremendous help with keeping me on top of documentation, communication, and generally not hiding in a bunker and doing this alone. Roan's been an awesome help and guinea pig, going above and beyond like he always does. Also a huge thanks to Ryan, who's been superb in getting this infrastructure up and helping me to understand it (and fix it when I break things).
My main worry is that we are not spending enough time on getting all engineers (both internal and in the community) up to speed with the coming migration to Git and Gerrit and that we are going to blame the tools (Gerrit and/or Git) instead of the complex interaction between three changes. We are making three fundamental changes in one-shot:
- Migrating from a centralized source control system to a
decentralized system (SVN -> Git) 2) Introducing a new dedicated code-review tool (Gerrit) 3) Introducing a gated-trunk model
These are big changes. They're drastic changes. They require a rethinking of a great many things that we do from both technical and non-technical perspectives. Unfortunately, I don't see how we could've done #1 without #2. CodeReview is not designed (and was never designed) to work with a DVCS. The workflow's just not there, and it would've basically required rewriting huge parts of it. Rather than reinvent the wheel (again), we went with Gerrit.
Arguably, we could've gone a straight push and skipped item #3. But given the continual code review backlog, and the desire to keep trunk stable (and hopefully deploy much more often), the decision to gate trunk was made pretty early on in the discussions.
My concern is not about the UI of Gerrit, I know it's popular within WMF to say that it's UI sucks but I don't think that's the case and even if it was an issue it's only minor. People have already suggested that we might consider other code-review systems, I did a quick Google search and we are the only community considering migrating from Gerrit to Phabricator. I think this is besides the point: the real challenge is moving to a gated-trunk model, regardless of the chosen code-review tool. I cannot imagine other code-review tools that are also based on a gated-trunk model and work with Git are much easier than Gerrit. The complexity comes from the gated-trunk model, not from the tool.
Agreed. Gerrit has a learning curve (see other e-mail), but I do think a lot of the current confusion comes back to the gated trunk model. It is a completely different workflow from what we've been doing for the past ~10 years, so it's bound to be confusing regardless of the tools used.
In an ideal world, our code-review backlog would be zero commits at any time of the day, if that's the case then 'master' is always up-to-date and you have the same situation as with the 'always-commit' model. However, we know that the code-review backlog is a fact and it's the intersection of Git, Gerrit and the backlog that is going to be painful.
I don't think we can make any assumptions about how the code review backlog is going to look in gerrit. The list in long because we've got no real rush to review--the code's in trunk and the impetus is on the reviewer to eventually review the code sometime before deployment (hopefully).
I think there will be some lag initially as we're getting our feet wet, but I believe it will improve with time.
Suppose I clone master, but there are 10 commits waiting to be reviewed with files that are relevant to me. I am happily coding in my own local branch and after a while ready to commit. Meanwhile, those 10 commits have been reviewed and merged and now when I want to merge my branch back to master I get merge conflicts. Either I discover these merge conflicts when my branch is merged back to master or if I pull mid-way to update my local branch.
I agree with what Platonides said regarding this.
To be a productive engineer after the migration it will *not* be sufficient if you have only mastered git clone, git pull, git push, git add and git commit commands. These are the basic git commands.
Two overall recommendations:
- The Git / Gerrit combination means that you will have to understand
git rebase, git commit --amend, git bisect and git cherry-pick. This is advanced Git usage and that will make the learning curve steeper. I think we need to spend more time on training, I have been looking for good tutorials about Git&Gerrit in practise and I haven't been able to find it but maybe other people have better Google Fu skills (I think we are looking for advanced tutorials, not just cloning and pulling, but also merging, bisect and cherrypick).
git-review helps lower some of these barriers since it automatically rebases against origin/* for you so you get a clean merge on push. Cherry picking's not that hard, and gerrit actually gives you the command from the UI to pull the specific patchset.
I've yet to need git bisect for pushing patchsets--what's the use case there?
- We need to come up with a smarter way determining how to approach
the code-review backlog. Three overall strategies come to mind: a) random, just pick a commit b) time-based picking (either the oldest or the youngest commit) c) 'impact' of commit
A combination of all 2 & 3--just as we do it now. I prefer that older commits get reviewed so they can be fixed before it becomes difficult. Generally speaking, this will remain very similar in git. However, if something is more or less important, they can be reviewed accordingly.
One of the most important habits I can encourage people to get into is using separate local branches for separate features/fixes/etc. If two commits aren't related--they should not be dependent on one another. It makes the review process more difficult when you've got unrelated dependencies since you have to review all of them to submit. This raises the barrier to getting things merged to master.
-Chad