On Jul 24, 2012, at 9:09 PM, Sumana Harihareswara wrote:
On 07/17/2012 08:41 PM, Rob Lanphier wrote:
It would appear from reading this page that the only alternative to Gerrit that has a serious following is GitHub. Is that the case?
It's ironic, and yet somehow apropos, that now that Gerrit is finally stabilizing we're discussing alternatives. :-)
Oh well… Here's my 2c on GitHub…
In an "ignore reality" world, I suppose my personal choices would be 1) GitHub; 2) Phabricator; 3) everything else. But let's cross GitHub off that list (for WMF).
Maybe in some future when our development process more closely models a seat-of-the-pants startup universe of "code first, break often, recover fast" we could consider GitHub for hosting some of our public repositories, but I don't see that happening anytime soon (ever?).
…
The nonstarter is that while we could host the public repositories, we do have a lot of non-public stuff in Gerrit right now. That stuff can't go into the cloud.
Well, on to specifics…
But I have a lot of reservations about using GitHub as our primary source control and code review platform. There's the free-as-in-freedom issue, of course,
Personally I think this ship sailed the day we used Google Apps for e-mail. :-)
but I'm also concerned about flexibility, account management, fragmentation of community and duplication of tools, and their terms of service.
== Flexibility == I see GitHub as kind of like a Mac.
This trope is too facile. But I do agree with what you are alluding to: while it's fine for some, that doesn't mean it's fine for us, especially given our current development process.
It has a nice UI for the use case that its creators envision. It's fine for personal use.
A great many very large open source projects are currently using or hosted at GitHub (including node, jQuery, and our Android/PhoneGap app ;-))
And if we try it, everything'll be great.... until we smack into an invisible brick wall. We'll want to work around one little thing, the way that we sneak around various issues in Gerrit, with hacks and searches and upgrades, if it's not in GitHub's web UI or API [3], we'll be stuck.
The API is simplistic but serviceable. However, the satellite tools that are built around it are either part of GitHub (their internal issue tracker, their own Ruby-based wiki Gollum, etc.) or are mostly for commercial use/cloud-based. For instance, among the tools that assist in deployment from a GitHub repository (even if that were feasible for us, which it isn't), most seem to have a hidden assumption that these are Web 2.0 companies deploying on AWS… not to mention that usage of those tools clearly violates our policy and values.
Right now we have our primary Git repo on our own machines, which is the ultimate backdoor. The way we have been modifying our tools, automating certain kinds of commits (like with l10n-bot), troubleshooting by looking at our logfiles, and generally customizing things to suit our weird needs -- GitHub is closed source and won't let us do that. We are not the typical use case for GitHub. Since we have hundreds of extensions, each with their own repository, we would have way more repositories and members than almost any other organization on there. So, one example: arbitrary sortability of lists of repositories. We could mod Gerrit to do it, but not GitHub. How would we centralize and list the repositories so they're easy to browse, search, edit, follow, and watch them together? It looks like GitHub's less suitable for that, but I'd welcome examples of orgs that create their own sub-GitHub hubs.
Well, GitHub's model doesn't prevent operating on the git repository through the API. But I agree, since where is the support on our end for writing these tools when we already have something serviceable in Gerrit?
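To make the API point concrete, here is a minimal sketch of sorting a repository listing on the client side once it has been fetched. The records and field names are made up for illustration; they are not GitHub's (or Gerrit's) actual response schema.

```python
from operator import itemgetter

# Hypothetical repository records, standing in for whatever a hosting API returns.
REPOS = [
    {"name": "mediawiki-core", "watchers": 120, "updated": "2012-07-20"},
    {"name": "extension-Foo", "watchers": 8, "updated": "2012-07-23"},
    {"name": "extension-Bar", "watchers": 31, "updated": "2012-06-02"},
]


def sort_repos(repos, field, descending=False):
    """Return a new list of repository records ordered by the given field."""
    return sorted(repos, key=itemgetter(field), reverse=descending)


if __name__ == "__main__":
    # Most recently updated first (ISO dates sort correctly as strings).
    for repo in sort_repos(REPOS, "updated", descending=True):
        print(repo["name"], repo["updated"])
```

The sorting itself is trivial; the real cost is that someone has to write, host, and maintain a front end like this outside the hosting service.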
== Accounts == By using GitHub, we would no longer be managing the user accounts. This would make single sign-on with other Wikimedia services (especially Labs) completely impossible.
Technically this integration could be done by having users authorize us to access their GitHub accounts via OAuth2. It's not the same thing as what you're saying though… it's kind of the opposite of what you're saying. What you want is what GitHub Enterprise is for.
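As a rough sketch of the OAuth2 direction, the first step is sending the user to GitHub's authorization page with our application's details. The endpoint is GitHub's documented authorize URL; the client ID, redirect URI, and function name are hypothetical placeholders, and the later steps (exchanging the returned code for a token) are omitted.

```python
import secrets
from urllib.parse import urlencode

# GitHub's documented OAuth authorization endpoint.
AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, scope, state=None):
    """Build the URL a user visits to grant our application access.

    Returns the URL and the state token. The state must be stored and
    compared against the value echoed back on the redirect.
    """
    if state is None:
        state = secrets.token_urlsafe(16)  # random anti-CSRF token
    query = urlencode({
        "client_id": client_id,        # hypothetical: issued when registering the app
        "redirect_uri": redirect_uri,  # hypothetical: our callback endpoint
        "scope": scope,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}", state


if __name__ == "__main__":
    url, state = build_authorize_url("abc", "https://example.org/cb", "read:user")
    print(url)
```

Note that this only lets us act on a user's GitHub account with their consent. It does not give us control of the account database, which is the single sign-on concern raised above.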
I mentioned above that GitHub seems more meant for single FLOSS projects than for confederations of related repositories. GitHub does not have the concept of "groups," so granting access to collections of repos would be a time-consuming process. GitHub does not support branch-level permissions, either (it encourages "forking" and then merging back to master), and that does not seem as suitable for long-term collaborative branches.
This isn't quite true. GitHub does have the concept of groups: you can create as many as you want and control access levels (read-only, read/write, admin) on a project-by-project basis and across projects. However, you cannot do it as robustly as Gerrit does.
More troublesome, I think, is not how GitHub's fork-and-merge model handles permissions, but that GitHub's model is fundamentally different from our gated-trunk code review model. GitHub effectively allows self-review because there is no concept of review.
On the other hand, since this is handled through a Pull Request instead of a Gerrit ChangeId, it does mean the history of the code commits, etc. doesn't get lost or munged down like it does in Gerrit. Too bad because I'd like this and it's fairly transparent (not requiring Git voodoo to handle these things). It's not our workflow though.
== Duplication of tools, fragmentation of community == We don't want to fragment our communication EVEN MORE. GitHub wikis and bug management aren't such a big deal since we can probably disable those. But messaging and notification .... "oh, did you say that on GitHub? We didn't see that there." That's already a big enough headache, with Bugzilla and all the mailing lists and IRC channels and talk pages and and and. :-)
GitHub's only collaboration channels are the repository itself, Gollum (their wiki), their issue tracker, and the commit comments.
Assuming that Gollum and the issue tracker are turned off (pity, I don't care for Gollum, but their issue tracker is nicely integrated), the commit comments and repository are no different from what Gerrit already offers. Dare I say it, but GitHub's commit comments are awesome. They leave Gerrit and every other review tool in the dust as far as I've seen.
I should mention that Gerrit's actual review of a change is nicer than even Phabricator's. You can step through the files and it will mark each one as reviewed as you go. Obviously GitHub has no such thing since it has no concept of a pre-commit review or gated review.
== The Terms of Service == GitHub's ToS/Security/Privacy policies[2] pose a few problems for our needs.
One is that people under 13 can't sign up. I do not want to limit our community that way.
Makes sense. I didn't consider this.
Another is: "You may not duplicate, copy, or reuse any portion of the HTML/CSS, Javascript, or visual design elements or concepts without express written permission from GitHub." Do we really want to get into a possible situation where we have noticed a design concept or cool use of JS on GitHub but don't feel okay reusing it in our personal or professional projects?
I think this is boilerplate. In any case, that part would apply even if we were only mirroring on GitHub. Besides, since we have no intention of building a GitHub competitor, who cares?
And, considering our level of activity, check out this clause: "If your bandwidth usage significantly exceeds the average bandwidth usage (as determined solely by GitHub) of other GitHub customers, we reserve the right to immediately disable your account or throttle your file hosting until you can reduce your bandwidth consumption." We simply cannot afford to have GitHub disable our access with no notice.
We still have the code and I'm sure there will be busier repositories out there. This is to prevent abuse on their side so they don't have to guarantee service.
== A couple open questions ==
- What's the FLOSS project on GitHub that's most like us, in terms of size, number of unique repositories, privacy concerns, robustness needs, and so on? How are they dealing with these issues?
I don't know. I believe that once it migrates, the jQuery plugin repository, together with jQuery itself, will probably be the largest in terms of number of users and size. But they don't manage it as a cascading web of trust.
- What does GitHub Enterprise buy us? Which of these issues would that fix?
It's a self-hosted GitHub. It would allow us to have private repositories (good for deploys, ops, etc.) and manage our own user database (we could integrate with our own auth system), and it probably waives the under-13 rule above.
The price is too steep since it's a per-seat license. A nonstarter if the WMF is going to have to pay for every potential developer who wants to attach.
We do need a GitHub strategy -- to make our projects more discoverable, make use of more contributions, and participate in the GitHub reputational economy. So we must figure out the right ways to mirror and sync. But I doubt our own long-term needs would work well with using GitHub as our main platform.
I'm 1000% with you on this.
We should definitely at some point mirror our code on GitHub, as the PHP project does (http://www.php.net/git.php). Being able to publish to GitHub and handle pull requests coming from it would be a nice feature in Gerrit or any replacement. It'd be nice if others could have their own MW extensions, or versions of extensions and core, on GitHub and pull from us (and us from them), esp. for extensions that may need some love or have changes that don't satisfy the WMF code quality bar.
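For the mirroring half of this, a minimal sketch using plain git might look like the following. It assumes a one-way mirror from our own server to GitHub; the URLs, working directory, and function names are hypothetical, and it is not how the PHP project or Gerrit actually does it.

```python
import subprocess


def mirror_commands(source_url, mirror_url, workdir):
    """Return the git commands that copy every ref from source to mirror."""
    return [
        # Bare clone that tracks all refs of the source repository.
        ["git", "clone", "--mirror", source_url, workdir],
        # Push all of those refs to the mirror, deleting refs that vanished.
        ["git", "-C", workdir, "push", "--mirror", mirror_url],
    ]


def run_mirror(source_url, mirror_url, workdir, dry_run=True):
    """Print the commands, and execute them only when dry_run is False."""
    for cmd in mirror_commands(source_url, mirror_url, workdir):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URLs; dry_run=True only prints what would be executed.
    run_mirror(
        "https://git.example.org/core.git",
        "git@github.com:example/core.git",
        "core.git",
    )
```

Since `--mirror` pushes every ref, including any review refs the source server keeps, a real setup would likely restrict which refs get pushed. Handling pull requests coming back the other way is the harder part and is not covered here.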
…
As for actually dealing with the pros and cons of Gerrit vs. others-than-GitHub, I suppose I'll sit down and put them in a matrix on the wiki page if I have some time. I haven't yet thought things through enough to bore you with an even longer e-mail. :-)