On 07/17/2012 08:41 PM, Rob Lanphier wrote:
> It would appear from reading this page that the only alternative to Gerrit that has a serious following is GitHub. Is that the case?
We definitely need a GitHub *strategy*. GitHub draws together tons of open source contributors. So we ought to address:
* pull requests. People *will* clone our projects onto GitHub and end up submitting pull requests there; we have to find or make tools to sync those, or at least get notified about them and make it easy to pull them into whatever we use. [0] [1]
* discoverability. Having a presence on GitHub gets us publicity to a lot of potential contributors.
* reputation. People on GitHub want credit, in their system, for their commits. It'd help us to give them that somehow.
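For the "at least get notified" option, here is a minimal sketch. It assumes GitHub's public REST endpoint for listing a repository's pull requests (GET /repos/{owner}/{repo}/pulls) and unauthenticated access. The function names and the notification format are made up for illustration; this is not an existing tool of ours.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_open_pull_requests(owner, repo):
    """Return the open pull requests of owner/repo as a list of dicts.

    Unauthenticated requests are rate-limited by GitHub, so a real
    notifier would probably want a token; that part is left out here.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/pulls?state=open"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarize(pull_requests):
    """Turn API records into one notification line per pull request."""
    return [
        f"#{pr['number']} by {pr['user']['login']}: {pr['title']} <{pr['html_url']}>"
        for pr in pull_requests
    ]
```

Something like summarize(fetch_open_pull_requests(owner, repo)) could run from a cron job and post its lines to a mailing list or IRC channel; where the notifications go, and how a pull request then gets carried into our own review system, is the part we would still have to decide.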
But I have a lot of reservations about using GitHub as our primary source control and code review platform. There's the free-as-in-freedom issue, of course, but I'm also concerned about flexibility, account management, fragmentation of community and duplication of tools, and their terms of service.
== Flexibility ==
I see GitHub as kind of like a Mac. It has a nice UI for the use case that its creators envision. It's fine for personal use. And if we try it, everything'll be great... until we smack into an invisible brick wall. We'll want to work around one little thing, the way that we sneak around various issues in Gerrit with hacks and searches and upgrades. But if the workaround isn't available through GitHub's web UI or API [3], we'll be stuck.
Right now we have our primary Git repo on our own machines, which is the ultimate backdoor. We have been modifying our tools, automating certain kinds of commits (like with l10n-bot), troubleshooting by looking at our logfiles, and generally customizing things to suit our weird needs. GitHub is closed source and won't let us do that.
We are not the typical use case for GitHub. Since we have hundreds of extensions, each with its own repository, we would have way more repositories and members than almost any other organization on there. So, one example: arbitrary sortability of lists of repositories. We could mod Gerrit to do it, but not GitHub. How would we centralize and list the repositories so that they're easy to browse, search, edit, follow, and watch together? It looks like GitHub's less suitable for that, but I'd welcome examples of orgs that create their own sub-GitHub hubs.
The WMF used to host the MediaWiki source code on SourceForge, about 8 years ago. We switched away for a number of reasons -- because SourceForge was not robust and reliable enough for our needs (extended downtime led to the actual switchover), because it didn't give us enough flexibility and customization, and because we couldn't get the data we wanted out of the host.
We could swap Greasemonkey scripts and the like to do a little personal UI customization on GitHub, but we could not improve GitHub itself or share those improvements with its other users. With Gerrit, we've already begun forming some friendships with the development team and have contributed several small patches back upstream. Plus, Gerrit will provide a plugin/extension interface (starting with the next version, 2.5) which will allow us to further tweak it to our needs. We would not be able to do that with GitHub.
I can't see us actually hosting our deployment branches on GitHub; a scenario in which we do not control the access to that is *unacceptable*. And the more frequently we want to deploy, and the more entangled our source control gets into our deployment infrastructure, the more of a pain it'll be to have our source control someplace we can't tweak or totally trust.
== Accounts ==
By using GitHub, we would no longer be managing the user accounts. This would make single sign-on with other Wikimedia services (especially Labs) completely impossible.
I mentioned above that GitHub seems more meant for single FLOSS projects than for confederations of related repositories. GitHub does not have the concept of "groups," so granting access to collections of repos would be a time-consuming process. GitHub does not support branch-level permissions, either (it encourages "forking" and then merging back to master), and that does not seem as suitable for long-term collaborative branches.
GitHub's Terms of Service (more on that below) require people to use their "real" (wallet) names. Our community has many members who value their privacy, and we currently allow them to use their pseudonyms. (Since we control our registration process for Developer Access, we can ensure that users are who they claim to be, to our standards.)
== Duplication of tools, fragmentation of community ==
We don't want to fragment our communication EVEN MORE. GitHub wikis and bug management aren't such a big deal since we can probably disable those. But messaging and notification... "oh, did you say that on GitHub? We didn't see that there." That's already a big enough headache, with Bugzilla and all the mailing lists and IRC channels and talk pages and and and. :-)
== The Terms of Service ==
GitHub's ToS/Security/Privacy policies [2] pose a few problems for our needs.
One is that people under 13 can't sign up. I do not want to limit our community that way.
Another is: "You may not duplicate, copy, or reuse any portion of the HTML/CSS, Javascript, or visual design elements or concepts without express written permission from GitHub." Do we really want to get into a possible situation where we have noticed a design concept or cool use of JS on GitHub but don't feel okay reusing it in our personal or professional projects?
And, considering our level of activity, check out this clause: "If your bandwidth usage significantly exceeds the average bandwidth usage (as determined solely by GitHub) of other GitHub customers, we reserve the right to immediately disable your account or throttle your file hosting until you can reduce your bandwidth consumption." We simply cannot afford to have GitHub disable our access with no notice.
== A couple open questions ==
* What's the FLOSS project on GitHub that's most like us, in terms of size, number of unique repositories, privacy concerns, robustness needs, and so on? How are they dealing with these issues?
* What does GitHub Enterprise buy us? Which of these issues would that fix?
Basically, I'm thinking, let's not put so many of our eggs in the GitHub basket. GitHub is fine for FLOSS projects with fewer than a hundred repositories, ones that don't already have several communications channels, ones where privacy is less of a concern, or ones that don't run the sixth biggest website in the world practically right off trunk. But we have and will have so many strange, unforeseen needs that we should keep certain key operations on servers that we run and can hack at will.
We do need a GitHub strategy -- to make our projects more discoverable, make use of more contributions, and participate in the GitHub reputational economy. So we must figure out the right ways to mirror and sync. But I doubt our own long-term needs would work well with using GitHub as our main platform.
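As a rough illustration of the mirroring half, here is a sketch that assumes a one-way mirror from our own repository to a read-only copy elsewhere, driven by plain git commands (clone --mirror, remote update --prune, push --mirror). The function names, URLs, and paths are hypothetical. Note that push --mirror overwrites the destination's refs, so this only suits a copy that nobody pushes to directly.

```python
import subprocess


def mirror_commands(source_url, mirror_url, workdir):
    """Build the git invocations for a one-way mirror.

    "init" runs once to create a bare mirror clone in workdir.
    "sync" runs on every update: fetch everything from the source,
    then push all refs to the mirror, deleting refs that are gone.
    """
    return {
        "init": [["git", "clone", "--mirror", source_url, workdir]],
        "sync": [
            ["git", "-C", workdir, "remote", "update", "--prune"],
            ["git", "-C", workdir, "push", "--mirror", mirror_url],
        ],
    }


def run(commands):
    """Execute each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Running run(cmds["init"]) once and run(cmds["sync"]) from a hook or timer would keep the copy current. Getting contributions from the mirror back into our own review system is a separate problem that this does not touch.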
[0] https://bugzilla.wikimedia.org/show_bug.cgi?id=38196
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=35497
[2] https://github.com/site/terms
[3] http://developer.github.com/
(Thanks to Chad and RobLa for talking through much of this with me.)