On Mon, Jun 9, 2014 at 11:55 AM, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
> - A bigger repo is obviously slower to download in full; is it also
> slower to search or otherwise work with? How much slower? Most of our
> developers are probably not on SSDs yet.
git is largely insensitive to repo size. It doesn't traverse the entire history unless it needs to, so for most developer tasks the only constraint is disk space. (There might be a mild size dependency in 'git fetch' -- but I think that's more related to the number of branches in your tree and the tree structure, not simply to the number of commits in the history.)
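For a rough sense of this on your own machine (stock git commands, not an authoritative benchmark):

    git count-objects -v    # how much disk the object database takes
    time git status         # working-tree operations don't walk history
    time git log -1         # reading the tip commit ignores older history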
However, there are several steps in our current development *pipeline* that do things like a naive 'git clone', whose speed depends linearly on the size of the repo. ("Naive" in the sense that it would be a "simple matter of software" to use a cached local repo or pack to speed things up.) I believe both "git review" (submitting patches to gerrit) and the jenkins test runs currently have steps of this sort; "git review" on mediawiki/core is noticeably slower than on parsoid (for example).
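As a sketch of what the non-naive version looks like ($GERRIT_URL and $CACHE are placeholders here, not how jenkins/gerrit are actually set up):

    # one-time: keep a local mirror of mediawiki/core on the build host
    git clone --mirror $GERRIT_URL/mediawiki/core $CACHE/core.git
    # per-build: borrow objects from the mirror instead of refetching everything
    git clone --reference $CACHE/core.git $GERRIT_URL/mediawiki/core workdir
    # (refresh the mirror periodically with 'git --git-dir=$CACHE/core.git fetch')

The second clone still talks to gerrit, but it only transfers objects the mirror doesn't already have, so its cost no longer grows with the full size of the repo (with the usual caveat that the resulting clone depends on the cache repo sticking around).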
Traded off against this -- if history is truncated, it will be much slower and more complicated for developers to do meaningful history searches, as has been mentioned. I'd expect that most hard-core developers would end up having to download the complete history anyway, as Bartosz suggests.
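For what it's worth, the shallow-clone route looks something like this ($GERRIT_URL is a placeholder; whether gerrit and git-review cope with shallow clones is exactly the kind of thing that needs verifying):

    # initial clone with only the most recent commit of history
    git clone --depth 1 $GERRIT_URL/mediawiki/core
    # later, if full history is needed (git blame, git log -S, etc.),
    # turn it into a normal clone:
    cd core && git fetch --unshallow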
In summary, it seems to me that the reasonable forward path at the moment is some combination of (a) better documenting the use of shallow clones for newbie/infrequent contributors, to reduce the initial developer roadblock (including verifying that this actually works with gerrit, etc); (b) spending more effort optimizing the 'git clone' step in jenkins/gerrit (we already do some of this); and (c) paying attention to how phabricator uses git, to ensure that the repo size does not become an issue in the future.
--scott