On Sat, Aug 29, 2009 at 8:37 AM, Brion Vibber <brion@wikimedia.org> wrote:
One of my main concerns with a git transition though is figuring out how to divide up the repository into manageable pieces; our SVN repo includes many different projects including MediaWiki core, lots of extensions, dump processing tools, our custom Ubuntu packages for Wikimedia server deployment, our load balancing tools, etc.
I'd say put extensions and core in one repo, and make different repos for everything else that's logically separate and unlikely to move from repo to repo. Or we could have core in one repo, and use submodules for extensions.
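For the submodule option, the core repo would carry a .gitmodules file with one entry per extension. Below is a minimal Python sketch that generates such a file. It assumes each extension lives in its own repo under a common base URL. The URL and extension names are hypothetical placeholders, not actual Wikimedia locations.

```python
"""Sketch: build .gitmodules content for a core repo with extensions as submodules.

Assumption: every extension has its own repo at <base_url>/<Name>.git.
The base URL and extension names used below are hypothetical.
"""


def gitmodules_text(base_url: str, extensions: list[str]) -> str:
    """Return .gitmodules content with one [submodule] section per extension."""
    sections = []
    for name in sorted(extensions):
        path = f"extensions/{name}"
        sections.append(
            f'[submodule "{path}"]\n'
            f"\tpath = {path}\n"
            f"\turl = {base_url.rstrip('/')}/{name}.git\n"
        )
    return "".join(sections)


if __name__ == "__main__":
    # Hypothetical base URL and extension list, for illustration only.
    print(gitmodules_text("git://git.example.org/extensions", ["ParserFunctions", "Cite"]))
```

In practice "git submodule add <url> extensions/<name>" writes these entries for you. Either way, the core repo records only a URL and a pinned commit for each extension, not the extension's history.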
On Sat, Aug 29, 2009 at 9:07 AM, Dmitriy Sintsov <questpc@rambler.ru> wrote:
Some local coder told me that GIT is slower and consumes much more RAM on some operations than SVN. I can't confirm that, though, because I never used GIT and still rarely use SVN. But, be warned.
I hope we can do better than vague second-hand rumors when considering easily quantifiable things like performance. In my experience, git is much faster than SVN on many operations simply because it doesn't need to go to the server. git blame is often fast enough that you don't feel much urge to switch to another window while you wait, and log and diff (even for old revisions) are nearly instantaneous.
Just out of interest, I once timed a fresh SVN checkout of trunk/ from svn.wikimedia.org to the toolserver (so they were even in the same datacenter). Then I tested a git clone of a git-svn checkout of the *entire* mediawiki/ repository, including trunk, branches, tags, and all history, from a server I have (in Denver) to the toolserver (in Amsterdam). The git clone was *significantly* faster. (To be fair, it *was* http:// vs. git://, but still.)
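Numbers like these are easy to collect. Below is a rough Python sketch of a timing helper. It assumes the git and svn command-line clients are installed and reports the median of a few runs, so that a single slow network hiccup doesn't dominate the result. The commands in the comments are only examples.

```python
"""Sketch: time a command several times and report the median wall-clock time."""
import statistics
import subprocess
import time


def time_command(argv: list[str], runs: int = 3) -> float:
    """Run argv `runs` times and return the median elapsed time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a failed run is never mistaken for a fast one.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Example usage (run inside a git clone / SVN working copy respectively):
#   time_command(["git", "log", "-n", "1000"])
#   time_command(["svn", "log", "-l", "1000"])
```

A clone or checkout needs a fresh target directory on every run, so those are better wrapped in a small script than timed repeatedly in place.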
Another interesting data point: an SVN checkout of trunk/, with working copy, is 696M. A git-svn checkout of the entire repository, with branches, tags, all history, and a working copy, is 661M after git gc. (Admittedly, before git gc it was 1005M. Even that isn't too bad: 44% more than the SVN trunk checkout, in exchange for all branches, tags, and history.)
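Here are the size comparisons worked out explicitly. The sizes are the megabyte figures given in the text.

```python
"""Sketch: percentage comparisons for the sizes quoted above (in MB)."""

SVN_TRUNK_CHECKOUT = 696  # SVN checkout of trunk/, with working copy
GIT_SVN_BEFORE_GC = 1005  # git-svn copy of the whole repo, before git gc
GIT_SVN_AFTER_GC = 661    # same copy, after git gc


def percent_change(new: float, base: float) -> float:
    """Percentage by which `new` exceeds (positive) or falls short of (negative) `base`."""
    return (new / base - 1.0) * 100.0


before = percent_change(GIT_SVN_BEFORE_GC, SVN_TRUNK_CHECKOUT)
after = percent_change(GIT_SVN_AFTER_GC, SVN_TRUNK_CHECKOUT)
print(f"before gc: {before:+.1f}%")  # before gc: +44.4%
print(f"after gc:  {after:+.1f}%")   # after gc:  -5.0%
```

So git gc shrank the copy by about a third (1005M to 661M). Afterwards, the complete git-svn copy, working copy included, came out about 5% smaller than the SVN trunk checkout.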
So I'm going to say that this claim completely contradicts my experience using both git and SVN (and I've used both fairly extensively). Of course, I'm sure there are "some operations" where git is slower than SVN, but that's not a very useful claim, given that the converse is certainly true.
The only time I've found git performance totally unacceptable was when I tried to use it for compression/versioning of >1G database backups -- git add used up multiple gigabytes of RAM and OOMed when I tried to add the first file. The lesson there, obviously, is that git is meant to version source code, not files as large as database backups. git is also known not to do so well on extremely large repos, and it has some trouble if you have lots of large binary files that change a lot: those can't be compressed easily, especially if they're already compressed somehow.
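git compresses the objects it stores with zlib. That works well on source text but does almost nothing for data that is already compressed. The small Python illustration below uses random bytes as a stand-in for an already-compressed file such as a gzipped database dump. This is an assumption for illustration, not a measurement of git itself.

```python
"""Sketch: zlib on text-like data vs. on (simulated) already-compressed data."""
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size / original size at zlib level 9 (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive, source-like text compresses extremely well.
source_like = b"$wgSitename = 'Wikipedia';\n" * 4000
# Random bytes stand in for already-compressed data (e.g. a .gz dump).
already_compressed = os.urandom(100_000)

print(f"source-like text:   {compression_ratio(source_like):.3f}")
print(f"already compressed: {compression_ratio(already_compressed):.3f}")
```

The second ratio comes out at about 1.0, so each new version of such a file costs roughly its full size in the repository.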
On Sat, Aug 29, 2009 at 5:38 PM, Daniel Friesen <lists@nadir-seen-fire.com> wrote:
As a thought we could probably structure it something like this:
- We have a phase3 repo for MediaWiki itself
One objection: could we please rename phase3 to something sensible like "mediawiki" if we do this? :)
On Sat, Aug 29, 2009 at 6:48 PM, Marco Schuster <marco@harddisk.is-a-geek.org> wrote:
And so to the disk. If the disk or the controller sucks or is simply old (not everyone has shiny new hardware), you're also damn slow.
Um, and SVN won't be? Do you have actual benchmarks indicating that git is slower than svn on *any* typical distributed source-control workload?
(By "distributed" I mean "with the remote server over the Internet". I could well believe that if you're in an office where the remote server is on a LAN, and that server happens to have eight 15k RPM SCSI disks and 16G of RAM while your workstation is a three-year-old consumer desktop, SVN would be a lot faster. SVN might also be better for other things, like anything involving very large files or very large repos. But none of these apply to us.)
What should also not be underestimated is the diskspace demand of a GIT repo
Have you actually tested the disk space usage side-by-side? Because my figures (from above) indicate that git doesn't use much more disk space than SVN at all. SVN doesn't compress its .svn directories at all AFAICT -- git uses extremely heavy-duty compression.
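On the git side, each file version is stored as a loose object. The object consists of the header "blob <size>" and a NUL byte, followed by the file content. git hashes these bytes with SHA-1 and writes them zlib-compressed under .git/objects/. Below is a minimal Python sketch of that encoding. The delta compression that git gc adds when it writes packfiles is not shown.

```python
"""Sketch: encode file content the way git stores a loose blob object."""
import hashlib
import zlib


def loose_blob(content: bytes) -> tuple[str, bytes]:
    """Return (object id, zlib-compressed object bytes) for a blob."""
    raw = b"blob %d\x00" % len(content) + content  # header, NUL, then content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


oid, stored = loose_blob(b"hello world\n")
print(oid)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed ID matches what "git hash-object" reports for the same content. The stored bytes are compressed, unlike a plain pristine copy of the file.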