On Sat, Aug 29, 2009 at 8:37 AM, Brion Vibber <brion(a)wikimedia.org> wrote:
> One of my main concerns with a git transition though is figuring out how
> to divide up the repository into manageable pieces; our SVN repo
> includes many different projects including MediaWiki core, lots of
> extensions, dump processing tools, our custom Ubuntu packages for
> Wikimedia server deployment, our load balancing tools, etc.
I'd say put extensions and core in one repo, and make separate repos
for everything else that's logically separate and unlikely to have code
moving between repos. Alternatively, we could keep core in one repo and
use git submodules for extensions.
On Sat, Aug 29, 2009 at 9:07 AM, Dmitriy Sintsov <questpc(a)rambler.ru> wrote:
> Some local coder told me that GIT is slower and consumes much more RAM
> on some operations than SVN.
> I can't confirm that, though, because I never used GIT and still rarely
> use SVN. But, be warned.
I hope we can do better than vague second-hand rumors when considering
easily quantifiable things like performance. In my experience, git is
much faster than SVN on many operations simply because it doesn't need
to go to the server. git blame is often fast enough that you can run it
without feeling much urge to switch to another window while you wait.
git log, git diff (even against old revisions), and the like are nearly
instantaneous.
Just out of interest, I once timed a fresh SVN checkout of trunk/ from
svn.wikimedia.org to the toolserver (so both machines were even in the
same datacenter). Then I timed a git clone of a git-svn checkout of the
*entire* mediawiki/ repository, including trunk, branches, tags, and
all history, from a server I have (in Denver) to the toolserver (in
Amsterdam). The git clone was *significantly* faster. (To be fair, the
SVN checkout went over http:// and the clone over git://, but still.)
Another interesting data point: an SVN checkout of trunk/, including the
working copy, is 696M. A git svn checkout of the entire repository,
with branches, tags, and all history, plus working copy, is 661M after
git gc. (Admittedly, before git gc it was 1005M. Even that isn't bad:
44% more than the SVN trunk checkout, in exchange for all branches,
tags, and history.)
So I'm going to say that this claim completely contradicts my
experience using both git and SVN (and I've used both fairly
extensively). Of course, I'm sure there are "some operations" where
git is slower than SVN, but that's not a very useful claim, given that
the converse is certainly true.
The only time I've found git performance totally unacceptable was when
I tried to use it to compress and version database backups larger than
1G: git add used up multiple gigabytes of RAM and ran out of memory
when I tried to add the first file. The lesson there, obviously, is
that git is meant to version source code, not files as large as
database backups. git is also known not to do so well on extremely
large repos, and it has some trouble if you have lots of large binary
files that change a lot: those can't be compressed easily, especially
if they're already compressed somehow.
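To illustrate the last point: git stores its objects zlib-compressed, and zlib gets essentially nothing out of data that is already compressed. Below is a minimal sketch using Python's zlib module. It assumes random bytes are a fair stand-in for an already-compressed backup, since well-compressed output looks close to random. It ignores git's delta compression in packfiles, so it shows only part of the picture.

```python
import os
import zlib

# Highly repetitive text, standing in for source code (compresses very well).
text = b"def foo():\n    return 42\n" * 4000
# Random bytes of the same length, standing in for an already-compressed blob.
blob = os.urandom(len(text))


def ratio(data):
    """Compressed size as a fraction of the original size, zlib at level 9."""
    return len(zlib.compress(data, 9)) / len(data)


text_ratio = ratio(text)
blob_ratio = ratio(blob)
print(f"repetitive text: {text_ratio:.3f}")
print(f"random bytes:    {blob_ratio:.3f}")
```

The repetitive input shrinks to a tiny fraction of its size. The random input comes out at roughly its original size or slightly larger, because zlib can only store it with a little framing overhead.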
On Sat, Aug 29, 2009 at 5:38 PM, Daniel Friesen <lists(a)nadir-seen-fire.com> wrote:
> As a thought we could probably structure it something like this:
> - We have a phase3 repo for MediaWiki itself
One objection: could we please rename phase3 to something sensible like
"mediawiki" if we do this? :)
On Sat, Aug 29, 2009 at 6:48 PM, Marco Schuster <marco(a)harddisk.is-a-geek.org> wrote:
> And so to the disk. If the disk or the controller sucks or is simply old
> (not everyone has shiny new hardware), you're also damn slow.
Um, and SVN won't be? Do you have actual benchmarks indicating that
git is slower than SVN on *any* typical distributed source-control
workload?
(By "distributed" I mean "with the remote server over the Internet".
I could well believe that if you're in an office where the remote
server is on a LAN, and the remote server happens to have eight 15k RPM
SCSI disks and 16G of RAM while your workstation is a three-year-old
consumer desktop, SVN would be a lot faster. SVN might also be better
for other things, like anything involving very large files or very
large repos. But none of these apply to us.)
> What should also not be underestimated is the diskspace demand of a GIT repo
Have you actually tested the disk space usage side-by-side? My figures
above indicate that git doesn't use much more disk space than SVN at
all: 661M after git gc for the whole repository with all history,
versus 696M for an SVN checkout of trunk/ alone. AFAICT SVN doesn't
compress its .svn directories at all, while git uses extremely
heavy-duty compression.