Brion Vibber wrote:
> Tim Starling wrote:
>> It's probably better to use rsync, that's what we usually use for image
>> backups. It's quite robust, and efficient if used properly. The only
>> trouble with it is the server load -- it would quickly overload the
>> backend if we made it available for public use. Maybe we could set up an
>> rsyncd instance limited by client IP.
> Note that rsync takes too long to build the file list for the full set
> of files; you have to break it up by smaller directories to get a
> reliable transfer.
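For illustration, here's a rough sketch of the "break it up by smaller directories" workaround. It is not our actual backup script: the function names and the source/destination paths are made up, and it only builds one plain `rsync -a` command per top-level subdirectory (plus one pass for loose top-level files), so each run has to build a much smaller file list.

```python
import os
import subprocess
from typing import List


def build_chunked_rsync_commands(src_root: str, dest_root: str) -> List[List[str]]:
    """Hypothetical helper: one rsync run per top-level subdirectory of src_root.

    Each run only builds the file list for one subtree, so per-run memory is
    bounded by the largest subdirectory instead of the whole upload tree.
    """
    src = src_root.rstrip("/")
    dest = dest_root.rstrip("/")
    # First pass: plain files (and symlinks) sitting directly in src_root.
    # "--exclude=*/" skips every directory; the per-directory runs cover those.
    # Running this first also creates dest_root if only its last component is missing.
    commands = [["rsync", "-a", "--exclude=*/", src + "/", dest + "/"]]
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        # Skip symlinks to directories; the first pass copies those as links.
        if os.path.isdir(path) and not os.path.islink(path):
            # Trailing slashes: copy the *contents* of path into dest/name.
            commands.append(["rsync", "-a", path + "/", os.path.join(dest, name) + "/"])
    return commands


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Something like `run_all(build_chunked_rsync_commands("/srv/upload", "backup:/srv/upload"))` would just print the planned runs (both paths are placeholders); pass `dry_run=False` to actually execute them.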
I'm doing some testing now with rsync 3.0 (built from CVS) for updating
our internal file upload backup. The new incremental recursion seems to
handle the huge file set a lot better.
To give you an idea -- with rsync 2, I tried starting a sync job
yesterday, and killed it this morning when I found it using 2.6
GIGABYTES OF MEMORY without having yet transferred ANY files!
3.0 starts transferring directories as they come along in smaller chunks
instead of building a complete list of all (several million) files to
transfer first.
Kinda nice. ;)
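To make the difference concrete, here's a toy Python illustration of the idea. It is not rsync's actual code, and the function names are invented. The 2.x-style approach walks the whole tree and holds every path before doing anything. The 3.0-style approach hands out one directory's worth of files at a time, so transfers can start right away and only the current chunk has to sit in memory.

```python
import os
from typing import Iterator, List, Tuple


def full_file_list(root: str) -> List[str]:
    """2.x-style (toy): collect every path in the tree before transferring anything."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths


def incremental_file_lists(root: str) -> Iterator[Tuple[str, List[str]]]:
    """3.0-style idea (toy): yield one directory's files at a time.

    A consumer can start sending each chunk as soon as it is produced,
    instead of waiting for the entire (multi-million entry) list.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        yield dirpath, [os.path.join(dirpath, name) for name in sorted(filenames)]
```

Both produce the same set of files; the difference is that the peak list size is one directory rather than the whole tree. In real rsync, incremental recursion is only used when both ends are running 3.0.0 or newer, which is why a public mirror would need to be limited to 3.0 clients.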
It might well be feasible to have a public rsync mirror if we can limit
it to 3.0 clients.
-- brion vibber (brion @ wikimedia.org)