Brion Vibber wrote:
Tim Starling wrote:
It's probably better to use rsync, that's what we usually use for image backups. It's quite robust, and efficient if used properly. The only trouble with it is the server load -- it would quickly overload the backend if we made it available for public use. Maybe we could set up an rsyncd instance limited by client IP.
Note that rsync takes too long to build the file list for the full set of files; you have to break it up by smaller directories to get a reliable transfer.
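A rough sketch of that per-directory split, in Python. The paths, the host name "backup" and the helper names build_rsync_jobs and run_jobs are made up for illustration; this is not our actual backup script. It assumes each name is a top-level subdirectory of the upload root, so that rsync only has to create the last path component on the destination side.

```python
import subprocess
from pathlib import PurePosixPath


def build_rsync_jobs(src_root, dest_root, subdirs):
    """Return one rsync command line per subdirectory instead of one giant job."""
    jobs = []
    for name in sorted(subdirs):
        # Trailing slashes: copy the directory's contents into the matching
        # directory on the destination.
        src = str(PurePosixPath(src_root) / name) + "/"
        dest = str(PurePosixPath(dest_root) / name) + "/"
        # -a is rsync's archive mode (recursive, keeps times and permissions).
        jobs.append(["rsync", "-a", src, dest])
    return jobs


def run_jobs(jobs):
    """Run each job in turn and return the ones that failed, for a retry."""
    failed = []
    for argv in jobs:
        if subprocess.run(argv).returncode != 0:
            failed.append(argv)
    return failed


if __name__ == "__main__":
    # Hypothetical layout: sixteen top-level hash directories, 0 through f.
    names = [format(i, "x") for i in range(16)]
    for argv in build_rsync_jobs("/srv/images", "backup:/srv/images", names):
        print(" ".join(argv))
```

Each job only has to build the file list for one subdirectory, and a failed job can be rerun without starting the whole transfer over.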
I'm doing some testing now with rsync 3.0 (built from CVS) for updating our internal file upload backup. The new incremental recursion seems to handle the huge file set a lot better.
To give you an idea -- with rsync 2, I tried starting a sync job yesterday, and killed it this morning when I found it using 2.6 GIGABYTES OF MEMORY without having yet transferred ANY files!
rsync 3.0 starts transferring files as it walks each directory, in smaller chunks, instead of first building a complete list of all (several million) files to transfer.
Kinda nice. ;)
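To illustrate the difference (this is only a toy model in Python, not how rsync is implemented; full_list and incremental_chunks are made-up names): the first function collects every path before anything else can happen, while the second hands back one directory at a time, so a consumer can start working on the first chunk while the rest of the tree is still unscanned.

```python
import os


def full_list(root):
    """Old style: collect every file path under root before doing anything."""
    paths = []
    for dirpath, _dirs, files in os.walk(root):
        paths.extend(os.path.join(dirpath, f) for f in files)
    return paths


def incremental_chunks(root):
    """Incremental style: yield (directory, file names) one directory at a time."""
    for dirpath, dirs, files in os.walk(root):
        # Sorting in place makes os.walk visit subdirectories in a stable order.
        dirs.sort()
        yield dirpath, sorted(files)


if __name__ == "__main__":
    # A real tool would transfer each chunk here; this just reports it.
    for directory, names in incremental_chunks("."):
        print(directory, len(names), "files")
```

With several million files, the list built by full_list is what eats the memory; the generator only ever holds one directory's worth of names for the caller.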
It might well be feasible to have a public rsync mirror if we can limit it to 3.0 clients.
-- brion vibber (brion @ wikimedia.org)