On Sat, 2007-08-11 at 07:44 +0100, Tim Starling wrote:
It's probably better to use rsync, that's what we usually use for image backups. It's quite robust, and efficient if used properly. The only trouble with it is the server load -- it would quickly overload the backend if we made it available for public use. Maybe we could set up an rsyncd instance limited by client IP.
robchurch/brion/avar/JeLuF/etc. and I talked about this on IRC a few years ago in the context of pulling/sharing the database dumps, and the conclusion at the time was that it was more efficient to put lighttpd on the front end and serve over HTTP than to set up rsyncd.
We even discussed how to add zsync support (rsync over HTTP) to try to bridge those two pieces, but I don't think it was ever pursued beyond that. gzip even has an --rsyncable flag, so you can optimize the compressed dumps for this.
I'd love to see an rsyncd set up for mirroring the images on Commons and the other projects. Not only would that allow the mirrors to stay in lockstep with the current public wiki (as images are changed, renamed, or removed, the mirrors would pick up those changes, unlike a wget or torrent-o-images), but it would only refetch what has changed.
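To make the "only refetch what has changed" point concrete, here is a rough Python sketch of the kind of comparison involved. It only mimics rsync's default quick check (size plus modification time) and says nothing about its delta-transfer algorithm. The names plan_sync and FileMeta and the manifest layout are my own assumptions for illustration, not anything rsync or the Wikimedia servers expose.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest entry: (size in bytes, modification time as a Unix timestamp).
FileMeta = Tuple[int, int]


def plan_sync(
    source: Dict[str, FileMeta], mirror: Dict[str, FileMeta]
) -> Tuple[List[str], List[str]]:
    """Return (paths to fetch, paths to delete) so the mirror matches the source.

    A path is fetched when it is missing on the mirror or when its size or
    mtime differs; a path is deleted when it no longer exists on the source.
    """
    to_fetch = sorted(path for path, meta in source.items() if mirror.get(path) != meta)
    to_delete = sorted(path for path in mirror if path not in source)
    return to_fetch, to_delete


if __name__ == "__main__":
    source = {"a.png": (10, 100), "b.png": (20, 200), "c.png": (5, 50)}
    mirror = {"a.png": (10, 100), "b.png": (20, 150), "old.png": (1, 1)}
    # a.png is unchanged, so it is neither fetched nor deleted.
    print(plan_sync(source, mirror))
```

A one-shot wget or torrent snapshot has no equivalent of the to_delete list, which is why removed or renamed images would linger on such mirrors.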
I'm a big fan of rsync, and use it here for mirroring Gutenberg, CPAN, etc., so adding Wikimedia images (or even db dumps again?) would be great.
Perhaps designating a few "1st Tier" mirrors, through which the torrents, HTTP mirrors, etc. are fed, would be a good start. You would have rsyncd on the w.m.o. servers locked to a specific, known set of IPs.
Those IPs fetch the data over rsync and then set up their own public http/rsync/zsync/etc. mirrors, and you point "2nd through Nth Tier" mirrors at those first-level mirrors instead of at your own public servers.
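In practice the IP lock would most likely be done in the daemon's own configuration (rsyncd.conf has a "hosts allow" parameter for this), but the gatekeeping logic is simple enough to sketch in Python. The allowlist below uses reserved documentation addresses, and both FIRST_TIER and is_first_tier are made-up names for illustration, not real mirror hosts or an existing API.

```python
import ipaddress
from typing import Iterable

# Hypothetical allowlist of 1st Tier mirrors (documentation-only address ranges).
FIRST_TIER = ["192.0.2.0/24", "198.51.100.7"]


def is_first_tier(client_ip: str, allowed: Iterable[str] = FIRST_TIER) -> bool:
    """Return True if client_ip falls inside any allowed address or network."""
    address = ipaddress.ip_address(client_ip)
    # A bare address such as "198.51.100.7" is parsed as a single-host network.
    return any(address in ipaddress.ip_network(entry) for entry in allowed)


if __name__ == "__main__":
    for ip in ("192.0.2.15", "198.51.100.7", "203.0.113.9"):
        verdict = "allow rsync" if is_first_tier(ip) else "send to a 1st Tier mirror"
        print(ip, "->", verdict)
```

Everyone outside the list gets pointed at the first-level mirrors, so the load on the w.m.o. backend stays bounded by the number of 1st Tier hosts rather than by the number of people who want a copy.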
Definitely worth exploring rsync though, in this case.