On Sat, 2007-08-11 at 07:44 +0100, Tim Starling wrote:
> It's probably better to use rsync, that's what we usually use for
> image backups. It's quite robust, and efficient if used properly. The
> only trouble with it is the server load -- it would quickly overload
> the backend if we made it available for public use. Maybe we could set
> up an rsyncd instance limited by client IP.

robchurch/brion/avar/JeLuF/etc. and I talked about this on IRC a few
years ago in the context of pulling/sharing the database dumps, and the
conclusion at the time was that it was more efficient to put lighttpd
on the front end and serve over HTTP than to set up rsyncd and use
that.
We even discussed how to add zsync support (rsync over HTTP) to try to
bridge those two pieces, but I don't think it was ever pursued beyond
that. gzip even has an --rsyncable flag, so you can optimize the
compressed dumps for this.
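
As a rough sketch of the --rsyncable idea, the helper below only builds the gzip command line for a dump. The dump filename is made up for illustration, and it assumes the installed gzip accepts --rsyncable and --keep, which not every build does.

```python
import shlex
from typing import List


def rsyncable_gzip_command(dump_path: str) -> List[str]:
    """Build a gzip command that writes an rsync-friendly .gz next to the dump."""
    # --rsyncable makes the compressor resynchronise periodically, so a small
    # change in the input disturbs only a small part of the compressed output.
    # --keep leaves the uncompressed dump in place.
    # Assumption: the local gzip build supports both options.
    return ["gzip", "--rsyncable", "--keep", dump_path]


if __name__ == "__main__":
    # "pages-articles.xml" is a hypothetical dump name, used only for illustration.
    print(shlex.join(rsyncable_gzip_command("pages-articles.xml")))
```

The resulting .gz is slightly larger than a normal one, but rsync or zsync can then reuse most of an older copy when only part of the dump has changed.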
I'd love to see an rsyncd set up for mirroring the images on Commons and
the other projects. Not only would that let the mirrors stay in
lockstep with the current public wiki (as images are changed, renamed,
or removed, the mirrors would pick up those changes, unlike a wget
crawl or a torrent-o-images), but it would also refetch only what has
changed.
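
To make the "lockstep" point concrete, here is a minimal sketch of the rsync invocation a mirror might run. The source URL and destination path are placeholders, not real Wikimedia endpoints; the flags themselves are standard rsync options.

```python
import shlex
from typing import List


def mirror_command(source: str, dest_dir: str, dry_run: bool = False) -> List[str]:
    """Build an rsync command that keeps dest_dir identical to source."""
    cmd = [
        "rsync",
        "--archive",  # recurse and preserve times, permissions, symlinks
        "--delete",   # remove local files that were removed or renamed upstream
        "--partial",  # keep partially transferred files so big images can resume
    ]
    if dry_run:
        cmd.append("--dry-run")  # only report what would change
    cmd += [source, dest_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical source module and local path, for illustration only.
    print(shlex.join(mirror_command("rsync://tier1.example.org/images/",
                                    "/srv/mirror/images/", dry_run=True)))
```

Because of --delete, a removed or renamed image disappears from the mirror on the next run, which a plain wget crawl would not do. Trying it with dry_run=True first is a sensible precaution.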
I'm a big fan of rsync, and use it here for mirroring Gutenberg, CPAN,
etc., so adding Wikimedia images (or even db dumps again?) would be
great.
Perhaps designating a few "1st Tier" mirrors, which would then feed the
torrents, HTTP mirrors, etc., would be a good start.
You keep rsyncd on the w.m.o. servers locked to a specific, known set of
IPs.
Those IPs fetch the data over rsync and then set up their own public
http/rsync/zsync/etc. mirrors, and you point the "2nd through
Nth Tier" mirrors at those first-level mirrors instead of at your own
public servers.
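
The IP-locked rsyncd described above could look roughly like the module rendered by this sketch. The module name, path, connection limit, and addresses are all invented for illustration (the IPs are from the documentation ranges); only the rsyncd.conf parameter names are real.

```python
from typing import Sequence


def render_rsyncd_module(name: str, path: str, allowed_ips: Sequence[str],
                         max_connections: int = 4) -> str:
    """Render one read-only rsyncd.conf module restricted to known client IPs."""
    lines = [
        f"[{name}]",
        f"    path = {path}",
        "    read only = yes",
        # Cap concurrent clients to limit load on the backend.
        f"    max connections = {max_connections}",
        # rsyncd checks "hosts allow" first, then "hosts deny", so this
        # admits only the listed 1st Tier mirrors and rejects everyone else.
        f"    hosts allow = {' '.join(allowed_ips)}",
        "    hosts deny = *",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical module, path, and tier-1 addresses.
    print(render_rsyncd_module("images", "/srv/images",
                               ["192.0.2.10", "198.51.100.7"]))
```

The 2nd through Nth Tier mirrors would then sync from the tier-1 hosts, so the origin servers only ever see the short allow list.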
Definitely worth exploring rsync though, in this case.
--
David A. Desrosiers
desrod(a)gnu-designs.com
setuid(a)gmail.com
http://projects.plkr.org/
Skype...: 860-967-3820