On Sat, 2007-08-11 at 08:37 -0400, Anthony wrote:
> Then running a simple recursive download daily, checking timestamps to avoid re-downloading the same file, will be nearly as bandwidth-efficient as rsync - and probably much more CPU-efficient, as turning on indexing isn't going to "quickly overload the backend".
I don't think that's the case. rsync exchanges one file list up front and only computes MD4 checksums for files that look changed, so it doesn't have to hit every single file on both ends.
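(And its per-byte cost is small too: the block matching uses a cheap rolling weak checksum, with MD4 only as the strong confirmation. A toy version of the weak sum - not rsync's actual code:)

    def weak_checksum(block):
        # rsync-style weak checksum: a = sum of bytes (mod 2^16),
        # b = sum of the running prefix sums (mod 2^16), packed together.
        # Because it "rolls", sliding the window one byte costs O(1).
        a = sum(block) & 0xFFFF
        b, s = 0, 0
        for byte in block:
            s += byte
            b += s
        return ((b & 0xFFFF) << 16) | a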
If you rely on many thousands of HTTP clients to implement the header checking correctly, they WILL hit every single local and remote file at least once - twice if they need to fetch it (HEAD then GET).
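Concretely, a well-behaved mirror script ends up doing something like this per file (a rough modern-Python sketch; URL handling and error paths omitted):

    import os, email.utils, urllib.request

    def mirror_file(url, local_path):
        # One HEAD per file just to compare timestamps...
        head = urllib.request.Request(url, method='HEAD')
        with urllib.request.urlopen(head) as resp:
            last_mod = resp.headers.get('Last-Modified')
        remote_mtime = (email.utils.parsedate_to_datetime(last_mod).timestamp()
                        if last_mod else None)
        # ...and a full GET whenever the local copy looks stale.
        if (remote_mtime is None or not os.path.exists(local_path)
                or os.path.getmtime(local_path) < remote_mtime):
            with urllib.request.urlopen(url) as resp, open(local_path, 'wb') as out:
                out.write(resp.read())

Even when nothing has changed, that's one request and one stat per file, multiplied by every client.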
You can limit how much CPU/bandwidth/etc. rsync/rsyncd takes up on the server side, and throttle connections back if you're worried about overloading it. You can also refuse the -z option on the server side, so clients can't abuse the CPU by compressing data that is already compressed.
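In rsyncd.conf that's just a couple of lines (module name and path invented here):

    [dumps]
        path = /var/www/dumps
        read only = yes
        max connections = 20        # throttle concurrent clients
        refuse options = compress   # reject -z; don't recompress .gz/.bz2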
But now we're back to this point again... why not just use zsync and get the best of both worlds? (Quick sketch below.)
Unless your suggestion was to open up indexing for a -known- set of IPs, and not to the world at large...
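(For the record, the zsync workflow is roughly: the server publishes a static control file once, and clients fetch only the changed blocks over plain HTTP. File and host names below are invented:)

    # on the server, once per published file:
    zsyncmake pages-articles.xml          # writes pages-articles.xml.zsync
    # on any client, against a plain HTTP server:
    zsync http://download.example.org/pages-articles.xml.zsync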