On Sat, 2007-08-11 at 08:37 -0400, Anthony wrote:
> Then running daily a simple recursive download which checks
> timestamps to avoid downloading the same file over will be nearly
> as bandwidth efficient as rsync - and probably much more CPU
> efficient, as turning on indexing isn't going to "quickly overload
> the backend".
I don't think that's the case. rsync doesn't have to hit every single
file on both ends when it compares the MD4 checksums from the remote end.
If you rely on many thousands of HTTP clients to implement the header
checking correctly, they WILL hit every single local and remote file at
least once, and twice if they need to fetch it (HEAD, then GET).
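The header check those clients would have to get right is essentially HTTP's Last-Modified / If-Modified-Since comparison. A minimal sketch of the decision logic (the timestamps below are made-up examples):

```python
from email.utils import parsedate_to_datetime

def needs_download(local_mtime_http: str, last_modified_header: str) -> bool:
    """Return True if the remote Last-Modified header is newer than
    the local copy's timestamp (both in HTTP date format)."""
    local = parsedate_to_datetime(local_mtime_http)
    remote = parsedate_to_datetime(last_modified_header)
    return remote > local

# A well-behaved mirror client sends HEAD first, compares the header
# against its local copy, and only issues a GET when this returns True --
# which is exactly the one-to-two-requests-per-file cost described above.
print(needs_download("Sat, 11 Aug 2007 08:37:00 GMT",
                     "Sun, 12 Aug 2007 00:00:00 GMT"))
```

Even when nothing has changed, every file still costs one HEAD round trip per client, which is the overhead rsync's single file-list exchange avoids.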
You can limit how much CPU, bandwidth, etc. rsync/rsyncd takes up on the
server side, and throttle connections back if you're worried about
overloading it. You can also refuse the -z option on the server side, so
clients can't burn server CPU compressing data that is already
compressed.
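Those server-side limits go in rsyncd.conf; a sketch along these lines (module name and path are made up):

```
[mirror]
    path = /srv/mirror
    read only = yes
    # cap simultaneous clients so the daemon can't be swamped
    max connections = 20
    # reject the client's -z so the server never recompresses
    # already-compressed data
    refuse options = compress
```

"refuse options" matches client options by their long names, so "compress" is what blocks -z.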
But now we're back to this point again... why not just use zsync and get
the best of both worlds?
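For reference, the zsync workflow pushes nearly all the CPU cost to the client; the server just serves static files plus a pre-built control file. Roughly (the host and filename here are hypothetical):

```
# on the server, once per file update:
zsyncmake -u http://mirror.example.org/dump.xml.gz dump.xml.gz

# on the client -- reuses the old local copy and fetches
# only the changed blocks via HTTP range requests:
zsync http://mirror.example.org/dump.xml.gz.zsync
```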
Unless your suggestion was to open up indexing for a -known- set of IPs,
and not to the world at large?
--
David A. Desrosiers
desrod(a)gnu-designs.com
setuid(a)gmail.com
http://projects.plkr.org/
Skype...: 860-967-3820