On Mon, Aug 15, 2011 at 18:40, Russell N. Nelson - rnnelson
<rnnelson(a)clarkson.edu> wrote:
The problem is that 1) the files are bulky,
That's expected. :-)
2) there are many of them, 3) they are in constant
flux,
That is not really a problem: since there are many of them,
statistically they are not in flux.
and 4) it's likely that your connection would
close for whatever reason part-way through the download.
I seem not to have forgotten to mention zsync/rsync. ;-)
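As an illustration of resuming after a dropped connection, here is a minimal sketch using a plain HTTP Range request from the Python standard library. It is not zsync or rsync, which do considerably more (block-level deltas), and it assumes the server honours Range. The function names and the example URL are hypothetical.

```python
import os
import urllib.request


def build_resume_request(url, dest_path):
    """Return (request, offset) asking only for the bytes dest_path still lacks."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    req = urllib.request.Request(url)
    if offset:
        # "bytes=N-" asks for everything from byte N to the end of the resource.
        req.add_header("Range", "bytes=%d-" % offset)
    return req, offset


def resume_download(url, dest_path, chunk_size=1 << 20):
    """Append the missing tail of url to dest_path; start over if Range is ignored."""
    req, offset = build_resume_request(url, dest_path)
    with urllib.request.urlopen(req) as resp:
        # 206 means the server honoured the range; 200 means it sent the whole file.
        mode = "ab" if offset and resp.status == 206 else "wb"
        with open(dest_path, mode) as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `resume_download` again after a failure continues from the bytes already on disk. If the local file is already complete, a server will typically answer 416, which `urlopen` raises as an `HTTPError`; the sketch does not handle that case.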
Even taking a snapshot of the filenames is dicey. By
the time you finish, it's likely that there will be new ones, and possible that some
will be deleted. Probably the best way to make this work is to 1) make a snapshot of files
periodically,
Since I've been told they're backed up, such a snapshot should naturally already exist.
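To make the snapshot idea concrete, here is a rough sketch of a periodic filename snapshot and a comparison between two of them. It says nothing about how the actual backups are made; the function names and the manifest layout (relative path, size, modification time) are assumptions for illustration only.

```python
import os


def snapshot_manifest(root):
    """Return a sorted list of (relative_path, size, mtime) for files under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                # The file was deleted between listing and stat; skip it.
                continue
            entries.append((os.path.relpath(full, root), st.st_size, st.st_mtime))
    entries.sort()
    return entries


def diff_manifests(old, new):
    """Return (added, deleted) relative paths between two manifests."""
    old_names = {entry[0] for entry in old}
    new_names = {entry[0] for entry in new}
    return sorted(new_names - old_names), sorted(old_names - new_names)
```

A downloader working from a fixed manifest sees a stable file list even while the live tree changes, and `diff_manifests` shows what a later snapshot added or removed.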
2) create an API which returns a tarball using the
snapshot of files that also implements Range requests.
I would very much prefer a ready-to-use format instead of a tarball, not
to mention that it's pretty resource-consuming to create a tarball just for
that.
Of course, this would result in a 12-terabyte file on
the recipient's host. That wouldn't work very well. I'm pretty sure that the
recipient would need an http client which would 1) keep track of the place in the
bytestream and 2) split out files and write them to disk as separate files. It's
possible that a program like getbot already implements this.
I'd make a snapshot without tar, especially because partial transfers
aren't possible that way.
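For reference, the client-side splitting described in the quoted text (write files out as they arrive instead of storing one huge tarball) can be sketched with Python's `tarfile` in streaming mode. This is only an illustration under assumptions: `split_tar_stream` is a hypothetical name, `stream` is any readable binary object such as an HTTP response, and the sketch does not track the byte position for resuming mid-stream, which is the harder part.

```python
import os
import shutil
import tarfile


def split_tar_stream(stream, dest_dir):
    """Write each regular file in a tar byte stream to dest_dir as it arrives."""
    base = os.path.abspath(dest_dir)
    written = []
    # Mode "r|" reads the archive sequentially, so the stream need not be seekable
    # and the whole tarball is never stored on disk.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            if not member.isfile():
                continue
            target = os.path.abspath(os.path.join(base, member.name))
            # Skip member names that would escape dest_dir.
            if os.path.commonpath([base, target]) != base:
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            src = tar.extractfile(member)
            with open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(member.name)
    return written
```

Because the archive is read strictly front to back, a dropped connection cannot be resumed from the middle without extra bookkeeping, which is in line with the objection to tar above.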
--
byte-byte,
grin