On Fri, Jan 8, 2010 at 6:06 PM, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:
<snip>
No one wants the monolithic tarball. The way I got updates previously
was via a rsync push.
No one sane would suggest a monolithic tarball: it's too much of a
pain to produce!
I know that you didn't want or use a tarball, but requests for an
"image dump" are not that uncommon and often the requester is
envisioning something like a tarball. Arguably that is what the
originator of this thread seems to have been asking for. I think you
and I are probably mostly on the same page about the virtue of
ensuring that images can be distributed and that monolithic approaches
are bad.
<snip>
But I think producing subsets is pretty much worthless. I can't think
of a valid use for any reasonably sized subset. ("All media used on
big wiki X" is a useful subset I've produced for people before, but
it's not small enough to be a big win vs a full copy)
Wikipedia itself has gotten so large that increasingly people are
mirroring subsets rather than allocating the space for a full mirror
(e.g. 10000 pages on cooking, or medicine, or whatever). Grabbing
images needed for such an application would be useful. I can also see
virtues in having a way to grab all images in a category (or set of
categories). For example, grab all images of dogs, or all images of
Barack Obama. In case you think this is all hypothetical, I've
actually downloaded tens of thousands of images on more than one
occasion to support topical projects.
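
To make the category case concrete, here is a rough sketch of what such a
request could look like. It assumes the present-day MediaWiki Action API
(generator=categorymembers combined with prop=imageinfo), which is not
something this thread specifies; the endpoint, the helper names, and the
sample response are all illustrative only.

```python
from urllib.parse import urlencode

# Assumed endpoint; any MediaWiki installation exposing api.php would do.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def build_category_query(category, limit=50, continue_params=None):
    """Build a query URL listing the files in one category with their URLs."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",  # iterate over category members
        "gcmtitle": f"Category:{category}",
        "gcmtype": "file",               # files only, no pages or subcategories
        "gcmlimit": str(limit),
        "prop": "imageinfo",
        "iiprop": "url",                 # ask for the direct file URL
    }
    if continue_params:
        # Values from the "continue" object of a previous response.
        params.update(continue_params)
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_file_urls(response):
    """Pull direct file URLs out of one decoded JSON response."""
    pages = response.get("query", {}).get("pages", {})
    urls = []
    for page in pages.values():
        for info in page.get("imageinfo", []):
            if "url" in info:
                urls.append(info["url"])
    return sorted(urls)
```

A client would fetch the URL, decode the JSON, collect the file URLs, and
repeat with the returned "continue" values until none are left. Being
polite about request rate matters here, since the point is to make access
less burdensome rather than more.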
<snip>
If all is made available then everyone's wants can be satisfied. No
subset is going to get us there. Of course, there are a lot of
possibilities for the means of transmission, but I think it would be
most useful to assume that at least a few people are going to want to
grab everything.
Of course, strictly speaking we already provide HTTP access to
everything. So the real question is how we can make access easier,
more reliable, and less burdensome. You or someone else suggested an
API for grabbing files and that seems like a good idea. Ultimately
the best answer may well be to take multiple approaches to accommodate
both people like you who want everything as well as people who want
only more modest collections.
-Robert Rohde