On Fri, Jan 8, 2010 at 6:06 PM, Gregory Maxwell gmaxwell@gmail.com wrote: <snip>
No one wants the monolithic tarball. The way I got updates previously was via an rsync push.
No one sane would suggest a monolithic tarball: it's too much of a pain to produce!
I know that you didn't want or use a tarball, but requests for an "image dump" are not that uncommon, and often the requester is envisioning something like a tarball. Arguably that is what the originator of this thread seems to have been asking for. I think you and I are probably mostly on the same page about the virtue of ensuring that images can be distributed and that monolithic approaches are bad.
<snip>
But I think producing subsets is pretty much worthless. I can't think of a valid use for any reasonably sized subset. ("All media used on big wiki X" is a useful subset I've produced for people before, but it's not small enough to be a big win vs a full copy)
Wikipedia itself has gotten so large that people are increasingly mirroring subsets rather than allocating the space for a full mirror (e.g. 10000 pages on cooking, or medicine, or whatever). Grabbing the images needed for such an application would be useful. I can also see virtues in having a way to grab all images in a category (or set of categories). For example, grab all images of dogs, or all images of Barack Obama. In case you think this is all hypothetical, I've actually downloaded tens of thousands of images on more than one occasion to support topical projects.
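A rough sketch of what per-category grabbing could look like, assuming the MediaWiki web API's list=categorymembers query against the Commons endpoint. The helper names category_files_url and file_titles are hypothetical, the category name is only an example, and nothing here actually sends a request; it only builds the query URL and reads file titles out of an already-decoded JSON reply.

```python
from urllib.parse import urlencode

# Assumed endpoint: the standard api.php entry point on Wikimedia Commons.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def category_files_url(category, limit=50, continue_token=None):
    """Build a query URL listing the files in one category (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": "file",       # only files, not pages or subcategories
        "cmlimit": str(limit),
        "format": "json",
    }
    if continue_token is not None:
        # Token taken from a previous reply, used to fetch the next batch.
        params["cmcontinue"] = continue_token
    return API_ENDPOINT + "?" + urlencode(params)


def file_titles(response):
    """Pull file titles out of a decoded JSON reply (hypothetical helper)."""
    members = response.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]


if __name__ == "__main__":
    print(category_files_url("Dogs", limit=10))
```

A client would fetch that URL, pass the decoded reply to file_titles, and then download each file over the HTTP access we already provide, repeating with the continuation token until the category is exhausted.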
<snip>
If all is made available then everyone's wants can be satisfied. No subset is going to get us there. Of course, there are a lot of possibilities for the means of transmission, but I think it would be most useful to assume that at least a few people are going to want to grab everything.
Of course, strictly speaking we already provide HTTP access to everything. So the real question is how we can make access easier, more reliable, and less burdensome. You or someone else suggested an API for grabbing files, and that seems like a good idea. Ultimately the best answer may well be to take multiple approaches to accommodate both people like you who want everything and people who want only more modest collections.
-Robert Rohde