Hi;

@Derrick: I don't trust Amazon. Really, I don't trust Wikimedia Foundation either. They can't and/or they don't want to provide image dumps (what is worst?). Community donates images to Commons, community donates money every year, and now community needs to develop a software to extract all the images and packed them, and of course, host them in a permanent way. Crazy, right?

@Milos: Instead of spliting image dump using the first letter of filenames, I thought about spliting using the upload date (YYYY-MM-DD). So, first chunks (2005-01-01) will be tiny, and recent ones of several GB (a single day).

Regards,
emijrp

2011/6/28 Derrick Coetzee <dcoetzee@eecs.berkeley.edu>
As a Commons admin I've thought a lot about the problem of
distributing Commons dumps. As for distribution, I believe BitTorrent
is absolutely the way to go, but the Torrent will require a small
network of dedicated permaseeds (servers that seed indefinitely).
These can easily be set up at low cost on Amazon EC2 "small" instances
- the disk storage for the archives is free, since small instances
include a  large (~120 GB) ephemeral storage volume at no additional
cost, and the cost of bandwidth can be controlled by configuring the
BitTorrent client with either a bandwidth throttle or a transfer cap
(or both). In fact, I think all Wikimedia dumps should be available
through such a distribution solution, just as all Linux installation
media are today.

Additionally, it will be necessary to construct (and maintain) useful
subsets of Commons media, such as "all media used on the English
Wikipedia", or "thumbnails of all images on Wikimedia Commons", of
particular interest to certain content reusers, since the full set is
far too large to be of interest to most reusers. It's on this latter
point that I want your feedback: what useful subsets of Wikimedia
Commons does the research community want? Thanks for your feedback.

--=20
Derrick Coetzee
User:Dcoetzee, English Wikipedia and Wikimedia Commons administrator
http://www.eecs.berkeley.edu/~dcoetzee/