Jeremy Baron, 23/09/2013 16:11:
> On Sep 23, 2013 9:25 AM, "Mihai Chintoanu" <mihai.chintoanu@skobbler.com> wrote:
>> I have a list of about 1.8 million images which I have to download
>> from commons.wikimedia.org. Is there any simple way to do this which
>> doesn't involve an individual HTTP hit for each image?
>
> You mean full-size originals, not thumbs scaled to a certain size, right?
>
> You should rsync from a mirror [0] (rsync allows specifying a list of
> files to copy).
I agree that rsync is probably your best bet. Another mirror I'm building is on archive.org, organised by day of upload. You can also request an individual file directly from the zips, but that's not super-efficient: https://archive.org/search.php?query=subject%3A%22Wikimedia+Commons%22
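
If it helps, here's a rough sketch of the rsync approach in Python. The rsync module URL below is only a placeholder; point it at whichever mirror you end up using, and the paths in your list have to be relative to that module (Commons stores originals under two-level hash directories, e.g. a/a1/Example.jpg):

    import subprocess

    # Placeholder endpoint; substitute the real rsync module of your mirror.
    RSYNC_MODULE = "rsync://mirror.example.org/wikimedia-commons/"
    FILE_LIST = "images.txt"        # one path per line, relative to the module
    DEST_DIR = "commons-originals/"

    # --files-from makes rsync transfer only the listed files instead of
    # walking the whole tree, and it preserves their directory structure.
    subprocess.check_call(
        ["rsync", "--archive", "--files-from", FILE_LIST, RSYNC_MODULE, DEST_DIR]
    )

With 1.8 million entries you'd probably want to split the list into chunks and run a few of these in parallel (and be gentle with the mirror), but that's the general shape.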
Nemo