On 02/10/11 17:55, Kilian Kluge wrote:
>> Calling wget each time instead is a bit inefficient; I would recommend using wget -i if you can.
> Hmm, time isn't really an issue and I'm like 90% done right now ;-) How much faster does it actually work? The limiting factor is my internet connection anyway, isn't it? Even though it's really fast, I need between 1 and 4 seconds per image, so it took roughly 12 hours to download all the ones without ß (about 55 GB).
Bandwidth is obviously the biggest limiting factor, and there's no way around that. Disk speed can usually be ignored in these cases, but there is some latency added by opening a new connection each time. With separate wget invocations, each process needs to initialise, perform a DNS query and connect to the server (SYN + SYN/ACK) before it can even start asking for the file. With wget -i, it can reuse the existing connection to keep fetching files, so you avoid those initial steps; each one isn't too slow by itself (eg. half a second), but multiplied by the number of files to fetch they become noticeable.
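
For instance, the two approaches look roughly like this (urls.txt is just an assumed name for your list of image URLs):

  # one wget process, one DNS lookup and one TCP handshake per image
  while read url; do wget "$url"; done < urls.txt

  # one wget process: the DNS result and the HTTP connection get
  # reused (keep-alive) for all files on the same host
  wget -i urls.txt
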
What you can do is manually edit the file list to remove everything up to the current position, so only the missing files get downloaded by wget. As both lists are in the same order, it should work. You could also run wget -ci with the whole list, but that would cost an extra request to the server for each image already downloaded. A rough sketch of the first option is below.
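
Assuming the list lives in urls.txt, the downloaded images sit in images/, and both are in the same order (all names here are just placeholders), something like this should do:

  # count what you already have and skip that many lines of the list
  already=$(ls images/ | wc -l)
  tail -n +$((already + 1)) urls.txt > remaining.txt
  wget -i remaining.txt

The wget -ci alternative is simpler to run (just point it at the original list and let it skip the complete files), at the cost of that extra round-trip per image you already have.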