On 02/10/11 17:55, Kilian Kluge wrote:
>> Calling wget each time instead is a bit inefficient; I would recommend using wget -i if you can.
> Hmm, time isn't really an issue and I'm like 90% done right now ;-) How much faster does it actually work? The limiting factor is my internet connection anyway, isn't it? Even though it's really fast, I need between 1 and 4 seconds per image, so it took roughly 12 hours to download all the ones without ß (about 55 GB).
Bandwidth is obviously the biggest limiting factor, and there's no way around that. Disk speed can usually be ignored in these cases, but there is some latency added by opening a new connection each time. With separate wget invocations, each process needs to initialise, perform a DNS query and connect to the server (SYN + SYN/ACK) before it can even start asking for the file. With wget -i, it can reuse the existing connection to keep fetching files, so you avoid those initial steps; each one isn't too slow by itself (eg. half a second), but multiplied by the number of files to fetch they become noticeable.
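
For instance, the two approaches look roughly like this (urls.txt is just an assumed name for your list of image URLs):

  # one wget process, one DNS lookup and one TCP handshake per image
  while read url; do wget "$url"; done < urls.txt

  # one wget process: the DNS result and the HTTP connection get
  # reused (keep-alive) for all files on the same host
  wget -i urls.txt
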
What you can do is manually edit the file list to remove everything up to the current position, so only the missing files get downloaded by wget. As both lists are in the same order, it should work. You could also run wget -ci with the whole list, but that would cost an extra request to the server for each image already downloaded. A rough sketch of the first option is below.
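
Assuming the list lives in urls.txt, the downloaded images sit in images/, and both are in the same order (all names here are just placeholders), something like this should do:

  # count what you already have and skip that many lines of the list
  already=$(ls images/ | wc -l)
  tail -n +$((already + 1)) urls.txt > remaining.txt
  wget -i remaining.txt

The wget -ci alternative is simpler to run (just point it at the original list and let it skip the complete files), at the cost of that extra round-trip per image you already have.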