Re: [Wiki Loves Monuments] Getting the pictures to the jury: The German approach

2 Oct 2011


Hi Platonides,

On Sun, Oct 2, 2011 at 5:28 PM, Platonides <platonides@gmail.com> wrote:
...
> Kilian Kluge wrote:
> ...
> > There is a problem with the German letter ß (the umlauts ä, ö and ü
> > work): pictures with ß in their names aren't downloaded. I don't have
> > the time to look into that now, so I'll keep downloading all the others
> > and will try to find a solution tomorrow (and then download the ß-ones).
> > So if your language has some special characters that are accepted on
> > Commons and widely used (unfortunately, the German word for street is
> > Straße ;) ) you should test that first. Once I have a workaround, I'll
> > let you know.
> I don't even know how your script can work.
> You get a list of images (e.g. "Foo.jpg Bar.jpg") and then you call wget
> with that. wget expects URLs, not filenames.
The list actually contains the URLs; it's not a list of plain filenames. I
should have mentioned that.
...
> I suspect your ß problems are related to encodings. You are calling wget
> without even quoting the URL (you would also need to escape any quote
> characters inside it, but quoting alone would be an improvement). I think
> you will have problems with &, ' and ".
> Also, you are executing that in the shell, so it seems you have a command
> injection vulnerability. I hope nobody named their file
> Monument`rm -rf /`.JPG :)
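
A minimal sketch of the two fixes suggested above, assuming the download script can be written in Python (the original script is not shown in this thread). The helper names `encode_url`, `wget_command` and `download`, and the example.org URLs, are hypothetical. It percent-encodes non-ASCII characters such as ß in the URL path and passes the URL to wget as a separate argument, so no shell ever parses it.

```python
import subprocess
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url):
    """Percent-encode special characters in the path of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # safe="/%" keeps path separators and already-encoded sequences intact;
    # ß, spaces, backticks, quotes and similar characters are encoded as UTF-8.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def wget_command(url, target_dir="."):
    """Build the wget argument list; the URL stays a single argument."""
    # "--" marks the end of options, so a URL can never be read as a flag.
    return ["wget", "-P", target_dir, "--", encode_url(url)]


def download(url, target_dir="."):
    # A list argument with the default shell=False bypasses the shell entirely,
    # so backticks, & and quotes in a filename are never interpreted.
    subprocess.run(wget_command(url, target_dir), check=True)


if __name__ == "__main__":
    # Hypothetical URLs, used only to show the encoding.
    print(encode_url("https://example.org/Hauptstraße.jpg"))
    print(wget_command("https://example.org/Monument`rm -rf /`.JPG"))
```

Building the argument list separately from running it makes it easy to inspect what would be passed to wget before any download starts.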
The ß problem kind of solved itself. I downloaded a new list with the tool
Martin aka DerHexer provided (see elya's mail) and now everything's fine. It
took about 12 hours to download all files without ß, now I'm running a
slightly altered script that downloads the missing ones.
...
> Calling wget once per file is a bit inefficient; I would recommend
> using wget -i if you can.
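
A rough sketch of the `wget -i` suggestion, again assuming a Python wrapper. The function names and the `urls.txt` file name are hypothetical. The URLs are written one per line to a list file and wget is started once for the whole batch. A single wget run can typically reuse connections to the same host, which is where any saving would come from.

```python
import subprocess
import tempfile
from pathlib import Path


def write_url_list(urls, list_path):
    """Write one URL per line, the format that wget -i reads."""
    Path(list_path).write_text("".join(f"{url}\n" for url in urls), encoding="utf-8")
    return list_path


def batch_command(list_path, target_dir="."):
    """Build a single wget call that fetches every URL in the list file."""
    return ["wget", "-i", str(list_path), "-P", target_dir]


def download_all(urls, target_dir="."):
    # The list file only needs to exist while wget is running.
    with tempfile.TemporaryDirectory() as tmp:
        list_path = write_url_list(urls, Path(tmp) / "urls.txt")
        subprocess.run(batch_command(list_path, target_dir), check=True)
```

How much time this saves depends on how much of the 1 to 4 seconds per image is spent on process start-up and connection setup rather than on the transfer itself.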
Hmm, time isn't really an issue and I'm about 90% done right now ;-) How much
faster would it actually be? The limiting factor is my internet connection
anyway, isn't it? Even though it's really fast, I need between 1 and 4
seconds per image, so it took roughly 12 hours to download all the ones
without ß (about 55 GB).
Thanks for your suggestions!
Kilian
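
The figures quoted above (about 55 GB in roughly 12 hours, 1 to 4 seconds per image) can be turned into an average throughput and a rough image count. This is a back-of-the-envelope sketch that assumes decimal gigabytes and treats both figures as exact; the function names are hypothetical.

```python
def average_throughput_mb_per_s(total_gb, hours):
    """Average transfer rate in MB/s (decimal units)."""
    return total_gb * 1000 / (hours * 3600)


def image_count_range(hours, min_seconds_per_image, max_seconds_per_image):
    """Fewest and most images that fit into the elapsed time."""
    elapsed = hours * 3600
    return elapsed / max_seconds_per_image, elapsed / min_seconds_per_image


if __name__ == "__main__":
    rate = average_throughput_mb_per_s(55, 12)
    print(f"average rate: {rate:.2f} MB/s ({rate * 8:.1f} Mbit/s)")
    low, high = image_count_range(12, 1, 4)
    print(f"images downloaded: between {low:.0f} and {high:.0f}")
```

If the connection can sustain clearly more than this average rate, per-request overhead rather than bandwidth would be the more likely bottleneck.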
