Since the question was just raised on the IRC channel, I thought I'd share our procedure with everybody.
First, some background information: we have nine jurors in Germany, eight of them Wikipedians from various fields and backgrounds, plus one expert. The jury will meet for two days at the end of October and compile a top 100.
To prepare for that weekend (they have to reach a decision then, no matter what ;) ), all the pictures obviously need to be reviewed beforehand; there shouldn't be more than, say, 600-900 pictures left to judge from. So we decided to do the following: each of the eight Wikipedians gets his or her share of the almost 30,000 uploads and has to select at most 100 to bring to the final round.
Each juror receives his or her 3,750 pictures on a USB flash drive. This allows me to "randomize" what each juror gets, and it's also very convenient for the jurors since they don't have to download anything. On the flash drive, each juror can handle the pictures in whatever way they like: delete some, create folders, use the image viewer of their choice, tag the pictures, etc. For the final meeting, they can just bring the flash drive with them, and the jury can quickly and easily create a shortlist.
The files still have their original names, so after the jury has made its decision, they can go to Commons and check whether each selected picture really fulfills all the requirements.
So how does it work exactly?
I got a chronological list from the Commons API with all files uploaded for WLM in Germany. I asked German Wikipedian DerHexer for the list; he says it's quite easy to do, and we've already asked him to provide examples or, even better, a small form.
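Until that form exists, here is a rough sketch of how such a list could be pulled from the API yourself. This is my guess at the approach, not DerHexer's actual code, and the category name is an assumption that should be checked against the real WLM 2011 category on Commons:

# Sketch: fetch direct file URLs for one WLM category from the Commons API.
# The category name below is an assumption -- adjust it as needed.
import json
import urllib
import urllib2

API = "https://commons.wikimedia.org/w/api.php"
params = {
    "action": "query",
    "generator": "categorymembers",
    "gcmtitle": "Category:Images from Wiki Loves Monuments 2011 in Germany",
    "gcmtype": "file",
    "gcmlimit": "500",
    "prop": "imageinfo",
    "iiprop": "url",
    "format": "json",
    "continue": "",
}
urls = []
while True:
    data = json.load(urllib2.urlopen(API + "?" + urllib.urlencode(params)))
    for page in data.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            urls.append(info["url"])
    if "continue" not in data:
        break
    # follow the API's continuation parameters to get the next batch
    params.update(data["continue"])

open("komplett.txt", "w").write(" ".join(urls))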
The list comes as one long line separated by spaces (which in my opinion is weird ;) ), and I wrote a short Python script to handle it:
print "importing modules" import os print "reading monument list" file = open("komplett.txt","r") inhalt = file.read() liste = inhalt.split() print "starting download" juror = 1 counter = 0 for monument in liste: print counter # going to destination os.chdir(str(juror)) # preparing wget call = "wget " + monument # call wget os.system(call) # mess with jurors juror = juror + 1 if juror > 8: juror = 1 counter = counter + 1 os.chdir("..") print "Done. " + str(counter) + " files processed."
It's a little messed up in terms of language, but I guess you can see what it does. komplett.txt is the file with all the URLs in it; I split it up and process each URL. As I already mentioned, we have eight jurors, so I created eight folders named "1", "2", ..., "8", and the script walks through these folders. It works quite well; currently I'm at number 1600 ;)
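One caveat: strictly speaking, my script deals the pictures out round-robin in upload order rather than truly randomizing them. If you want a genuinely random split, a small untested variation like this would shuffle first and write one URL list per juror:

import random

# read the space-separated URL list and shuffle it
liste = open("komplett.txt", "r").read().split()
random.shuffle(liste)

# write one URL list per juror; each can later be fed to wget
for juror in range(1, 9):
    out = open("jury%d.txt" % juror, "w")
    for monument in liste[juror - 1::8]:
        out.write(monument + "\n")
    out.close()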
So if there are any questions, let me know, I'll try to answer them.
Best regards,
Kilian
...and here comes the documentation for the file list script; my gratitude goes to DerHexer:
0) Replace one line in the Python script:
old: liste = inhalt.split()
new: liste = inhalt.split(",")
1) add this line to your personal vector.js on Commons:
importScript('Commons:Wiki Loves Monuments 2011/filelistpercountry.js');
2) Trigger with this URL: http://commons.wikimedia.org/w/index.php?title=Commons:Wiki_Loves_Monuments_...
3) Enter the country (spelled as in the WLM category, e.g. "the Netherlands") and click "OK"
3a) Have a coffee or two, depending on the number of images ...
4) Get your list :)
Regards,
elya
There is a problem with the German letter ß (the umlauts ä, ö and ü all work): those pictures aren't downloaded. I don't have the time to look into that now, so I'll keep downloading all the others and try to find a solution tomorrow (and then download the ß ones).
So if your language has special characters that are accepted on Commons and widely used (unfortunately, the German word for street is Straße ;) ), you should test that first. Once I have a workaround, I'll let you know.
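For anyone who wants to experiment before I get to it: my suspicion is that the raw ß bytes in the URLs are the culprit, so percent-encoding the URLs before handing them to wget might be a workaround. An untested sketch (assuming the list file is UTF-8):

# Sketch: percent-encode non-ASCII bytes in each URL so names like
# "Straße" survive the trip through the shell to wget.
import urllib

liste = open("komplett.txt", "r").read().split()
safe = [urllib.quote(monument, safe=":/%?=&") for monument in liste]
open("komplett-encoded.txt", "w").write(" ".join(safe))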
Kilian Kluge wrote:
There is a problem with the German letter ß (the umlauts ä, ö and ü all work): those pictures aren't downloaded. I don't have the time to look into that now, so I'll keep downloading all the others and try to find a solution tomorrow (and then download the ß ones).
So if your language has special characters that are accepted on Commons and widely used (unfortunately, the German word for street is Straße ;) ), you should test that first. Once I have a workaround, I'll let you know.
I don't even know how your script can work. You get a list of images (e.g. "Foo.jpg Bar.jpg") and then you call wget with that; wget expects URLs, not filenames. I suspect your ß problems are related to encodings. You are calling wget without even quoting the argument (you would also need to escape the quote characters, but quoting alone would be an improvement). I think you will have problems with &, ' and ". Also, since you are executing that through the shell, you seem to have a command injection vulnerability. I hope nobody called his file Monument`rm -rf /`.JPG :) Calling wget once per file is also a bit inefficient; I would recommend using wget -i if you can.
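By the way, one shell-free variant of the Python loop (just a sketch) avoids the quoting and injection problems entirely:

# Sketch: pass the URL as a single argument, with no shell in between,
# so spaces, quotes and backticks in file names are harmless.
import subprocess

liste = open("komplett.txt", "r").read().split()
for monument in liste:
    subprocess.call(["wget", monument])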
Now, here is a little recipe in shell script. It assumes a UNIX shell such as bash (provided with GNU/Linux distributions, Cygwin, ...).
Let's suppose you have a list of URLs separated by spaces in komplett.txt. We want it as one URL per line, so we do:
sed 's/ /\n/g' komplett.txt > in-lines.txt
We now split the list, one eighth per juror (replace 8 with your number of jurors):
for i in $(seq 1 8); do sed -n "$i~8p" in-lines.txt > jury$i.txt; done
That gives us files jury1.txt, jury2.txt... jury8.txt
Move each of them to its own folder:
for i in $(seq 1 8); do mkdir "Files-jury-$i" ; mv jury$i.txt "Files-jury-$i"; done
And for each of them, run
wget -i jury1.txt
(change the number to match the folder)
Move each folder to a different USB drive, and you're done.
Hi Platonides,
On Sun, Oct 2, 2011 at 5:28 PM, Platonides platonides@gmail.com wrote:
I don't even know how your script can work. You get a list of images (e.g. "Foo.jpg Bar.jpg") and then you call wget with that; wget expects URLs, not filenames.
The list actually has the URLs in it; it's not a list of plain filenames. I should have mentioned that.
I suspect your ß problems are related to encodings. You are calling wget without even quoting the argument (you would also need to escape the quote characters, but quoting alone would be an improvement). I think you will have problems with &, ' and ". Also, since you are executing that through the shell, you seem to have a command injection vulnerability. I hope nobody called his file Monument`rm -rf /`.JPG :)
The ß problem kind of solved itself. I downloaded a new list with the tool Martin aka DerHexer provided (see elya's mail), and now everything's fine. It took about 12 hours to download all the files without ß; now I'm running a slightly altered script that downloads the missing ones.
Calling wget once per file is also a bit inefficient; I would recommend using wget -i if you can.
Hmm, time isn't really an issue, and I'm about 90% done right now ;-) How much faster does it actually work? The limiting factor is my internet connection anyway, isn't it? Even though it's really fast, I need between 1 and 4 seconds per image, so it took roughly 12 hours to download all the ones without ß (about 55 GB).
Thanks for your suggestions!
Kilian
On 02/10/11 17:55, Kilian Kluge wrote:
Hmm, time isn't really an issue, and I'm about 90% done right now ;-) How much faster does it actually work? The limiting factor is my internet connection anyway, isn't it? Even though it's really fast, I need between 1 and 4 seconds per image, so it took roughly 12 hours to download all the ones without ß (about 55 GB).
The bandwidth is obviously the biggest limiting factor, and there's nothing you can do about that. Things like disk speed can be ignored in these cases, but there is some latency added by opening a new connection each time. With separate wget invocations, each run needs to initialise, perform a DNS query and connect to the server (SYN + SYN/ACK) before it can start asking for the file. With wget -i, wget can reuse the existing connection to keep fetching files, so you avoid those initial steps. They are not slow individually (say, half a second), but multiplied across all the files they add up: 0.5 s times 30,000 files is already over four hours.
What you can do is manually edit the file list to remove everything up to the current position, so that wget only downloads the missing files. As both lists are in the same order, that should work. You could also run wget -ci with the whole list, but that would still issue a request for each image already downloaded.
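That manual editing can also be scripted; a sketch (assuming the downloaded files keep the names from the ends of their URLs in the current folder):

# Sketch: keep only the URLs whose target file is not present yet,
# so a follow-up "wget -i jury1-missing.txt" fetches just the rest.
import os

urls = open("jury1.txt", "r").read().split()
missing = [u for u in urls if not os.path.exists(u.split("/")[-1])]
open("jury1-missing.txt", "w").write("\n".join(missing))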
As an exception, I'm replying in German:
This is a list where the majority of participants are not programmers. Back when I was self-employed, if even one programmer in my company dared to show off with code crap, he was rightly chewed out. Mercilessly.
Can't you imagine that there are people here who are put off by this sort of thing, or who - like me - get angry about so much incompetence? Incompetence in a social sense? Is someone here trying to show how good he is while basically forgetting what matters?
The only link I've seen leads to a page that has yet to be created. These aren't solutions, this is self-importance!
http://commons.wikimedia.org/w/index.php?title=Commons:Wiki_Loves_Monuments_...
This is exactly what I criticized about the contest as well, and it showed in the results too. It remains an inside job: new people were put off, and only a small number have joined us through WLM.
But maybe we want to stay elitist, and if we aren't elitist enough yet, we can surely get there with insider jargon and shrink our numbers.
hubertl.
Hello Heinz,
On Sun, Oct 2, 2011 at 7:34 AM, Hubert hubert.laska@gmx.at wrote:
This is a list where the majority of participants are not programmers. Back when I was self-employed, if even one programmer in my company dared to show off with code crap, he was rightly chewed out. Mercilessly.
I didn't intend to show off; first and foremost I wanted to get the code to Tomasz so that he can use it himself. And I certainly don't assume that this messy, amateurish piece of code makes me look good ;-)
Can't you imagine that there are people here who are put off by this sort of thing, or who - like me - get angry about so much incompetence? Incompetence in a social sense? Is someone here trying to show how good he is while basically forgetting what matters?
I don't understand that. I'm presenting a method for getting the pictures to the jury, nothing more. Just like solutions involving templates, upload campaigns and the statistics tools have been discussed here again and again. I'm not a good programmer myself and often couldn't make head or tail of those discussions, but where else should this be worked out? When I only cared about the end result, I simply didn't follow the discussion and waited until the tools were available. And I was very grateful to those who wrote them. In the same way, Tomasz at least was glad to have a mostly finished solution.
The only link I've seen leads to a page that has yet to be created. These aren't solutions, this is self-importance!
Preparing and running an API query for every participating country, which in most cases would then go unused, would be complete nonsense. And yes: downloading the pictures and distributing them across folders requires a bit of technical knowledge that not everyone has. But should it be discussed in some back room just because of that? I would have been glad if I hadn't had to figure out and test everything myself.
This is exactly what I criticized about the contest as well, and it showed in the results too. It remains an inside job: new people were put off, and only a small number have joined us through WLM.
I consider that a gross misjudgment. In most places, over 70% of participants are new (http://dft.ba/-ZTs), the upload campaigns worked very well, and most participants handled the forms without problems. Even though we threw them together at the last minute and had to keep patching them, it went more smoothly than I had initially expected.
Yes, behind the scenes it's technical and difficult, of course. But even without much prior knowledge you can work your way into, say, the Toolserver database, though that's a good chunk of work. And you don't have to. There are plenty of important tasks on the "customer side" (about 75% of all WLM tasks have nothing to do with technology!) where anyone can pitch in.
But maybe we want to stay elitist, and if we aren't elitist enough yet, we can surely get there with insider jargon and shrink our numbers.
Somewhat at a loss,
Kilian