Hello list,
I need some help with downloading all the French pronunciation files (the ogg files) from fr.wiktionary.org.
As a workaround I have this:
grep -o 'audio=.*ogg' frwiktionary-20081228-pages-articles.xml | sed -E 's|audio=([A-Za-z].*)|curl -sO "http://fr.wiktionary.org/wiki/Fichier:\1"|' > filenames
which extracts all the file names from `frwiktionary-20081228-pages-articles.xml`. I then used this as the base URL:
http://fr.wiktionary.org/wiki/Fichier:file_names
So a simple
curl -O "http://fr.wiktionary.org/wiki/Fichier:Fr-chaise.ogg"
should download any file. It all looks fine and rosy up to here, but when I open the file with QTPlayer it doesn't play, and the info window reports 0 bytes, whereas:
ls Fichier:Fr-chaise.ogg
-rw-r--r-- - 24K 11 Jan 09:53 Fichier:Fr-chaise.ogg
There is nothing wrong with my players; this shell script:
test -d /tmp/frp || /bin/mkdir -p /tmp/frp ; cd /tmp/frp ; for i in $(curl -s "http://fr.wiktionary.org/wiki/${1}" | grep --only-matching "http.*ogg\"" | /usr/bin/sed 's/".*$//') ; do curl -sO "$i"; done && open -g -a itunes /tmp/frp/Fr-${1}*.ogg
works for most entries (try chaise). It downloads the file Fr-chaise.ogg, and I can play it with no problem. Also:
ls /Volumes/neo/Users/pm/Music/iTunes/iTunes\ Music/Unknown\ Artist/Unknown\ Album/une\ chaise.ogg
-rw-r--r--@ - 15K 11 Jan 10:00 …/Music/iTunes/iTunes Music/Unknown Artist/Unknown Album/une chaise.ogg
Thanks,
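For readers who would rather do the file-name extraction from the message above in Python than with grep and sed, here is a minimal sketch (Python 3). It assumes, as the grep pattern does, that the dump references pronunciation files through an `audio=` template parameter; the function name and the sample line are hypothetical, and real dump entries may be laid out differently.

```python
import re

# Matches "audio=<name>.ogg"; the name stops at the first "|", "}" or newline
# so that a single match never swallows the rest of the line.
AUDIO_RE = re.compile(r"audio=\s*([^|}\n]+?\.ogg)")


def extract_audio_names(wikitext):
    """Return the .ogg names found after audio=, in order, without duplicates."""
    seen = set()
    names = []
    for match in AUDIO_RE.finditer(wikitext):
        name = match.group(1).strip()
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    sample = "{{pron|audio=Fr-chaise.ogg}} and {{pron|audio=Fr-table.ogg}}"
    for name in extract_audio_names(sample):
        print(name)
```

Unlike the greedy `audio=.*ogg` pattern, this stops at the end of each template parameter, so two `audio=` entries on one line are reported separately.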
Hi Chales,
When you download "http://fr.wiktionary.org/wiki/Fichier:Fr-chaise.ogg", I think what you get is not an Ogg file but an HTML file, i.e. the same HTML page that you see if you click on the link. (You can check by opening the downloaded file in a text editor.)
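The check suggested above can also be done programmatically. The following is a minimal sketch (Python 3), relying on the fact that an Ogg stream begins with the four bytes "OggS"; the HTML test is only a rough heuristic, and all function names are hypothetical.

```python
def looks_like_ogg(data):
    """True if the bytes start with the Ogg capture pattern "OggS"."""
    return data[:4] == b"OggS"


def looks_like_html(data):
    """Rough heuristic: does the start of the payload look like an HTML page?"""
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def classify(data):
    """Return "ogg", "html" or "unknown" for the first bytes of a download."""
    if looks_like_ogg(data):
        return "ogg"
    if looks_like_html(data):
        return "html"
    return "unknown"


def classify_file(path):
    """Read the first 512 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify(handle.read(512))
```

For example, `classify_file("Fichier:Fr-chaise.ogg")` could be run on the 24K file from the first message to see whether it is really audio.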
What you need to download is the file linked from that HTML page, in this case:
http://upload.wikimedia.org/wikipedia/commons/a/a3/Fr-chaise.ogg
There is a method for automatically guessing the /a/a3/ directories from the name of the file (Fr-chaise.ogg).
I am not sure exactly how I did it (but I did it). I think the first directory is the first character of the md5sum of the file name, and the second is the first two characters of that same md5sum.
Do not hesitate to contact me if you are interested in this md5 thingy, I know I have it somewhere.
Have fun !
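The md5-based directory guess described above can be sketched as follows (Python 3). This is an illustration under assumptions, not the code referred to in the message: it assumes the hash is taken over the UTF-8 file name with spaces replaced by underscores, and the function name and `base` parameter are hypothetical.

```python
import hashlib
from urllib.parse import quote

# Base URL taken from the example link in the message above.
UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia/commons"


def hashed_upload_url(filename, base=UPLOAD_BASE):
    """Guess the upload URL of a file from its name.

    The directories are the first hex character of the md5 of the name,
    then the first two hex characters of the same digest.
    """
    # Assumption: titles are stored with underscores instead of spaces.
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # quote() percent-encodes non-ASCII characters such as accents.
    return "%s/%s/%s/%s" % (base, digest[0], digest[:2], quote(name))


if __name__ == "__main__":
    print(hashed_upload_url("Fr-chaise.ogg"))
```

If the scheme is as described, the printed URL should match the /a/a3/ link quoted above. The sketch does not capitalise the first letter of the name, which may matter for names that start with a lowercase letter.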
_______________________________________________
Wiktionary-l mailing list
Wiktionary-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
Hoi,
I am surprised that these files are in the French Wiktionary. It would be helpful if these files were all moved to Commons so that the other Wiktionaries could benefit from them as well.
Thanks,
GerardM
2009/1/12 Christophe Millet kipmaster@gmail.com
They are on Commons, if I understand "Ce fichier et les informations de sa page de description sont présents sur Wikimedia Commons." ("This file and the information on its description page are on Wikimedia Commons.") correctly. ;)
Th.
2009/1/12, Gerard Meijssen gerard.meijssen@gmail.com:
cirwin at #wiktionary showed me a better way of doing this using Python. Here is a how-to for the curious:
svn co https://mwclient.svn.sourceforge.net/svnroot/mwclient/trunk
cd trunk
python  # at least Python 2.4 is needed (see the README.txt that comes with mwclient)

import mwclient
commons = mwclient.Site('commons.wikimedia.org')

# Save every .ogg file in the category to /tmp
for img in commons.pages['Category:French pronunciation'].members():
    if img.name.endswith('.ogg'):
        print img.name.encode('utf-8')
        # img.name[5:] drops the 5-character namespace prefix (e.g. "File:")
        saveas = open(u"/tmp/%s" % img.name[5:], 'wb')  # binary mode for audio data
        remote = img.download()
        saveas.write(remote.read())
        saveas.close()

Then press Ctrl-D to exit Python. This will populate your /tmp/ folder with all the ogg files (130M).
I forgot to ask cirwin how to keep the local copy up to date, so if anyone here is familiar with Python or mwclient, or knows Bryan <http://commons.wikimedia.org/wiki/User_talk:Bryan> and can get an answer from him (or cirwin), I would appreciate it very much.
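One simple way to approach the update question, offered only as a sketch and not as what cirwin or Bryan would recommend, is to skip files that are already on disk and fetch only the missing ones. The helpers below (Python 3, hypothetical names) do just the bookkeeping and do not call mwclient. They assume the remote names have already had their namespace prefix stripped, and they will not notice a file that was re-uploaded under the same name.

```python
import os


def local_ogg_names(local_dir):
    """Return the set of .ogg file names already present in local_dir."""
    if not os.path.isdir(local_dir):
        return set()
    return {name for name in os.listdir(local_dir) if name.endswith(".ogg")}


def missing_names(remote_names, local_names):
    """Return the remote names, in order, that are not in local_names."""
    return [name for name in remote_names if name not in local_names]


if __name__ == "__main__":
    # Hypothetical remote listing, for illustration only.
    remote = ["Fr-chaise.ogg", "Fr-table.ogg"]
    for name in missing_names(remote, local_ogg_names("/tmp")):
        print("would download", name)
```

In the download loop from the how-to, the same test could be applied to each name before calling `img.download()`.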
Thanks,