Hello!
At the moment I and some other users of the dumps check http://dumps.wikimedia.org/ every day/week/month for a new dump. I use the dumps of 10 different languages for my projects (Wikipedia-World, Template-Tiger, Persondata), so it is really a lot of work.
I hope someone on this list can program a simple-to-use "Dump-Mail-Service". If you have the time, please do it! :-)
I want a mail
- for a list of languages (for example: de, en, fr and es)
- if the new dump is complete (all files)
- or if the file "pages-articles.xml.bz2" is complete
Thanks for your help!
Stefan Kühn http://de.wikipedia.org/wiki/User:Stefan_K%C3%BChn
Stefan Kühn wrote:
Hello!
At the moment I and some other users of the dumps check http://dumps.wikimedia.org/ every day/week/month for a new dump. I use the dumps of 10 different languages for my projects (Wikipedia-World, Template-Tiger, Persondata), so it is really a lot of work.
I hope someone on this list can program a simple-to-use "Dump-Mail-Service". If you have the time, please do it! :-)
Wouldn't it be preferable to have a daemon on the toolserver downloading each new dump (or the dumps marked as "someone is interested")? It would then place them in a common folder for any toolserver user.
When a new dump is out, it could send email alerts (your proposal), start "cron" jobs... Apart from that, your tools could at any time check whether there is a dump file newer than the last one they processed.
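
On the tool side that check could be as small as this sketch (the shared folder path and the state file name are invented for the example):

#!/usr/bin/env python
# Minimal sketch: is the dump in the shared folder newer than the one this
# tool processed last time?  DUMP and STATE below are made up.
import os

DUMP  = "/shared/dumps/dewiki-latest-pages-articles.xml.bz2"   # hypothetical
STATE = os.path.expanduser("~/.dewiki_last_processed")

def last_processed():
    try:
        return float(open(STATE).read())
    except (IOError, ValueError):
        return 0.0

current = os.path.getmtime(DUMP)
if current > last_processed():
    print("new dump available, time to rerun the tool")
    # ... run the actual tool here ...
    open(STATE, "w").write(str(current))
else:
    print("nothing new since the last run")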
PS: You know you can subscribe to RSS feeds for the XML dumps, don't you?
Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯 wrote:
Platonides ✍:
PS: You know you can subscribe to RSS feeds for the XML dumps, don't you?
I was already in the middle of scraping /backup-index.html to generate categorised Atom feeds. Where are the feeds you're speaking of?
<Filename>-rss.xml in the latest/ folder: http://download.wikimedia.org/kawiki/latest/kawiki-latest-pages-meta-history...
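
Building on those feeds, a rough sketch of the "Dump-Mail-Service" Stefan asked for could look like this (the feed URL just follows the <Filename>-rss.xml pattern above; the SMTP host, addresses and state directory are placeholders, and I'm assuming the feeds are plain RSS with a pubDate per item):

#!/usr/bin/env python
# Sketch: poll the per-file RSS feed for each wanted wiki and send a mail
# when the pubDate of the newest item changes.
import os, urllib2, smtplib
import xml.etree.ElementTree as ET
from email.mime.text import MIMEText

LANGS = ["de", "en", "fr", "es"]
FEED  = ("http://download.wikimedia.org/%(db)s/latest/"
         "%(db)s-latest-pages-articles.xml.bz2-rss.xml")
STATEDIR = os.path.expanduser("~/.dumpwatch")

def latest_pubdate(db):
    xml = urllib2.urlopen(FEED % {"db": db}).read()
    # pubDate of the first <item> says when the file was last regenerated
    return ET.fromstring(xml).findtext("channel/item/pubDate")

def notify(db, date):
    msg = MIMEText("New pages-articles dump for %s: %s" % (db, date))
    msg["Subject"] = "[dump] new %s dump" % db
    msg["From"] = "dumpwatch@example.org"        # placeholder
    msg["To"]   = "you@example.org"              # placeholder
    s = smtplib.SMTP("localhost")                # placeholder
    s.sendmail(msg["From"], [msg["To"]], msg.as_string())
    s.quit()

if not os.path.isdir(STATEDIR):
    os.makedirs(STATEDIR)

for lang in LANGS:
    db = lang + "wiki"
    date = latest_pubdate(db)
    state = os.path.join(STATEDIR, db)
    seen = open(state).read() if os.path.exists(state) else ""
    if date and date != seen:
        notify(db, date)
        open(state, "w").write(date)

Run it from cron once a day and it only mails when the feed date actually changes.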
Platonides:
Wouldn't it be preferable to have a daemon on the toolserver downloading each new dump (or the dumps marked as "someone is interested")?
Automatically downloading all dumps is going to use up all available disk space very quickly.
- river.
I think there should be a text file listing all the dumps the toolserver developers need, and a cron job that checks for and refreshes those dumps when new ones are available.
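
A sketch of how that could look when run from cron (the folder, the list file name and the restriction to pages-articles are invented for the example):

#!/usr/bin/env python
# Sketch: one database name per line in a plain text file; wget -N only
# re-downloads files that changed on the server.
import subprocess

STORE  = "/shared/dumps"                          # hypothetical shared folder
WANTED = "/shared/dumps/wanted-dumps.txt"         # hypothetical list file
URL    = ("http://download.wikimedia.org/%(db)s/latest/"
          "%(db)s-latest-pages-articles.xml.bz2")

for line in open(WANTED):
    db = line.strip()
    if not db or db.startswith("#"):              # skip blanks and comments
        continue
    subprocess.call(["wget", "-q", "-N", "-P", STORE, URL % {"db": db}])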
2008/7/14 DaB. WP@daniel.baur4.info:
Hello, on Sunday, 13 July 2008 18:54:42, Platonides wrote:
It would then place them in a common folder for any toolserver user.
there is already such a place at
/mnt/user-store/dump
Sincerely, DaB.
-- wp-blog.de
Hello !
If it can save somebody some time, I wrote this simple Python script a few months ago: http://pastebin.com/fa3658d9
Initially, that snippet used some GTK code to pop up alerts on my laptop when a new dump was available, but... it got annoying, particularly because it was, oh noes, popping up GTK alerts on every error -- and yes, not having a connection would produce network errors. Stupid kid code :)
2008/7/13 Stefan Kühn kuehn-s@gmx.net:
Hello!
At the moment I and some other users of the dumps check http://dumps.wikimedia.org/ every day/week/month for a new dump. I use the dumps of 10 different languages for my projects (Wikipedia-World, Template-Tiger, Persondata), so it is really a lot of work.
I hope someone on this list can program a simple-to-use "Dump-Mail-Service". If you have the time, please do it! :-)
I want a mail
- for a list of languages (for example: de, en, fr and es)
- if the new dump is complete (all files)
- or if the file "pages-articles.xml.bz2" is complete
Thanks for your help!
Stefan Kühn http://de.wikipedia.org/wiki/User:Stefan_K%C3%BChn
Stefan Kühn wrote:
Hello!
At the moment I and some other users of the dumps check http://dumps.wikimedia.org/ every day/week/month for a new dump.
Please see the following shell script I am using to fetch "-latest-pages-articles.xml.bz2" from plwiki and plwikisource.
This script runs on a Solaris 5.9 machine.
Unfortunately, the dumps no longer carry a "Last-Modified" header, so at the moment wget downloads them every day. Is there any possibility of getting those headers back?
#! /bin/sh
# Fetch -latest-pages-articles.xml.bz2 for the given databases (default:
# plwikisource and plwiki) and unpack only when the download changed.

WIKIHOME=/home/wiki
HTTP_PROXY=http://my.proxy:8080/
export HTTP_PROXY

[ "$#" = "0" ] && set plwikisource plwiki
[ -d "${WIKIHOME}/log" ] || mkdir -p "${WIKIHOME}/log"

for DB in "$@"
do
    WIKI_XML="${WIKIHOME}/${DB}-latest-pages-articles.xml"
    WIKI_BZIP="${WIKI_XML}.bz2"
    LOGFILE="${WIKIHOME}/log/${DB}-`date +%Y%m%d-%H%M`.log"
    /usr/local/bin/wget -o "${LOGFILE}" -P "${WIKIHOME}" -N \
        http://download.wikimedia.org/${DB}/latest/${DB}-latest-pages-articles.xml.bz2
    if /usr/bin/test "${WIKI_BZIP}" -nt "${WIKI_XML}"
    then
        # the .bz2 is newer than the unpacked copy: decompress, keep its timestamp
        /usr/bin/bzip2 -dc "${WIKI_BZIP}" > "${WIKI_XML}"
        /usr/bin/touch -r "${WIKI_BZIP}" "${WIKI_XML}"
    else
        # nothing new: drop the wget log again
        /usr/bin/rm "${LOGFILE}"
    fi
done
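
Until the header comes back, one workaround (a sketch only, building on the per-file RSS feeds mentioned earlier in the thread; the feed naming and its pubDate format are assumptions) is to compare the feed date with the local file's mtime and only call wget when the dump really changed:

#!/usr/bin/env python
# Sketch: read the pubDate from the per-file RSS feed, compare it with the
# local file's mtime, and download only when the dump was regenerated.
import os, time, urllib2, subprocess
import xml.etree.ElementTree as ET
from email.utils import parsedate

DB    = "plwiki"
HOME  = "/home/wiki"
NAME  = "%s-latest-pages-articles.xml.bz2" % DB
URL   = "http://download.wikimedia.org/%s/latest/%s" % (DB, NAME)
LOCAL = os.path.join(HOME, NAME)

feed = urllib2.urlopen(URL + "-rss.xml").read()
pub  = ET.fromstring(feed).findtext("channel/item/pubDate")
remote_time = time.mktime(parsedate(pub))        # assumes an RFC 822 pubDate

local_time = os.path.getmtime(LOCAL) if os.path.exists(LOCAL) else 0

if remote_time > local_time:
    subprocess.call(["wget", "-q", "-O", LOCAL, URL])
    os.utime(LOCAL, (remote_time, remote_time))  # remember the feed date
else:
    print("%s is up to date" % NAME)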