Stefan Kühn wrote:
Hello!
At the moment, I and some other users of the dumps check http://dumps.wikimedia.org/ every day/week/month for a new dump.
Please see the following shell script I am using to fetch "-latest-pages-articles.xml.bz2" from plwiki and plwikisource.
This script runs on a Solaris 5.9 machine.
Unfortunately, the dumps no longer carry a "Last-Modified" header, so wget's timestamp check (-N) has nothing to compare against and it currently downloads them again every day. Is there any possibility of getting those headers back?
#! /bin/sh
WIKIHOME=/home/wiki
HTTP_PROXY=http://my.proxy:8080/
export HTTP_PROXY

[ "$#" = "0" ] && set plwikisource plwiki
[ -d "${WIKIHOME}/log" ] || mkdir -p "${WIKIHOME}/log"

for DB in "$@"
do
    WIKI_XML="${WIKIHOME}/${DB}-latest-pages-articles.xml"
    WIKI_BZIP="${WIKI_XML}.bz2"
    LOGFILE="${WIKIHOME}/log/${DB}-`date +%Y%m%d-%H%M.log`"
    /usr/local/bin/wget -o "${LOGFILE}" -P "${WIKIHOME}" -N \
        http://download.wikimedia.org/${DB}/latest/${DB}-latest-pages-articl...
    if /usr/bin/test "${WIKI_BZIP}" -nt "${WIKI_XML}"
    then
        /usr/bin/bzip2 -dc "${WIKI_BZIP}" > "${WIKI_XML}"
        /usr/bin/touch -r "${WIKI_BZIP}" "${WIKI_XML}"
    else
        /usr/bin/rm "${LOGFILE}"
    fi
done
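Whether the server sends a "Last-Modified" header can be checked without downloading a dump. The following is only a minimal sketch and not part of the script above: last_modified is a hypothetical helper name, and it assumes a wget that supports -S (print the server response) and --spider (do not download), plus a grep that accepts -i.

```shell
#! /bin/sh
# Hypothetical helper (not from the original script): reads HTTP response
# headers on stdin and prints the value of the first Last-Modified header.
# Returns 0 if the header is present, non-zero otherwise.
last_modified() {
    # HTTP header lines end in CR LF, so drop the CR first.
    # Header names are case-insensitive, hence grep -i.
    # wget -S indents header lines, so leading blanks are allowed.
    value=`tr -d '\r' | grep -i '^ *last-modified:' | head -n 1 | sed 's/^[^:]*: *//'`
    [ -n "$value" ] && echo "$value"
}
```

Assuming URL holds the address of the dump file, it could be used as follows; wget writes the response headers to standard error, hence the 2>&1:

    /usr/local/bin/wget -S --spider "$URL" 2>&1 | last_modified || echo "no Last-Modified header"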
Thanks for your help!
Stefan Kühn http://de.wikipedia.org/wiki/User:Stefan_K%C3%BChn