[Toolserver-l] Dump-Mail-Service

Marcin Cieslak saper at system.pl
Mon Jul 14 18:47:49 UTC 2008


Stefan Kühn wrote:
> Hello!
> 
> At the moment I and some other users of the dumps check 
> http://dumps.wikimedia.org/ every day/week/month for a new dump. 

Please see the following shell script I am using to fetch 
"-latest-pages-articles.xml.bz2" from plwiki and plwikisource.

This script runs on a Solaris 5.9 machine.

Unfortunately, the dumps no longer carry a "Last-Modified" header, so 
wget (run with -N) currently downloads them again every day. Is there 
any possibility of getting those headers back?

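#! /bin/sh
# Fetch the latest pages-articles dump for each wiki and unpack it
# when the downloaded archive is newer than the extracted XML.

WIKIHOME=/home/wiki
HTTP_PROXY=http://my.proxy:8080/
# GNU wget reads the lowercase http_proxy variable, so export both.
http_proxy="${HTTP_PROXY}"
export HTTP_PROXY http_proxy

# Default to plwikisource and plwiki when no wiki names are given.
[ "$#" = "0" ] && set plwikisource plwiki

[ -d "${WIKIHOME}/log" ] || mkdir -p "${WIKIHOME}/log"
for DB in "$@"
do
    WIKI_XML="${WIKIHOME}/${DB}-latest-pages-articles.xml"
    WIKI_BZIP="${WIKI_XML}.bz2"
    LOGFILE="${WIKIHOME}/log/${DB}-`date +%Y%m%d-%H%M`.log"
    # -N: download only if the remote file is newer than the local copy.
    /usr/local/bin/wget -o "${LOGFILE}" -P "${WIKIHOME}" -N \
        "http://download.wikimedia.org/${DB}/latest/${DB}-latest-pages-articles.xml.bz2"
    if /usr/bin/test "${WIKI_BZIP}" -nt "${WIKI_XML}"
    then
        # New archive: unpack it and copy its timestamp to the XML file.
        /usr/bin/bzip2 -dc "${WIKI_BZIP}" > "${WIKI_XML}"
        /usr/bin/touch -r "${WIKI_BZIP}" "${WIKI_XML}"
    else
        # Nothing new: discard this run's log.
        /usr/bin/rm "${LOGFILE}"
    fi
done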
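
The missing "Last-Modified" header mentioned above could be checked for 
before downloading. What follows is only a rough sketch, assuming a wget 
that supports -S and --spider. The helper names has_last_modified and 
content_length are made up for illustration and are not part of the 
script above. Comparing Content-Length with the local file size is only 
a crude heuristic, since two different dumps could have the same size.

```shell
#! /bin/sh
# Sketch only: both helper names are illustrative assumptions.

# Read HTTP response headers on stdin; succeed (exit 0) if a
# Last-Modified header is present, fail otherwise.
has_last_modified() {
    grep -i '^[[:space:]]*Last-Modified:' >/dev/null
}

# Read HTTP response headers on stdin; print the numeric value of the
# last Content-Length header seen (redirects may produce several).
content_length() {
    sed -n 's/^[[:space:]]*[Cc]ontent-[Ll]ength:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | tail -1
}
```

Since wget -S writes the response headers to standard error, one 
possible use is to pipe the output of 
/usr/local/bin/wget -S --spider "$URL" 2>&1 into has_last_modified and 
fall back to the size comparison when it fails.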


> Thanks for your help!
> 
> Stefan Kühn
> http://de.wikipedia.org/wiki/User:Stefan_K%C3%BChn
> 

