Hello,
Have you tried wget? Should work under Linux. Sorry, I Don't Do Windows(tm)!
Me too ;) Here's what I've done so far, and it works well enough for me:
It's an sh script: copy it into a file named wiki2html, make it executable with chmod, and run it.
It fetches the html pages from a wiki using wget and also tries to grab a few extra files, e.g. main.css and the logo. It then uses sed to replace absolute paths (/wiki/skins/...) inside css and javascript references in the downloaded html pages. wget won't do this for you, and it makes the whole thing look a little better (than the printable format).
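To see what the sed step does in isolation, here is a minimal sketch; note that the /wiki/skin prefix is specific to my install and will likely differ on yours:

```shell
# Rewrite an absolute skin path to a relative one, so it resolves
# against the downloaded files instead of the live server.
# NOTE: the /wiki/skin prefix is an assumption from my own wiki layout.
echo '<link rel="stylesheet" href="/wiki/skins/monobook/main.css">' \
  | sed 's|/wiki/skin|skin|g'
# -> <link rel="stylesheet" href="skins/monobook/main.css">
```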
Please note this script is quite specific to my personal wiki, so have a look at it before using it yourself. The reject strings of the wget command can certainly be optimized. DON'T try it on Wikipedia: it won't so much kill their servers as it will kill your client.
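If you want to check which URLs a given reject list would skip before kicking off a long wget run, a rough way is to mimic the globs with a shell case statement. This only approximates wget's --reject matching, and the sample URLs below are made up:

```shell
# Rough check: does a URL match one of the reject globs?
# Shell case globs only approximate wget's --reject behaviour.
is_rejected() {
  case "$1" in
    *edit*|*history*|*Spezial*|*oldid*) return 0 ;;
    *) return 1 ;;
  esac
}

is_rejected "http://url/wiki/index.php?title=Test&action=edit" && echo "rejected"
is_rejected "http://url/wiki/index.php/Hauptseite" || echo "kept"
```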
Best regards, Sebastian
#!/bin/sh
######################################################
#
# WIKI Export script - Wgets a wiki to static html.
#
######################################################
# Check input
if [ "$2" = "" ] ; then
  echo "
$0 - Wgets a wiki to static html, 10/2005

This script does a wget to retrieve static html pages from a wiki. Several
wiki-typical pages are excluded because they are unimportant for offline
usage (edit, history and special pages). URLs in the html pages are changed
automatically so you can browse the static wiki offline.

Usage:    $0 <URL_to_wiki> <destination_dir> [<recursive_depth> default=2]

Examples: $0 http://url/wiki ./wiki
          $0 http://url/wiki ./wiki 3

Requires: sed, wget
"
  exit 1
fi
# Define input variables
URL=$1
DEST_DIR=$2
DEST_DIR_COMPLETE=$DEST_DIR/`echo "$URL" | sed 's|[a-zA-Z]*://||g'`
REC_LEVEL=$3
# Using || instead of -o: with -o, the -le test runs even when $3 is
# empty and aborts with "integer expression expected".
if [ "$REC_LEVEL" = "" ] || [ "$REC_LEVEL" -le 0 ] 2>/dev/null ; then
  REC_LEVEL=2
fi
# WGET pages recursively
echo "
Getting wiki pages to static html...
URL:         $URL
Destination: $DEST_DIR
"
wget \
  -nv \
  --convert-links \
  --page-requisites \
  --html-extension \
  --recursive \
  --level=$REC_LEVEL \
  --directory-prefix=$DEST_DIR \
  --reject "*edit*,*history*,*Spezial*,*oldid*" \
  $URL
# Get main.css for having a nicer static wiki
echo "
Trying to get some files for more beauty (main.css, logo.png)...
"
wget \
  -nv \
  --directory-prefix=$DEST_DIR \
  --recursive \
  --level=1 \
  $URL/skins/monobook/main.css
wget \
  -nv \
  --directory-prefix=$DEST_DIR \
  --recursive \
  --level=1 \
  $URL/skins/common/images/wiki.png
# Find and replace absolute wiki css paths in static pages
echo "
Replacing absolute wiki paths...
"
# Restore the escaped slashes in the sed expression by using | as the
# delimiter, and merge the rewrite and move into one loop.
for FILE in $DEST_DIR_COMPLETE/*.html ; do
  sed 's|/wiki/skin|skin|g' $FILE > $FILE.new
  mv $FILE.new $FILE
done
# Try copying index file
echo "
Trying to copy index?index=Hauptseite.html to index.html for an easier entry point..."
cp $DEST_DIR_COMPLETE/*Hauptseite.html $DEST_DIR_COMPLETE/index.html
# DONE
echo "
FINISHED!

Look for the results in $DEST_DIR_COMPLETE
Your browser might be able to load the following URL:
file://$PWD/$DEST_DIR_COMPLETE/
"