On 27 Oct 2005, at 06:47, mediawiki-l-request@Wikimedia.org wrote:
From: Sebastian Albrecht albrecht@fielax.de
I've spent quite some time searching for a way to export articles from MediaWiki to a static HTML tree or something similar.
Have you tried wget? Should work under Linux. Sorry, I Don't Do Windows(tm)!
-- Jan Steinman http://www.Bytesmiths.com/Events
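(For the record, a minimal invocation would be something along these lines; the URL is a placeholder and the depth is just a sane default. --convert-links rewrites links for offline browsing and --page-requisites pulls in images and stylesheets:

wget --recursive --level=2 --convert-links --page-requisites --html-extension http://wiki.example.org/wiki
)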
Hello,
Have you tried wget? Should work under Linux. Sorry, I Don't Do Windows(tm)!
Me too ;) This is what I've done so far and it's ok for me:
It is an sh script you can copy into a file named wiki2html. Make it executable using chmod and run it.
It will fetch HTML files from a wiki using wget, and it will also try to get a few extra files such as main.css and the logo. It then uses sed to replace absolute paths (/wiki/skins/...) inside CSS and JavaScript elements of the downloaded HTML pages. This is something wget won't do, and it makes the whole thing look a little better (than the printable format).
Please note this script is quite specific to my personal wiki, so you should look it over before using it yourself. The reject strings of the wget command can certainly be optimized. DON'T try to use it on Wikipedia: it won't kill their servers, but it may well kill your client.
Best regards, Sebastian
#!/bin/sh
######################################################
#
# WIKI Export script - Wgets a wiki to static html.
#
######################################################

# Check input
if [ "$2" = "" ] ; then
  echo " $0 - Wgets a wiki to static html, 10/2005

This script does a wget to retrieve static html pages from a wiki.
Several typical wiki pages are excluded because they are unimportant
for offline use (edit, history and special pages). URLs in the html
pages are rewritten automatically so you can browse the static wiki
offline.

Usage:    $0 <URL_to_wiki> <destination_dir> [<recursive_depth> default=2]

Examples: $0 http://url/wiki ./wiki
          $0 http://url/wiki ./wiki 3

Requires: sed, wget "
  exit 1
fi

# Define input variables
URL=$1
DEST_DIR=$2
# Strip the protocol prefix (e.g. http://) to get the subdirectory wget creates
DEST_DIR_COMPLETE=$DEST_DIR/`echo "$URL" | sed 's|[a-zA-Z]*://||g'`
REC_LEVEL=$3

if [ -z "$REC_LEVEL" ] || [ "$REC_LEVEL" -le 0 ] ; then
  REC_LEVEL=2
fi

# WGET pages recursively
echo "
Getting wiki pages to static html...
URL:         $URL
Destination: $DEST_DIR
"
# 'Spezial' is the German Special: namespace; adjust for your wiki's language
wget \
  -nv \
  --convert-links \
  --page-requisites \
  --html-extension \
  --recursive \
  --level=$REC_LEVEL \
  --directory-prefix=$DEST_DIR \
  --reject "*edit*,*history*,*Spezial*,*oldid*" \
  $URL

# Get main.css for a nicer static wiki
echo "
Trying to get some files for more beauty (main.css, logo.png)...
"
wget \
  -nv \
  --directory-prefix=$DEST_DIR \
  --recursive \
  --level=1 \
  $URL/skins/monobook/main.css

wget \
  -nv \
  --directory-prefix=$DEST_DIR \
  --recursive \
  --level=1 \
  $URL/skins/common/images/wiki.png

# Find and replace absolute wiki css paths in static pages
echo "
Replacing absolute wiki paths...
"
for FILE in $DEST_DIR_COMPLETE/*.html ; do
  sed 's|/wiki/skin|skin|g' $FILE > $FILE.new
done
for FILE in $DEST_DIR_COMPLETE/*.html ; do
  mv $FILE.new $FILE
done

# Try copying index file ('Hauptseite' is the German main page)
echo "
Trying to copy index?index=Hauptseite.html to index.html for an easier entry point..."
cp $DEST_DIR_COMPLETE/*Hauptseite.html $DEST_DIR_COMPLETE/index.html

# DONE
echo "
FINISHED! Look for the results at $DEST_DIR_COMPLETE
Your browser might be able to load the following URL:
file://$PWD/$DEST_DIR_COMPLETE/
"
What's wrong with using dumpHTML.php?
On 10/27/05, Sebastian Albrecht albrecht@fielax.de wrote:
Hi Anthony,
What's wrong with using dumpHTML.php?
Sorry, I don't have a dumpHTML.php in my MediaWiki folder. Is it in a newer version? I use 1.4.7.
Sebastian
I've never used it. I just saw it mentioned on http://static.wikipedia.org/. Tim Starling would be the right person to ask for more info, but unless I'm misunderstanding the problem, it seems like it does exactly what is being requested.
Anthony
Anthony DiPierro wrote:
Sorry, I don't have a dumpHTML.php in my MediaWiki folder. Is it in a newer version?
I found it in the maintenance directory (1.5 version).
It must be run locally on the server.
It did nothing really nice for me.
jdd
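For anyone who wants to try it, a rough sketch of an invocation under a 1.5 tree. The paths are placeholders and the -d destination option is my recollection of the script's interface, so check its built-in help first:

cd /path/to/mediawiki/maintenance
php dumpHTML.php -d /path/to/static-output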
The whole WWW allows underscores in usernames. Is there any way to get around this restriction in MediaWiki?
thank you
Eric
Eric K wrote:
The whole WWW allows underscores in usernames. Is there any way to get around this restriction in MediaWiki?
User names on a wiki are a special case of page names. Underscore is reserved for space in page titles for backwards compatibility.
-- brion vibber (brion @ pobox.com)
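To illustrate the equivalence: on a stock MediaWiki install, both of these requests resolve to the same user page (hypothetical wiki URL):

curl "http://wiki.example.org/index.php?title=User:John_Smith"
curl "http://wiki.example.org/index.php?title=User:John%20Smith"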