On 27 Oct 2005, at 06:47, mediawiki-l-request@Wikimedia.org wrote:
From: Sebastian Albrecht albrecht@fielax.de
I've spent quite some time searching for a way to export articles from MediaWiki to a static HTML tree or something similar.
Have you tried wget? Should work under Linux. Sorry, I Don't Do Windows(tm)!
-- Jan Steinman http://www.Bytesmiths.com/Events
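(For the record, a minimal invocation would be something along these lines; the URL is a placeholder and the depth is just a sane default. --convert-links rewrites links for offline browsing and --page-requisites pulls in images and stylesheets:

wget --recursive --level=2 --convert-links --page-requisites --html-extension http://wiki.example.org/wiki
)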
Hello,
Have you tried wget? Should work under Linux. Sorry, I Don't Do Windows(tm)!
Me too ;) This is what I've done so far and it's ok for me:
It is an sh script you can copy into a file named wiki2html. Make it executable using chmod and run it.
It will fetch HTML files from a wiki using wget, and it will also try to get a few extra files such as main.css and the logo. It then uses sed to replace absolute paths (/wiki/skins/...) inside CSS and JavaScript elements of the downloaded HTML pages. This is something wget won't do, and it makes the whole thing look a little better (than the printable format).
Please note this script is quite specific to my personal wiki, so you should look it over before using it yourself. The reject strings of the wget command can certainly be optimized. DON'T try to use it on Wikipedia: it won't kill their servers, but it may well kill your client.
Best regards, Sebastian
#!/bin/sh
######################################################
#
# WIKI Export script - Wgets a wiki to static html.
#
######################################################

# Check input
if [ "$2" = "" ] ; then
  echo " $0 - Wgets a wiki to static html, 10/2005

This script does a wget to retrieve static html pages from a wiki.
Several typical wiki pages are excluded because they are unimportant
for offline use (edit, history and special pages). URLs in the html
pages are rewritten automatically so you can browse the static wiki
offline.

Usage:    $0 <URL_to_wiki> <destination_dir> [<recursive_depth> default=2]

Examples: $0 http://url/wiki ./wiki
          $0 http://url/wiki ./wiki 3

Requires: sed, wget "
  exit 1
fi

# Define input variables
URL=$1
DEST_DIR=$2
# Strip the protocol prefix (e.g. http://) to get the subdirectory wget creates
DEST_DIR_COMPLETE=$DEST_DIR/`echo "$URL" | sed 's|[a-zA-Z]*://||g'`
REC_LEVEL=$3

if [ -z "$REC_LEVEL" ] || [ "$REC_LEVEL" -le 0 ] ; then
  REC_LEVEL=2
fi

# WGET pages recursively
echo "
Getting wiki pages to static html...
URL:         $URL
Destination: $DEST_DIR
"
# 'Spezial' is the German Special: namespace; adjust for your wiki's language
wget \
  -nv \
  --convert-links \
  --page-requisites \
  --html-extension \
  --recursive \
  --level=$REC_LEVEL \
  --directory-prefix=$DEST_DIR \
  --reject "*edit*,*history*,*Spezial*,*oldid*" \
  $URL

# Get main.css for a nicer static wiki
echo "
Trying to get some files for more beauty (main.css, logo.png)...
"
wget \
  -nv \
  --directory-prefix=$DEST_DIR \
  --recursive \
  --level=1 \
  $URL/skins/monobook/main.css

wget \
  -nv \
  --directory-prefix=$DEST_DIR \
  --recursive \
  --level=1 \
  $URL/skins/common/images/wiki.png

# Find and replace absolute wiki css paths in static pages
echo "
Replacing absolute wiki paths...
"
for FILE in $DEST_DIR_COMPLETE/*.html ; do
  sed 's|/wiki/skin|skin|g' $FILE > $FILE.new
done
for FILE in $DEST_DIR_COMPLETE/*.html ; do
  mv $FILE.new $FILE
done

# Try copying index file ('Hauptseite' is the German main page)
echo "
Trying to copy index?index=Hauptseite.html to index.html for an easier entry point..."
cp $DEST_DIR_COMPLETE/*Hauptseite.html $DEST_DIR_COMPLETE/index.html

# DONE
echo "
FINISHED! Look for the results at $DEST_DIR_COMPLETE
Your browser might be able to load the following URL:
file://$PWD/$DEST_DIR_COMPLETE/
"
What's wrong with using dumpHTML.php?
On 10/27/05, Sebastian Albrecht albrecht@fielax.de wrote:
Hi Anthony,
What's wrong with using dumpHTML.php?
Sorry, I don't have a dumpHTML.php in my MediaWiki folder. Is it in a newer version? I use 1.4.7.
Sebastian
I've never used it. I just saw it mentioned on http://static.wikipedia.org/. Tim Starling would be the right person to ask for more info, but unless I'm misunderstanding the problem, it seems like it does exactly what is being requested.
Anthony
Anthony DiPierro wrote:
Sorry, I don't have a dumpHTML.php in my MediaWiki folder. Is it in a newer version?
I found it in the maintenance directory (1.5 version).
It must be run locally on the server.
It did nothing really nice for me.
jdd
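For anyone who wants to try it, a rough sketch of an invocation under a 1.5 tree. The paths are placeholders and the -d destination option is my recollection of the script's interface, so check its built-in help first:

cd /path/to/mediawiki/maintenance
php dumpHTML.php -d /path/to/static-output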
The whole WWW allows underscores in usernames. Is there any way to get around this restriction in MediaWiki?
thank you
Eric
Eric K wrote:
The whole WWW allows underscores in usernames. Is there any way to get around this restriction in MediaWiki?
User names on a wiki are a special case of page names. Underscore is reserved for space in page titles for backwards compatibility.
-- brion vibber (brion @ pobox.com)
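To illustrate the equivalence: on a stock MediaWiki install, both of these requests resolve to the same user page (hypothetical wiki URL):

curl "http://wiki.example.org/index.php?title=User:John_Smith"
curl "http://wiki.example.org/index.php?title=User:John%20Smith"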