hi there, and sorry if this is a FAQ, but i couldn't find a clear answer easily.
i was playing around with wiki2static.pl a bit last night, and heard a rumour that it's not supported anymore. additionally, the HTML it produces is a bit sub-optimal, and it seems it only understands some older wiki syntax. so i was wondering if anybody has already tried to isolate whichever part of mediawiki actually 'renders' the 'cur' column of the database to HTML, and make a stand-alone app (or a perl module, or something) based on that, in a way that re-uses the latest mediawiki code, so that any scripts that try to create a static version of a wiki could easily be kept in sync with the latest mediawiki syntax.
so: does anyone know whether this already exists, is feasible to do, or is complete nonsense?
cheers,
Moritz von Schweinitz (who didn't dig into the mediawiki source to answer this)
Moritz von Schweinitz wrote:
so i was wondering if anybody has already tried to isolate whichever part of mediawiki actually 'renders' the 'cur' column of the database to HTML, and make a stand-alone app (or a perl module, or something) based on that, in a way that re-uses the latest mediawiki code, so that any scripts that try to create a static version of a wiki could easily be kept in sync with the latest mediawiki syntax.
Trying to create a separate program from scratch seems a bit silly; I'm not sure why people keep doing it over and over when it's fairly trivial to make a maintenance script for MediaWiki that runs the parser on each page and dumps the output.
Tim's started a script for this; it's in the maintenance directory in CVS. That's the development version of MediaWiki, so it's not directly usable yet on the current page download dumps, as the database format has changed.
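Roughly, such a maintenance script boils down to something like this. This is only a sketch, not the script in CVS, and the class and function names (commandLine.inc, Title, ParserOptions, $wgParser and friends) are written from memory, so they may not match the current code exactly:

<?php
# Sketch only: loop over the old 'cur' table, run each page through the
# wiki's own parser, and write the HTML out to files.
require_once( 'commandLine.inc' );   # MediaWiki's CLI bootstrap

$db = mysql_connect( 'localhost', $wgDBuser, $wgDBpassword );
mysql_select_db( $wgDBname, $db );

$res = mysql_query( 'SELECT cur_namespace, cur_title, cur_text FROM cur', $db );
while ( $row = mysql_fetch_assoc( $res ) ) {
    $title   = Title::makeTitle( $row['cur_namespace'], $row['cur_title'] );
    $options = ParserOptions::newFromUser( $wgUser );

    # Reusing the real parser is the whole point: the output stays in
    # sync with whatever syntax the installed MediaWiki supports.
    $out = $wgParser->parse( $row['cur_text'], $title, $options );

    $fh = fopen( 'html/' . urlencode( $title->getPrefixedDBkey() ) . '.html', 'w' );
    fwrite( $fh, $out->getText() );
    fclose( $fh );
}
?>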
-- brion vibber (brion @ pobox.com)
On 4/13/05, Brion Vibber <brion@pobox.com> wrote:
Trying to create a separate program from scratch seems a bit silly; I'm not sure why people keep doing it over and over when it's fairly trivial to make a maintenance script for MediaWiki that runs the parser on each page and dumps the output.
For some of the many attempts people *have* made, see http://meta.wikimedia.org/wiki/Alternative_parsers
But yeah, for just dumping, I guess command-line access to the "real thing" makes a lot more sense.
Brion Vibber wrote:
Tim's started a script for this; it's in the maintenance directory in CVS. That's the development version of MediaWiki, so it's not directly usable yet on the current page download dumps, as the database format has changed.
I guess I'd better say a few words about it, since the topic keeps coming up. The script produces quite nice HTML dumps, with the HTML files distributed among directories identified by the first two bytes of the page title, and the link URLs rewritten appropriately. This is useful for English wikis since it allows you to guess the URL, but it doesn't work so well for languages with a different orthography. Doing it by character rather than by byte could be useful, but that would give a much broader tree, especially for the CJK languages.
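For illustration, the layout amounts to something like this (the function here is made up, and this is just one plausible reading of the scheme, not necessarily exactly what the script does):

<?php
# Map a page's database key to an output path, bucketed by its first two bytes.
function htmlDumpPath( $dbkey ) {
    $first  = substr( $dbkey, 0, 1 );
    $second = strlen( $dbkey ) > 1 ? substr( $dbkey, 1, 1 ) : '_';
    return "$first/$second/" . urlencode( $dbkey ) . '.html';
}

echo htmlDumpPath( 'Foo_bar' ) . "\n";    # F/o/Foo_bar.html
echo htmlDumpPath( 'Ångström' ) . "\n";   # the first two *bytes* here are just the
                                          # two bytes of the UTF-8 'Å', which is why
                                          # this works less well outside English
?>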
It's currently clumsy to use, requiring you to move HTMLDump.php from skins/disabled/ to skins/, which unfortunately enables the skin in the user interface. Any user who changed their skin to it would find that there are no user preference links allowing them to get back (the interface is greatly stripped down). We need to make a distinction between valid skins and skins available for users.
It rewrites stylesheet and image URLs to relative URLs with hard-coded paths (../../../images and ../../../skins); this needs to be made more flexible. It doesn't currently rewrite URLs for images from Commons, or provide a way to package those images.
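Making that flexible is mostly a matter of counting how deep the output file sits and prepending that many '../' segments rather than a fixed three; roughly like this (illustrative paths only):

<?php
# Build a relative prefix back to the dump root for a file at a given path,
# instead of hard-coding '../../../'.
function relativePrefix( $pathFromRoot ) {
    return str_repeat( '../', substr_count( $pathFromRoot, '/' ) );
}

$prefix = relativePrefix( 'F/o/Foo_bar.html' );   # '../../'
$css    = $prefix . 'skins/monobook/main.css';
$img    = $prefix . 'images/Example.jpg';
?>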
So it's a start, but there are still plenty of things to do. Luckily, porting the parser to a different language isn't one of them. I'm not working on it at the moment, so I won't mind if someone picks it up where I left off.
-- Tim Starling