Hello!
What script would you recommend to create a static offline version of a MediaWiki wiki? (Perhaps with and without Parsoid?)
I've been looking for a good solution for ages, and have experimented with a few things. Here's what we currently do. It's not perfect, and really a bit too cumbersome, but it works as a proof of concept.
To illustrate, one of our wiki pages is here: http://orbit.educ.cam.ac.uk/wiki/OER4Schools/What_is_interactive_teaching
We have a "mirror" script that uses the API to generate an HTML version of a wiki page (which is then 'wrapped' in a basic menu):
http://orbit.educ.cam.ac.uk/orbit_mirror/index.php?page=OER4Schools/What_is_...
(Some log info is printed at the bottom of the page, which provides some hints as to what is going on.)
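In essence, the HTML-fetching step is just a call to the API's action=parse. A minimal sketch in Python (the API endpoint URL and the trivial wrapper below are placeholders for illustration, not our actual mirror script):

    import json
    import urllib.parse
    import urllib.request

    API = "http://orbit.educ.cam.ac.uk/w/api.php"  # assumed API endpoint

    def fetch_page_html(title):
        # action=parse returns the rendered HTML of the page body
        params = urllib.parse.urlencode({
            "action": "parse",
            "page": title,
            "format": "json",
        })
        with urllib.request.urlopen(API + "?" + params) as resp:
            data = json.load(resp)
        return data["parse"]["text"]["*"]

    def wrap(title, body_html):
        # stand-in for the real menu/navigation wrapper
        return "<html><head><title>%s</title></head><body>%s</body></html>" % (title, body_html)

    if __name__ == "__main__":
        title = "OER4Schools/What_is_interactive_teaching"
        print(wrap(title, fetch_page_html(title)))

The real script of course adds the menu, rewrites links, and so on, but the core is no more than that one API call.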
The resulting page is as low-bandwidth as possible (which is one of our use cases). The original idea with the mirror PHP script was that you could run it on your own server: it only requests pages if they have changed, and keeps a cache, which allows viewing pages even if your server has no connectivity. (You could of course use a generic cache anyway; there are advantages and disadvantages compared to this more explicit caching method.) The script rewrites URLs so that normal page links stay within the mirror, while links for editing and history point back at the wiki (see the tabs along the top of the page).
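The "only fetch if changed" part boils down to comparing revision ids. Roughly, again in Python, with a made-up cache layout (the directory and file names are just placeholders; 'fetch' would be something like the fetch_page_html above):

    import json
    import os
    import urllib.parse
    import urllib.request

    API = "http://orbit.educ.cam.ac.uk/w/api.php"  # assumed API endpoint
    CACHE_DIR = "cache"                             # hypothetical cache directory

    def latest_revid(title):
        # ask the API for the id of the newest revision of the page
        params = urllib.parse.urlencode({
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "ids",
            "format": "json",
        })
        with urllib.request.urlopen(API + "?" + params) as resp:
            pages = json.load(resp)["query"]["pages"]
        page = next(iter(pages.values()))
        return page["revisions"][0]["revid"]

    def cached_html(title, fetch):
        safe = urllib.parse.quote(title, safe="")
        html_path = os.path.join(CACHE_DIR, safe + ".html")
        rev_path = os.path.join(CACHE_DIR, safe + ".revid")
        current = latest_revid(title)
        if os.path.exists(rev_path) and open(rev_path).read() == str(current):
            return open(html_path).read()      # unchanged: serve the cached copy
        html = fetch(title)                    # changed or missing: re-fetch
        os.makedirs(CACHE_DIR, exist_ok=True)
        open(html_path, "w").write(html)
        open(rev_path, "w").write(str(current))
        return html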
The mirror script also produces (and caches) a static web page; see here: http://orbit.educ.cam.ac.uk/orbit_mirror/site/OER4Schools%252FHow_to_run_wor...
Once you've run wget across the mirror, the site will be completely mirrored in '/site'. You can then tar up '/site' and distribute it alongside your w/images directory to get a static copy, or use rsync to incrementally update '/site' and w/images on another server.
There's also an API-based process that works out which pages have changed and refreshes the mirror accordingly.
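That refresh step is essentially a query against list=recentchanges. Something along these lines (the timestamp bookkeeping and the actual refresh hook are left out; the endpoint is again an assumption):

    import json
    import urllib.parse
    import urllib.request

    API = "http://orbit.educ.cam.ac.uk/w/api.php"  # assumed API endpoint

    def changed_titles(since_iso8601):
        # recent changes are listed newest first, so 'rcend' is the oldest
        # point we care about, i.e. the time of the previous mirror run
        params = urllib.parse.urlencode({
            "action": "query",
            "list": "recentchanges",
            "rcend": since_iso8601,
            "rcprop": "title",
            "rclimit": "500",
            "format": "json",
        })
        with urllib.request.urlopen(API + "?" + params) as resp:
            changes = json.load(resp)["query"]["recentchanges"]
        return sorted({c["title"] for c in changes})

    if __name__ == "__main__":
        for title in changed_titles("2013-01-01T00:00:00Z"):
            print("would refresh:", title)  # hook the real mirror refresh here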
Most of what I am using is in the MediaWiki software already (i.e. API -> HTML), and it would be great to have a solution like this that could generate an offline site on the fly. Perhaps one could add another export format to the API, and then an extension could generate the offline site and keep it up to date as pages on the main wiki change. Does this make sense? Would anybody be up for collaborating on implementing this? Are there better things in the pipeline?
I can see why you perhaps wouldn't want it for one of the major Wikimedia sites, or why it might be inefficient somehow. But for our use case (a smallish wiki, with a set of poorly connected users across the digital divide) it would be fantastic.
So - what are your solutions for creating a static offline copy of a mediawiki?
Looking forward to hearing about it! Bjoern