Hello!
What script would you recommend to create a static offline version of a MediaWiki wiki? (Perhaps with and without Parsoid?)
I've been looking for a good solution for ages, and have experimented with a few things. Here's what we currently do. It's not perfect, and really a bit too cumbersome, but it works as a proof of concept.
To illustrate, one of our wiki pages is here: http://orbit.educ.cam.ac.uk/wiki/OER4Schools/What_is_interactive_teaching
We have a "mirror" script that uses the API to generate an HTML version of a wiki page (which is then 'wrapped' in a basic menu):
http://orbit.educ.cam.ac.uk/orbit_mirror/index.php?page=OER4Schools/What_is_...
(Some log info is printed at the bottom of the page, which provides some hints as to what is going on.)
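In essence, the HTML-fetching step is just a call to the API's action=parse. A minimal sketch in Python (the API endpoint URL and the trivial wrapper below are placeholders for illustration, not our actual mirror script):

    import json
    import urllib.parse
    import urllib.request

    API = "http://orbit.educ.cam.ac.uk/w/api.php"  # assumed API endpoint

    def fetch_page_html(title):
        # action=parse returns the rendered HTML of the page body
        params = urllib.parse.urlencode({
            "action": "parse",
            "page": title,
            "format": "json",
        })
        with urllib.request.urlopen(API + "?" + params) as resp:
            data = json.load(resp)
        return data["parse"]["text"]["*"]

    def wrap(title, body_html):
        # stand-in for the real menu/navigation wrapper
        return "<html><head><title>%s</title></head><body>%s</body></html>" % (title, body_html)

    if __name__ == "__main__":
        title = "OER4Schools/What_is_interactive_teaching"
        print(wrap(title, fetch_page_html(title)))

The real script of course adds the menu, rewrites links, and so on, but the core is no more than that one API call.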
The resulting page is as low-bandwidth as possible (which is one of our use cases). The original idea with the mirror PHP script was that you could run it on your own server: it only requests pages if they have changed, and keeps a cache, which allows viewing pages even if your server has no connectivity. (You could of course use a generic cache anyway; there are advantages and disadvantages compared to this more explicit caching method.) The script rewrites URLs so that normal page links stay within the mirror, while links for editing and history point back at the wiki (see the tabs along the top of the page).
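The "only fetch if changed" part boils down to comparing revision ids. Roughly, again in Python, with a made-up cache layout (the directory and file names are just placeholders; 'fetch' would be something like the fetch_page_html above):

    import json
    import os
    import urllib.parse
    import urllib.request

    API = "http://orbit.educ.cam.ac.uk/w/api.php"  # assumed API endpoint
    CACHE_DIR = "cache"                             # hypothetical cache directory

    def latest_revid(title):
        # ask the API for the id of the newest revision of the page
        params = urllib.parse.urlencode({
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "ids",
            "format": "json",
        })
        with urllib.request.urlopen(API + "?" + params) as resp:
            pages = json.load(resp)["query"]["pages"]
        page = next(iter(pages.values()))
        return page["revisions"][0]["revid"]

    def cached_html(title, fetch):
        safe = urllib.parse.quote(title, safe="")
        html_path = os.path.join(CACHE_DIR, safe + ".html")
        rev_path = os.path.join(CACHE_DIR, safe + ".revid")
        current = latest_revid(title)
        if os.path.exists(rev_path) and open(rev_path).read() == str(current):
            return open(html_path).read()      # unchanged: serve the cached copy
        html = fetch(title)                    # changed or missing: re-fetch
        os.makedirs(CACHE_DIR, exist_ok=True)
        open(html_path, "w").write(html)
        open(rev_path, "w").write(str(current))
        return html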
The mirror script also produces (and caches) a static web page; see here: http://orbit.educ.cam.ac.uk/orbit_mirror/site/OER4Schools%252FHow_to_run_wor...
Once you've run wget across the mirror, the site will be completely mirrored in '/site'. You can then tar up '/site' and distribute it alongside your w/images directory to get a static copy, or use rsync to incrementally update '/site' and w/images on another server.
There's also an API-based process that works out which pages have changed and refreshes the mirror accordingly.
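That refresh step is essentially a query against list=recentchanges. Something along these lines (the timestamp bookkeeping and the actual refresh hook are left out; the endpoint is again an assumption):

    import json
    import urllib.parse
    import urllib.request

    API = "http://orbit.educ.cam.ac.uk/w/api.php"  # assumed API endpoint

    def changed_titles(since_iso8601):
        # recent changes are listed newest first, so 'rcend' is the oldest
        # point we care about, i.e. the time of the previous mirror run
        params = urllib.parse.urlencode({
            "action": "query",
            "list": "recentchanges",
            "rcend": since_iso8601,
            "rcprop": "title",
            "rclimit": "500",
            "format": "json",
        })
        with urllib.request.urlopen(API + "?" + params) as resp:
            changes = json.load(resp)["query"]["recentchanges"]
        return sorted({c["title"] for c in changes})

    if __name__ == "__main__":
        for title in changed_titles("2013-01-01T00:00:00Z"):
            print("would refresh:", title)  # hook the real mirror refresh here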
Most of what I am using is in the MediaWiki software already (i.e. API -> HTML), and it would be great to have a solution like this that could generate an offline site on the fly. Perhaps one could add another export format to the API, and then an extension could generate the offline site and keep it up to date as pages on the main wiki change. Does this make sense? Would anybody be up for collaborating on implementing this? Are there better things in the pipeline?
I can see why you perhaps wouldn't want it for one of the major Wikimedia sites, or why it might be inefficient somehow. But for our use case (a smallish wiki, with a set of poorly connected users across the digital divide) it would be fantastic.
So - what are your solutions for creating a static offline copy of a mediawiki?
Looking forward to hearing about it! Bjoern