We rarely write here specifically about mwoffliner, even though the tool
is mentioned in passing in threads from time to time. But over the last
few months we have made many interesting improvements to this important
tool, and I thought it would be valuable to report briefly on them.
As a reminder, mwoffliner is a script designed to build a ZIM file from
any (recent) online MediaWiki. It scrapes a snapshot of the online wiki
(HTML/JS/pictures/...) to your local disk.
Here is the list of recent improvements:
* We have introduced Parsoid as a local dependency, which means that even
if a MediaWiki does not have Parsoid/VisualEditor installed, mwoffliner
should now have a chance to build a ZIM file of it by running Parsoid
locally.
* We have introduced support for the Parsoid mobile layout, which makes it
possible to build ZIM files with a layout similar to the Wikipedia mobile
version. This is still very much in beta, and we plan to use it only for
Wikipedia.org at first.
* We have introduced support for audio/video, which means that these files
are now mirrored too, like the pictures. Our first tests show that for
Wikipedia this tends to multiply the size of the ZIM file by a factor of
four. As a consequence, we won't enable it everywhere right away. That
said, the feature is there and we will introduce video step by step in the
ZIM files we generate with mwoffliner.
* We have published mwoffliner (and mwmatrixoffliner) to the npmjs
repository: https://www.npmjs.com/package/mwoffliner. Now everybody can
install it easily (but you still need to take care of the dependencies).
* We have made the script a bit more modular: you can call it like any
other program, but you can now also use it as a library in your own code.
* We have moved the git repository to the openZIM organization on
GitHub: https://github.com/openzim/mwoffliner. By moving all our scrapers
to the openZIM organization we hope to clarify the respective
responsibilities of Kiwix and openZIM. Have a look at all the other
scrapers we have migrated to openZIM: https://github.com/openzim
mwoffliner is not a tool for everybody, but it is really important to
keep improving it, both to provide quality ZIM files of Wikipedia and
to prepare the next big steps forward.
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
We're holding a hackathon on August 13-18, right after Wikimania, in Potsdam, New York (about a 2.5-hour drive from Montreal). The focus is on writing code to help produce offline collections of Wikipedia content along with other medical and educational materials, mostly for use in libraries, schools and clinics. The main wiki page can be found at http://OFF.NETWORK. We can probably help you with transportation from Montreal and with accommodation in Potsdam if you need it.
Some of the attendees will come from familiar offline groups such as Kiwix, mainly to write code. Others focus on end use and come from the non-profit/educational community, such as Computers for Kids, Internet-in-a-Box and KA Lite. These people will want to ensure that the technical needs of their groups can be met and that resources are shared. You can see the sort of thing Internet-in-a-Box does in this recent article on opensource.com: https://opensource.com/article/17/5/internet-in-a-box-raspberry-pi
If you're interested in attending, it's probably best to contact me directly. Thanks!
Martin A. Walker (walkerma on Wikipedia)
Professor of Chemistry, SUNY Potsdam