Hi
We rarely write here specifically about mwoffliner, even though the tool
is mentioned in passing from time to time in threads. But over the last
few months we have made many interesting improvements to this important
tool, and I thought it would be worth reporting briefly on them.
As a reminder, mwoffliner is a script designed to build a ZIM file from
any (recent) online MediaWiki. It scrapes a snapshot of the online wiki
(HTML/JS/pictures/...) to your local disk.
Here is the list of recent improvements:
* We have introduced Parsoid as a local dependency, which means that even
if a MediaWiki does not have Parsoid/VisualEditor installed, mwoffliner
now has a chance of building its ZIM file by running Parsoid locally.
* We have introduced support for the Parsoid mobile layout, which allows
building ZIM files with a layout similar to the Wikipedia mobile version.
This is still very much in beta, and we plan to use it only for
Wikipedia.org at first.
* We have introduced audio/video support, which means that these files
are now mirrored too, like the pictures. Our first tests show that for
Wikipedia this tends to multiply the size of the ZIM file by a factor of
four. As a consequence, we won't enable it everywhere right away. That
said, the feature is there, and we will introduce video step by step in
the ZIM files we generate with mwoffliner.
* We have published mwoffliner (and mwmatrixoffliner) to the npm
registry: https://www.npmjs.com/package/mwoffliner. Now everybody can
install it easily (but you still need to take care of the dependencies).
* We have made the script a bit more modular: you can still call it like
any other program, but you can now also use it as a library in your own
JavaScript/Node.js scripts.
* We have moved the git repository to the openZIM organization on
GitHub: https://github.com/openzim/mwoffliner. By moving all our scrapers
to the openZIM organization we hope to bring a bit of clarity about the
respective duties of Kiwix and openZIM. Have a look at all the other
scrapers we have migrated to openZIM: https://github.com/openzim
mwoffliner is not a tool for everybody, but it is really important to
continue improving it so that we can provide quality ZIM files of
Wikipedia, Wiktionary, etc. So if you have JavaScript skills, please come
and help us prepare the next big steps forward:
https://github.com/openzim/mwoffliner/issues
Regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
Hi
After a year and a half of effort, we are proud to announce our first
release of ZIM files of the Stack Exchange websites.
All the ZIM files are freely available for download via the Kiwix
software or directly from the Kiwix download server:
http://download.kiwix.org/zim/stack_exchange/
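For those who prefer to script the download, here is a minimal sketch of
how the ZIM links could be pulled out of that directory page. It assumes
the server returns a plain HTML index whose links end in ".zim"; the
names ZimLinkParser and list_zim_links are made up for this example and
are not part of any Kiwix tool.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Directory URL taken from the announcement above.
BASE_URL = "http://download.kiwix.org/zim/stack_exchange/"


class ZimLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links ending in .zim."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: ZIM files are linked directly with a .zim suffix.
        if href and href.endswith(".zim"):
            self.links.append(urljoin(self.base_url, href))


def list_zim_links(index_html, base_url=BASE_URL):
    """Return the .zim URLs found in the HTML of a directory index."""
    parser = ZimLinkParser(base_url)
    parser.feed(index_html)
    return parser.links
```

To try it against the live server, fetch the page first, for example
with urllib.request.urlopen(BASE_URL).read().decode("utf-8", "replace"),
and pass the resulting text to list_zim_links().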
Stack Exchange is a network of question-and-answer websites on topics in
varied fields, each site covering a specific topic, where questions,
answers, and users are subject to a reputation award process. This
includes well-known websites like Stackoverflow.com, AskUbuntu.com or
Superuser.com. More information about Stack Exchange and its more than
100 websites is available here:
https://en.wikipedia.org/wiki/Stack_Exchange.
These ZIM files are built from the regularly updated archives provided on
archive.org, using ad hoc software that our team developed specially for
that purpose. This software is called "Sotoki" and is of course open
source. You can have a look at the source code here
https://github.com/openzim/sotoki or install it directly with pip, the
Python package installer:
https://pypi.python.org/pypi/sotoki
We plan to release updates of these ZIM files each time new archives
are published.
Regards
Emmanuel
We're holding a Hackathon on August 13-18, right after Wikimania, in Potsdam, New York (about a 2.5-hour drive from Montreal). The focus is on writing code to help produce offline collections of Wikipedia content along with other medical & educational materials, mostly for use in libraries, schools and clinics. The main wiki page can be found at http://OFF.NETWORK. We can probably help you with transportation from Montreal, and accommodation in Potsdam if you need it.
Some of the attendees will come from familiar offline groups such as Kiwix, mainly to write code. Others focus on end use and come from the non-profit/educational community, such as Computers for Kids, Internet-in-a-Box and KA Lite. These people will want to ensure that the technical needs of the groups can be met, and that resources are shared. You can see the sort of thing Internet-in-a-Box does in this recent article on opensource.com: https://opensource.com/article/17/5/internet-in-a-box-raspberry-pi
If you're interested in attending, it's probably best to contact me directly. Thanks!
Martin A. Walker (walkerma on Wikipedia)
Professor of Chemistry, SUNY Potsdam
walkerma@potsdam.edu