Hi
We have published a new version of MWoffliner, the MediaWiki scraper. Version 1.8.0 is, as always, available here: https://www.npmjs.com/package/mwoffliner
This new release brings big improvements in terms of performance. MWoffliner 1.8 no longer requires the zimwriterfs binary and can write offline ZIM files directly, on the fly. This means far less mass storage usage and roughly five times fewer I/O accesses.
Here is the detailed changelog:

1.8.0:
* UPDATE: Write ZIM files directly (Using Libzim) #184
* UPDATE: Removed 'tmp' files and directory #448 #575
* UPDATE: Removed --deflateTmpHTML and --tmpDirectory arguments #575 #576
* UPDATE: Implemented better request backoff #496
* UPDATE: Change file names/paths #278
* UPDATE: Removed --writeHtmlRedirects argument #506
* UPDATE: Removed --localMCS option (automatically detect) #490
* UPDATE: Updated documentation #423
* FIX: Other stability, logging and error handling fixes
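For readers unfamiliar with the term, the "request backoff" item above refers to waiting longer and longer between retries of a failed request. The following is only a minimal sketch of the general technique (exponential backoff with a cap), not the code from #496; the names backoffDelayMs and withBackoff and the default values are hypothetical and chosen for illustration.

```typescript
// Illustrative only: hypothetical helpers, not taken from the MWoffliner code base.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

/** Run `task`, retrying up to `maxRetries` times with a growing pause between attempts. */
async function withBackoff<T>(task: () => Promise<T>, maxRetries: number = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Give up once the retry budget is exhausted and surface the last error.
      if (attempt >= maxRetries) throw err;
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A caller would wrap each download in withBackoff, so that a temporarily overloaded wiki is given progressively more time to recover instead of being hit again immediately.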
This release was made possible by a new piece of software in the openZIM portfolio: node-libzim. node-libzim is a JavaScript/NodeJS binding of our ZIM format reference library, libzim. It makes it possible to read and write ZIM files quickly and directly in JavaScript. The code is available in our code forge at https://github.com/openzim/node-libzim and, of course, at npmjs.org: https://www.npmjs.com/package/@openzim/libzim.
This is the third of several milestones we have planned with the support of the WMF. The next one on the list is 1.9, planned for the end of April. With 1.9 we want to implement full support for MediaWiki categories.
As always, PRs and bug reports are welcome at: https://github.com/openzim/mwoffliner
Regards,
Emmanuel