On Wed, Sep 16, 2015 at 12:51 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
> Have you looked into what mwoffliner does? https://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/mwoffliner.j...
+1 for mwoffliner. It should be *very* close to what you are looking for, and avoids the need to parse wikitext itself by fetching the HTML from the Wikimedia REST API.
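To illustrate the "fetch HTML instead of parsing wikitext" approach, here is a rough Python sketch. It is not mwoffliner's code. It assumes the English Wikipedia REST base URL and its `/page/html/{title}` route, and the helper names (`page_html_url`, `image_sources`) are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import quote

# Assumption: English Wikipedia; other wikis use their own hostname.
REST_BASE = "https://en.wikipedia.org/api/rest_v1"


def page_html_url(title: str) -> str:
    """Build the REST URL that returns the rendered HTML of a page."""
    # Titles use underscores instead of spaces; percent-encode the rest.
    return f"{REST_BASE}/page/html/{quote(title.replace(' ', '_'), safe='')}"


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing <img/> via the default
        # handle_startendtag implementation.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Protocol-relative URLs (//host/path) get an explicit scheme.
            self.sources.append("https:" + src if src.startswith("//") else src)


def image_sources(html: str) -> list[str]:
    """Return the image URLs referenced in an HTML string, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

The HTML itself could then be fetched with something like `urllib.request.urlopen(page_html_url("Albert Einstein"))`, ideally with a descriptive User-Agent header, and the response body passed to `image_sources`.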
Maybe you can even just extract the images from the ZIM files.
Nemo
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l