On 14.02.2015 08:03, Nitin Gupta wrote:
I hope HTML would be made available with the same frequency as XML
(wikitext) dumps; it would save me yet another attempt to write a
wikitext parser. Thanks.
For any API points you provide, it would be helpful if you could also
mention expected maximum load from a client (req/s), so client writers
can throttle accordingly.
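
To illustrate what throttling to a published limit could look like on the
client side, here is a minimal Node.js sketch. It is only an illustration:
createThrottle, reserve and wrap are hypothetical names, and the req/s value
is a placeholder, since no actual limit has been stated in this thread.

```javascript
// Minimal client-side throttle (illustrative sketch, not part of any API).
// maxPerSecond is an assumed placeholder; the real value would have to
// come from the API operators.
function createThrottle(maxPerSecond, now = Date.now) {
  const intervalMs = 1000 / maxPerSecond; // minimum spacing between requests
  let nextSlot = 0; // earliest time (ms) at which the next request may start

  // Reserves the next free slot and returns how many ms the caller
  // should wait before sending its request.
  function reserve() {
    const t = now();
    const start = Math.max(t, nextSlot);
    nextSlot = start + intervalMs;
    return start - t;
  }

  // Wraps an async function so that each call waits for its reserved slot.
  function wrap(fn) {
    return async (...args) => {
      const delay = reserve();
      if (delay > 0) {
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
      return fn(...args);
    };
  }

  return { reserve, wrap };
}
```

A client would wrap its own request function once, for example
createThrottle(5).wrap(fetchPage), where fetchPage stands for whatever
function the client uses to issue a request, and then call the wrapped
version everywhere.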
Kiwix already publishes full HTML snapshots packed in ZIM files (snapshots
with and without pictures). We publish monthly updates for most
Wikimedia projects and are working to do so for all of them:
http://www.kiwix.org
The solution is coded in Node.js and uses the Parsoid API:
https://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/
We face recurring stability problems (with the 'http' module), which are
impairing the rollout to all projects. If you are a Node.js expert, your
help would be very welcome.
Regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication