[Labs-l] Building our ZIM farm @wmflabs
Emmanuel Engelhart
kelson at kiwix.org
Tue Mar 10 10:27:51 UTC 2015
Hi Andrew
On 09.03.2015 17:41, Andrew Bogott wrote:
> Erik M just pointed out that there is a similar effort towards this goal
> happening in production -- so maybe you can catch up with those folks
> and see what you can contribute, or if you can get what you need from
> their cluster? I'm cc'ing Gabriel in hopes that he can collaborate with
> you or refer you to the folks who are doing the actual work.
>
> https://phabricator.wikimedia.org/T17017
> https://phabricator.wikimedia.org/T91853
Gabriel proposes packing the raw Parsoid output into a ZIP file. This is
HTML+RDF only and not directly usable by end users. Its purpose is to
provide high-quality input for researchers and software developers
interested in the article texts.
ZIM files with Kiwix are a solution directly usable by end users. The
HTML is heavily rewritten, no unpacking of the content is necessary, the
files include optimized pictures and a full-text search index, you have
the software to read them, etc.
Mwoffliner is based on Parsoid output, so it's possible to consider
Gabriel's proposal as an intermediate step in creating ZIM files. I
consider generating a Parsoid output tarball to be a "middle product" of
creating ZIM files, and both should be done in one run.
That said, even if creating Parsoid dumps takes less effort than creating
ZIM files, both are currently stuck due to a lack of hardware resources.
Regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication