[Labs-l] Building our ZIM farm @wmflabs

Emmanuel Engelhart kelson at kiwix.org
Tue Mar 10 10:27:51 UTC 2015


Hi Andrew

On 09.03.2015 17:41, Andrew Bogott wrote:
> Erik M just pointed out that there is a similar effort towards this goal
> happening in production -- so maybe you can catch up with those folks
> and see what you can contribute, or if you can get what you need from
> their cluster?  I'm cc'ing Gabriel in hopes that he can collaborate with
> you or refer you to the folks who are doing the actual work.
>
> https://phabricator.wikimedia.org/T17017
> https://phabricator.wikimedia.org/T91853

Gabriel proposes packing the raw Parsoid output into a ZIP file. This is 
HTML+RDF only and is not directly usable by end-users. Its purpose is to 
provide high-quality input for researchers and software developers 
interested in the article texts.

ZIM files with Kiwix are a solution that end-users can use directly. The 
HTML is heavily rewritten, no unpacking is necessary, the files include 
optimized pictures and a full-text search index, the software to read 
them is available, etc.

Mwoffliner is based on Parsoid output, so Gabriel's proposal can be seen 
as an intermediate step towards creating ZIM files. I consider a Parsoid 
output tarball to be a "middle product" of ZIM file creation, and both 
should be produced in one run.

That said, even though creating Parsoid dumps takes less effort than 
creating ZIM files, both are currently stuck for lack of hardware 
resources.

Regards
Emmanuel
-- 
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication


