Hi Erik
It is great to see you raising this now.
On 13/11/2013 06:51, Erik Moeller wrote:
how important is ZIM support in Collections (the "Create a book"
feature) on Wikimedia sites? We implemented this a while ago to
support offline efforts. Since collections are still typically very
much limited in size, it's not a very viable option for huge offline
exports, more for batches of articles on related topics. Do people
currently rely on this functionality for offline deployments?
Kiwix, as a project, does not rely directly on the WM ZIM export, but
many of our users do. Of course they suffer from the limitations of the
current solution and, frankly, most of them are not aware of this feature.
So removing it would be a loss for us. But I agree that something should
be done. IMO we should try to support 3 important output formats:
* PDF (adapted for really small collections)
* EPUB (the most used free ebook format)
* ZIM (for bigger collections)
We're re-implementing the rendering pipeline for Collections to ensure
long-term maintainability, and our default would be to eliminate
initially all formats except for PDF if we don't absolutely have to
support them. I'll see if we can get some metrics on current ZIM file
usage via the Collection extension, but it'd be nice to get
qualitative feedback as well.
(More background at: https://www.mediawiki.org/wiki/PDF_rendering )
I have also seen your email to wikitech-l:
http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073059.html
I think the choice of Parsoid as a rendering backend is a really good
one for PDF. I have always advocated the HTML-to-PDF approach. I also
think that Parsoid provides the information needed to rewrite the HTML
correctly and adapt it for offline use.
That's why I have been working since March on a solution called
mwoffliner (also using nodejs, like Parsoid):
https://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/mwoffliner.…
mwoffliner:
1 - Downloads a selection of articles from the Parsoid API
2 - Rewrites the HTML code
3 - Writes the ZIM file (not yet implemented, files are written to the
filesystem)
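
To give a rough idea of what step 2 involves, here is a minimal sketch in Python (mwoffliner itself is nodejs, and this is not its code). It assumes that internal links in the downloaded HTML look like href="./Title"; the names rewrite_links and local_name, the ".html" file naming and the data-offline attribute are all hypothetical and only there for illustration.

```python
import re
from urllib.parse import quote, unquote

# Assumption: internal wiki links appear as href="./Title" (optionally with
# a #fragment). Adjust the pattern to whatever the real HTML uses.
WIKI_LINK = re.compile(r'href="\./([^"#?]+)(#[^"]*)?"')


def local_name(title):
    """Return a URL-encoded local href for an article (hypothetical scheme)."""
    return quote(unquote(title).replace("/", "_"), safe="") + ".html"


def rewrite_links(html, selected):
    """Point links to selected articles at local files.

    Links to articles outside the selection lose their href and are
    tagged with a data-offline attribute so they can be styled or ignored.
    """
    def repl(match):
        title = unquote(match.group(1))
        fragment = match.group(2) or ""
        if title in selected:
            return 'href="%s%s"' % (local_name(title), fragment)
        return 'data-offline="missing"'

    return WIKI_LINK.sub(repl, html)


if __name__ == "__main__":
    page = '<a href="./Moscow#History">Moscow</a> <a href="./Kazan">Kazan</a>'
    print(rewrite_links(page, {"Moscow"}))
```

A real rewriter would also have to deal with images, stylesheets and interwiki links; this only shows the link part.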
You can get an idea of the rendering from this complete Russian
Wikipedia (WPRU) collection (ZIM file served with kiwix-serve):
http://library.kiwix.org/wikipedia_ru_all
If I understand correctly, points 1 & 2 are similar to what you plan to do
for the new PDF pipeline. So it would be great to collaborate on this
and maintain the ZIM output. How does that sound?
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication