Hi,
At Wikimania there were a few discussions about whether it makes sense to push ZIM, especially given that EPUB is already very well supported, is also open, is also compressed, etc. I made (again) a small benchmark to try to convince the remaining skeptics.
Here is an example with Simple English Wikipedia without pictures:
- Raw content: 125342 HTML pages, 1.4 GB
- ZIM: 93 MB; access time of article "Wikipedia" (HTML only) = 0.012s
- ZIP: 331 MB; access time of article "Wikipedia" (HTML only) = 0.035s
Additional info:
- ZIP random access time is proportional to the number of files: with 3 times as many HTML files, accessing the same content takes 0.113s (so about x3).
- Please keep in mind that the tests were run on an Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz.
- There are no images here; images are often bigger, and they would be re-compressed unnecessarily and so be bigger in a ZIM.
- Articles with 20-30 images are common.
- The benchmark was made with relatively few files; the French Wikipedia has many millions of files.
I'll let you draw your own conclusions ;)
Regards Emmanuel
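For anyone who wants to repeat the ZIP side of this measurement, here is a minimal sketch using Python's standard zipfile module. It is not the setup used for the numbers above: the archive is synthetic, and the helper names (build_zip, timed_read) and the entry names are made up for illustration. Opening a ZIP involves reading its central directory, which lists every entry, so this is one plausible place where the cost grows with the number of files.

```python
import io
import time
import zipfile


def build_zip(n_pages: int) -> bytes:
    """Build an in-memory ZIP holding n_pages small synthetic HTML files
    plus one target article (hypothetical names, not real Wikipedia data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(n_pages):
            zf.writestr(f"A/page_{i}.html", f"<html><body>Page {i}</body></html>")
        zf.writestr("A/Wikipedia.html", "<html><body>Wikipedia</body></html>")
    return buf.getvalue()


def timed_read(archive: bytes, name: str) -> tuple[bytes, float]:
    """Open the archive from scratch and read one member.

    Returns the member's bytes and the elapsed wall-clock seconds,
    including the time needed to parse the central directory.
    """
    start = time.perf_counter()
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        data = zf.read(name)
    return data, time.perf_counter() - start


if __name__ == "__main__":
    # Compare a small archive with one holding three times as many pages.
    for n in (1_000, 3_000):
        data, seconds = timed_read(build_zip(n), "A/Wikipedia.html")
        print(f"{n} pages: read {len(data)} bytes in {seconds:.4f}s")
```

The absolute timings will not match the figures in this thread, since they depend on the machine, the storage and the real article sizes; only the trend with the number of entries is of interest.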
ZIM rocks!
Asaf
On Sat, Aug 6, 2011 at 6:42 AM, Emmanuel Engelhart <emmanuel@engelhart.org> wrote:
[...]
Offline-l mailing list Offline-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/offline-l
Hi Emmanuel,
Thanks for that test. What I am really wondering about is EPUB. I asked Erik at Wikimania whether EPUB might be all we need. I mean, we need to be open here; there may be no need for yet another format. That said, I don't know much about it. So if someone could run this test with EPUB vs. ZIM, I would be very interested.
/Manuel
On 08/10/2011 02:36 PM, Manuel Schneider wrote:
[...]
We at PediaPress have developed a prototype EPUB export for the Collection extension.
I just generated epub and zim files for the following collection:
http://en.wikipedia.org/wiki/Book:American_History
The epub and zim files should contain the same images and otherwise the same content.
http://pediapress.com/files/epubvszim/book.epub
http://pediapress.com/files/epubvszim/book.zim
In terms of file size, the two are pretty similar:
book.epub 21.589.212 bytes
book.zim 23.500.867 bytes
I can't say anything about file access times for EPUB, but with my phone I have no performance problems when viewing the EPUB file.
Regards, Volker
On Wed, 10 Aug 2011 14:56:56 +0200, Volker Haas wrote:
[...]
Unlike my test, your benchmark was done on a small amount of content.
For a small amount of data, I don't think a comparison really makes sense. From the user's point of view, the two should be similar (file size and opening speed).
That's why, in such a use case, EPUB currently has a big advantage over ZIM: a lot of devices support it. So EPUB should be used for small collections of content and ZIM for big ones.
But I am still surprised that the ZIM file is bigger than the EPUB. In most cases (except maybe for a really small "book" with only a few pictures), ZIM should be smaller and faster than EPUB.
I built a new ZIM file (from the unzipped contents of your EPUB) and got a file about 3% smaller than your EPUB: 20.907.051 bytes
I noticed that your book.epub has about 15% fewer entries (908: unzip -l book.epub | wc) than your book.zim (1071: zimdump -l book.zim | wc). This seems strange to me, and it might explain why your ZIM file is bigger than your EPUB.
Emmanuel
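The percentages in this exchange can be checked directly from the byte and entry counts quoted above. The following is just a small arithmetic sketch; the constant and function names are made up for illustration, and only the numbers come from the thread.

```python
def percent_change(new: float, old: float) -> float:
    """Relative difference of `new` compared with `old`, in percent."""
    return (new - old) / old * 100.0


# Figures quoted in the thread (sizes in bytes, entry counts from the listings).
EPUB_BYTES = 21_589_212
ZIM_BYTES = 23_500_867
ZIM_REBUILT_BYTES = 20_907_051
EPUB_ENTRIES = 908
ZIM_ENTRIES = 1071

if __name__ == "__main__":
    print(f"original ZIM vs EPUB: {percent_change(ZIM_BYTES, EPUB_BYTES):+.1f}%")
    print(f"rebuilt ZIM vs EPUB:  {percent_change(ZIM_REBUILT_BYTES, EPUB_BYTES):+.1f}%")
    print(f"EPUB vs ZIM entries:  {percent_change(EPUB_ENTRIES, ZIM_ENTRIES):+.1f}%")
```

With these inputs the original ZIM comes out roughly 8.9% larger than the EPUB, the rebuilt ZIM roughly 3.2% smaller than the EPUB, and the EPUB listing roughly 15.2% shorter than the ZIM listing.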
+1 to Asaf's comment - ZIM does rock!
That said, there are advantages to offering epub support especially for those smaller selections, since, as Emmanuel notes, it is common for devices to support the reading of these files. So I think it is important for us to at least offer that solution in our tool-kit, but our main development focus should continue to be zim.
Jessie
On Wed, Aug 10, 2011 at 8:08 AM, emmanuel@engelhart.org wrote:
[...]
I try to stay format agnostic when it comes to decisions like this and let our stats speak for themselves. Both formats have their pros/cons and each one has a dedicated user base.
Ignoring ePUB would limit our offline reach while neglecting openZIM would impair improving a burgeoning new format that's proven itself to be quite useful for our needs.
Both really should be in our export toolkit which is why Jessie and I have been actively discussing their support with WMF IT and PediaPress.
--tomasz
On Mon, Aug 15, 2011 at 4:17 PM, Jessie Wild <jwild@wikimedia.org> wrote:
[...]
--
Jessie Wild
Global Development, Manager
Wikimedia Foundation
On Wed, 10 Aug 2011 14:36:48 +0200, Manuel Schneider wrote:
Thanks for that test. What I am really wondering about is EPUB. I asked Erik at Wikimania whether EPUB might be all we need. I mean, we need to be open here; there may be no need for yet another format. That said, I don't know much about it. So if someone could run this test with EPUB vs. ZIM, I would be very interested.
There seems to be a small misunderstanding here: an EPUB file is a ZIP file (with additional constraints applied to the compressed content).
Emmanuel
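To illustrate the point that an EPUB is a ZIP with extra constraints, here is a minimal sketch using Python's standard zipfile module. It checks one well-known constraint of the EPUB container format: the first entry is a file named mimetype, stored uncompressed, whose content is application/epub+zip. The function names are made up for illustration, and the container built by make_minimal_container is only a stand-in for testing, not a complete or valid EPUB.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def looks_like_epub(data: bytes) -> bool:
    """Return True if `data` is a ZIP archive whose first entry is an
    uncompressed `mimetype` file containing the EPUB media type."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            return False
        first = infos[0]
        return (
            first.compress_type == zipfile.ZIP_STORED
            and zf.read(first).strip() == EPUB_MIMETYPE
        )


def make_minimal_container() -> bytes:
    """Build a tiny EPUB-like ZIP in memory (illustrative only: it lacks
    the other files a real EPUB requires)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("OEBPS/ch1.xhtml", "<html/>", compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


if __name__ == "__main__":
    print(looks_like_epub(make_minimal_container()))
    print(looks_like_epub(b"not a zip"))
```

Because the container is an ordinary ZIP, the ZIP figures from the first benchmark are a reasonable first approximation of what to expect from EPUB at the same scale.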