On 01/04/2013 04:49 AM, Tommi Mäkitalo wrote:
> Hi,
> there should be only one compression algorithm. Otherwise a reader must
> be able to handle every supported algorithm. What is the point of having
> a standard format where some readers can read only some of the files?
A JavaScript decoder is slow enough that optimizing the containers used
for the supported compression algorithm may be warranted. Such a change
need only make a performance difference: a new container format might
simply load faster on average.
> The zimwriter makes clusters of 1MB of HTML files and compresses them
> with LZMA2. Actually no XZ overhead is used here. The 1MB cluster size
> is chosen because LZMA2 uses it. Larger clusters do not increase the
> compression ratio at all.
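
To make the quoted behaviour concrete, here is a rough sketch of that
clustering step using Python's standard lzma module. The blob handling
and the helper name are my own illustrative assumptions, not the actual
zimwriter code:

    import lzma

    CLUSTER_SIZE = 1 << 20  # 1 MB, matching the cluster size quoted above

    def make_clusters(blobs):
        """Group HTML blobs into ~1 MB clusters, LZMA2-compressing each.

        `blobs` is assumed to be an iterable of bytes objects; the real
        zimwriter also records per-blob offsets, which is omitted here.
        """
        cluster = bytearray()
        for blob in blobs:
            cluster += blob
            if len(cluster) >= CLUSTER_SIZE:
                # Default format is XZ, with LZMA2 inside.
                yield lzma.compress(bytes(cluster))
                cluster.clear()
        if cluster:
            yield lzma.compress(bytes(cluster))
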
The clusters are compressed using the XZ container, which holds streams
and then blocks of LZMA2; the LZMA2 container in turn uses chunks of
either LZMA-compressed data or uncompressed data. There may be some
unnecessary baggage here, and these containers may not be optimal for
the ZIM format. If the decoding time could on average be halved by
changing the containers, then that might warrant consideration.
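
One way to gauge how much the XZ layer costs is to compress the same
cluster both as a full XZ stream and as a raw LZMA2 stream and compare
the size and decode time. This is only a rough sketch with Python's
standard lzma module; the filter preset and the cluster.bin input file
are assumptions for illustration:

    import lzma
    import time

    with open("cluster.bin", "rb") as f:  # hypothetical 1 MB cluster
        data = f.read()

    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

    xz_stream = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
    raw_stream = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)

    print("XZ container overhead:", len(xz_stream) - len(raw_stream), "bytes")

    # Decode each form repeatedly and compare wall-clock time.
    t0 = time.perf_counter()
    for _ in range(100):
        lzma.decompress(xz_stream)
    t1 = time.perf_counter()
    for _ in range(100):
        lzma.decompress(raw_stream, format=lzma.FORMAT_RAW, filters=filters)
    t2 = time.perf_counter()

    print(f"XZ decode: {t1 - t0:.3f}s  raw LZMA2 decode: {t2 - t1:.3f}s")

The fixed XZ framing (magic bytes, stream header and footer, block
headers, integrity checks) is small per 1MB cluster, so the interesting
number here is the decode time rather than the size difference.
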
> The writer has a fixed list of mime types which are not compressed. The
> mime types are "image/jpeg", "image/png", "image/tiff", "image/gif"
> and "application/zip". The writer does not try to compress them further;
> they are stored as is in a separate cluster.
For this reason the LZMA2 container may be redundant. LZMA2 added
support for uncompressed chunks, but since most of the incompressible
blobs are placed in separate clusters, this extra LZMA2 support may just
be baggage. I note that having all the images in non-compressed clusters
will help make a JavaScript port more practical, as it means there will
be fewer clusters to decode for a typical page.
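
For illustration, the routing the quoted text describes might look
roughly like the sketch below. The mime-type list is the one quoted
above, while the function and variable names are hypothetical:

    # Mime types the writer stores uncompressed, per the quoted mail.
    UNCOMPRESSED_MIME_TYPES = {
        "image/jpeg", "image/png", "image/tiff", "image/gif",
        "application/zip",
    }

    def route_blob(mimetype, blob, compressed_cluster, plain_cluster):
        """Send already-compressed formats to a plain cluster; everything
        else goes to a cluster that will be LZMA2-compressed.

        Both clusters are assumed to be lists of blobs here; the real
        writer also tracks offsets and cluster numbers.
        """
        if mimetype in UNCOMPRESSED_MIME_TYPES:
            plain_cluster.append(blob)   # stored as is, cheap for a JS reader
        else:
            compressed_cluster.append(blob)
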
Regards
Douglas Crosher