On 3/26/09 10:58 AM, ERSEK Laszlo wrote:
** 1. If the export process uses dbzip2 to compress the dump, and dbzip2's
MO is to compress input blocks independently and then bit-shift the
resulting compressed blocks (= single-block bzip2 streams) back into a
single multi-block bzip2 stream, so that the resulting file is
bit-identical to what bzip2 would produce, then the export process wastes
(CPU) time. Bunzip2 can decompress concatenated bzip2 streams. In exchange
for a small size penalty, the dumper could just concatenate the
single-block bzip2 streams, saving a lot of cycles.
It's been years since I poked at it seriously, so I don't recall any exact
figures, but I doubt it's very many cycles, and mass bit-shifting is
likely trivial to optimize should anyone feel it necessary.
More importantly, not every decompressor will decompress concatenated
streams. Dictating which decoder end-users should use is not cool. :)
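(For reference, the concatenation idea itself is easy to demonstrate with a
decompressor that does handle multiple streams; a minimal Python sketch, with
made-up chunk contents:)

```python
import bz2

# Two chunks compressed as independent bzip2 streams, then simply
# concatenated -- roughly what a dumper that skips the bit-shifting
# step would produce.
chunk_a = b"first block of dump data\n"
chunk_b = b"second block of dump data\n"
concatenated = bz2.compress(chunk_a) + bz2.compress(chunk_b)

# Like bunzip2, Python's bz2.decompress consumes all concatenated
# streams and yields the original data back-to-back.  Decompressors
# that stop after the first stream would return only chunk_a's data.
assert bz2.decompress(concatenated) == chunk_a + chunk_b
```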
** 2. If dump.bz2 were single-block, many-stream (as opposed to the current
many-block, single-stream), then people on the importing end could speed
up *decompression* with pbzip2.
Lack of compatibility with other tools makes this format undesirable;
further, note that a smarter decompressor could act as bzip2recover does,
estimating block boundaries and decompressing blocks speculatively. In the
rare case of an incorrect match, you've only lost one or two blocks' worth
of time.
I never got round to completing the decompressor implementation for
dbzip2, though.
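(The bzip2recover-style boundary scan looks roughly like this -- an
illustrative Python sketch, not dbzip2 code; a real decompressor would also
have to cope with the magic occurring by chance inside compressed data,
which is the speculative-match case above:)

```python
import bz2

# bzip2's 48-bit block-header magic (the digits of pi).
BLOCK_MAGIC = 0x314159265359

def find_block_starts(data: bytes):
    """Return bit offsets where the block-header magic occurs.

    Block headers are not byte-aligned, so every bit offset must
    be checked.
    """
    bits = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    hits = []
    for off in range(total_bits - 48 + 1):
        # The 48-bit window starting `off` bits from the left edge.
        window = (bits >> (total_bits - 48 - off)) & ((1 << 48) - 1)
        if window == BLOCK_MAGIC:
            hits.append(off)
    return hits

# In a real .bz2 file the stream header ("BZh" plus a level digit)
# is 4 bytes, so the first block header sits at bit offset 32.
print(find_block_starts(bz2.compress(b"hello, dump")))
```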
** 3. Even if dump.bz2 stays single-stream, *or* it becomes multi-stream
*but* is available only from a pipe or socket, decompression can still be
sped up by way of lbzip2 (which I wrote, and am promoting here). Since
it's written in strict adherence to the Single UNIX Specification, Version
2, it's available on Cygwin too, and should work on the Mac.
Awesome!
-- brion