On 3/26/09 10:58 AM, ERSEK Laszlo wrote:
** 1. If the export process uses dbzip2 to compress the dump, and dbzip2's MO is to compress input blocks independently and then bit-shift the resulting compressed blocks (= single-block bzip2 streams) back into a single multi-block bzip2 stream, so that the resulting file is bit-identical to what bzip2 would produce, then the export process wastes CPU time. bunzip2 can decompress concatenated bzip2 streams. In exchange for a small size penalty, the dumper could just concatenate the single-block bzip2 streams, saving a lot of cycles.
It's been years since I poked it seriously so I don't recall any exact figures, but I doubt it's very many cycles, and mass bit-shifting is likely trivial to optimize should anyone feel it necessary.
More importantly, not every decompressor will decompress concatenated streams. Dictating which decoder end-users should use is not cool. :)
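To make the trade-off concrete, here is a minimal Python sketch, not dbzip2's actual code. It compresses blocks as independent bzip2 streams and concatenates them, then decodes the result two ways. The function names and the `block_size` parameter are hypothetical. The sketch relies on the standard library's documented behaviour: `bz2.decompress` accepts concatenated streams, while a single `bz2.BZ2Decompressor` stops at the end of the first stream and leaves the rest in `unused_data`, which is roughly how a decoder without multi-stream support would behave.

```python
import bz2


def compress_blocks_independently(data, block_size=900_000):
    """Compress each chunk as its own bzip2 stream and concatenate the streams.

    block_size is an illustrative parameter, not a dbzip2 setting.
    """
    streams = [
        bz2.compress(data[i:i + block_size], 9)
        for i in range(0, len(data), block_size)
    ]
    return b"".join(streams)


def decode_first_stream_only(blob):
    """Decode with a single-stream decoder.

    Returns (decoded bytes of the first stream, undecoded remainder).
    """
    decoder = bz2.BZ2Decompressor()
    decoded = decoder.decompress(blob)
    return decoded, decoder.unused_data


if __name__ == "__main__":
    payload = b"".join(str(i).encode() for i in range(5000))
    blob = compress_blocks_independently(payload, block_size=4096)

    # A multi-stream-aware decoder recovers everything.
    print(bz2.decompress(blob) == payload)

    # A single-stream decoder silently stops after the first stream.
    first, rest = decode_first_stream_only(blob)
    print(len(first), len(rest))
```

The second call is the failure mode to worry about: the output is valid but truncated, and nothing signals an error unless the caller checks for leftover input.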
** 2. If dump.bz2 was single-block, many-stream (as opposed to the current many-block, single-stream), then people on the importing end could speed up *decompression* with pbzip2.
Lack of compatibility with other tools makes this format undesirable; further note that a smarter decompressor could act as bzip2recover does to estimate block boundaries and decompress them speculatively. In the rare case of an incorrect match, you've only lost one to two blocks' worth of time.
I never got round to completing the decompressor implementation for dbzip2, though.
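For illustration, the boundary-estimation idea can be sketched in Python as below. This is a rough sketch of the general approach, not bzip2recover's or dbzip2's code, and `find_block_starts` is a hypothetical name. It assumes the standard bzip2 layout: each block begins with the 48-bit magic 0x314159265359, bits are packed most-significant-bit first, and blocks are not byte-aligned, so the scan has to test every bit offset. A hit is only a candidate, since the same 48 bits can in principle occur inside compressed data; that is the "incorrect match" case, and a speculative decompressor would find out when the block fails to decode or its CRC does not match.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359   # 48-bit marker at the start of every bzip2 block
MASK_48 = (1 << 48) - 1


def find_block_starts(blob):
    """Return bit offsets at which the 48-bit block magic appears.

    Slides a 48-bit window over the input one bit at a time (MSB first).
    """
    hits = []
    window = 0
    for byte_index, byte in enumerate(blob):
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            bit_pos = byte_index * 8 + (7 - shift)   # index of the bit just read
            if bit_pos >= 47 and window == BLOCK_MAGIC:
                hits.append(bit_pos - 47)            # offset of the magic's first bit
    return hits


if __name__ == "__main__":
    # Incompressible input and compression level 1 (100k blocks) give a small
    # multi-block stream to scan.
    n = 250_000
    raw = random.Random(0).getrandbits(8 * n).to_bytes(n, "big")
    compressed = bz2.compress(raw, 1)
    for offset in find_block_starts(compressed):
        print("candidate block at bit offset", offset)
```

Each candidate offset could then be handed to a worker that shifts the block into byte alignment, wraps it in a stream header and trailer, and decompresses it; that part is omitted here.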
** 3. Even if dump.bz2 stays single-stream, *or* it becomes multi-stream *but* is available only from a pipe or socket, decompression can still be sped up by way of lbzip2 (which I wrote, and am promoting here). Since it's written in strict adherence to the Single UNIX Specification, Version 2, it's available on Cygwin too, and should work on the Mac.
Awesome!
-- brion