Brion Vibber wrote:
More importantly, not every decompressor will decompress concatenated streams. Dictating which decoder end-users should use is not cool. :)
The reference bzip2 tool has supported it for ages. I added support for concatenated bzip2 files to PHP's bz2 extension in September, but it is only available in newer PHP versions. (Oddly, importDump.php doesn't seem to support bzipped dumps.) I don't know about Java/mwdumper support.
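For anyone wanting to check what "supporting concatenated streams" means in practice, here is a minimal sketch in Python (not the PHP patch mentioned above). It assumes the whole file fits in memory; the helper name decompress_concatenated and the sample data are made up for illustration. Each stream gets a fresh decompressor, and whatever follows the end-of-stream marker is fed to the next one.

```python
import bz2


def decompress_concatenated(data: bytes) -> bytes:
    """Decompress one or more bzip2 streams stored back to back.

    Hypothetical helper for illustration: a decoder that stops at the
    first end-of-stream marker would return only the first chunk.
    """
    chunks = []
    while data:
        decoder = bz2.BZ2Decompressor()
        chunks.append(decoder.decompress(data))
        if not decoder.eof:
            raise ValueError("truncated bzip2 stream")
        # Bytes after the end-of-stream marker belong to the next stream.
        data = decoder.unused_data
    return b"".join(chunks)


# Three independently compressed pieces, concatenated like `cat a.bz2 b.bz2`.
parts = [b"first page\n", b"second page\n", b"third page\n"]
blob = b"".join(bz2.compress(p) for p in parts)
restored = decompress_concatenated(blob)
```

Python's own bz2.decompress() has accepted multi-stream input since 3.3, so the loop is only there to show what a decoder has to do.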
** 2. If dump.bz2 was single-block, many-stream (as opposed to the current many-block, single-stream), then people on the importing end could speed up *decompression* with pbzip2.
Lack of compatibility with other tools makes this format undesirable; further note that a smarter decompressor could act as bzip2recover does to estimate block boundaries and decompress them speculatively. In the rare case of an incorrect match, you've only lost one to two blocks' worth of time.
Keeping the streams compatible with those other tools adds quite a bit of complexity for decompressors that want to decompress only a single block (due to the byte-unaligned nature of blocks).
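To illustrate the alignment problem, a rough sketch in Python of the kind of scan bzip2recover performs: slide a 48-bit window over the data one bit at a time and look for the block-header magic 0x314159265359. This is not the dbzip2 code; the function name find_block_candidates and the test data are assumptions for the example, and a hit is only a candidate, since the same bit pattern can in principle occur inside compressed data.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359  # 48-bit marker that starts every bzip2 block
MASK = (1 << 48) - 1


def find_block_candidates(data: bytes) -> list:
    """Return the bit offsets at which the block magic appears.

    bzip2 packs bits most-significant first, so the scan walks each
    byte from bit 7 down to bit 0.
    """
    hits = []
    window = 0
    bits_seen = 0
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            bits_seen += 1
            if bits_seen >= 48 and window == BLOCK_MAGIC:
                hits.append(bits_seen - 48)
    return hits


# Incompressible input at level 1 (roughly 100 kB per block) forces
# several blocks into a single stream.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(220_000))
hits = find_block_candidates(bz2.compress(payload, compresslevel=1))
byte_aligned = [h % 8 == 0 for h in hits]
```

The first block always starts at bit 32, right after the 4-byte "BZh" plus level header. Later blocks start wherever the previous one happened to end, so a decompressor that wants a single block has to bit-shift the data before it can do anything with it.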
I never got round to completing the decompressor implementation for dbzip2, though..
Is that the code at http://svn.wikimedia.org/viewvc/mediawiki/trunk/dbzip2/ ?