On 03/26/09 20:30, Ilmari Karonen wrote:
> The Wikipedia article (what else?) on the format says the blocks are padded to byte boundaries, and some quick testing seems to support that.
> http://en.wikipedia.org/wiki/Bzip2#File_format
The compressed blocks are bit-aligned; no padding occurs between them.
Only the bzip2 stream as a whole is padded, at its end, to a byte boundary.
decompressible ::= stream | decompressible decompressible
stream ::= stream_header block* stream_footer
stream_header ::= STREAM_START_MAGIC VERSION BLOCKSIZE
block ::= BLOCK_START_MAGIC BLOCK_CRC BLOCK_DATA
stream_footer ::= STREAM_END_MAGIC COMBINED_STREAM_CRC STREAM_PADDING
Bunzip2 decompresses a "decompressible", while bzip2 creates (in a single run) exactly one "stream". A "decompressible" can be formed by concatenating two other "decompressible"s. In a "decompressible", the meat of any given bzip2 block starts at BLOCK_START_MAGIC, and terminates right before the next BLOCK_START_MAGIC or STREAM_END_MAGIC, whichever comes first. Because blocks are bit-aligned, either magic can begin at any bit offset, not only on a byte boundary.
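
To see the alignment for yourself, here is a minimal Python sketch that slides a 48-bit window over the input one bit at a time and reports where the two magics occur. The function name find_magics is made up for illustration. The magic values are the ones the format uses (BCD digits of pi and sqrt(pi)), and bits are read most significant first. A plain scan like this can in principle hit a false positive inside BLOCK_DATA, so treat it as a diagnostic aid, not a parser.

```python
import bz2

BLOCK_START_MAGIC = 0x314159265359  # 48 bits, BCD digits of pi
STREAM_END_MAGIC = 0x177245385090   # 48 bits, BCD digits of sqrt(pi)
MAGIC_BITS = 48
MASK = (1 << MAGIC_BITS) - 1


def find_magics(data):
    """Yield (bit_offset, kind) for every 48-bit magic found at any bit offset.

    kind is "block" for BLOCK_START_MAGIC and "end" for STREAM_END_MAGIC.
    Bits are consumed MSB-first within each byte.
    """
    window = 0
    nbits = 0
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            nbits += 1
            if nbits < MAGIC_BITS:
                continue
            if window == BLOCK_START_MAGIC:
                yield nbits - MAGIC_BITS, "block"
            elif window == STREAM_END_MAGIC:
                yield nbits - MAGIC_BITS, "end"


if __name__ == "__main__":
    # Two concatenated streams still form one "decompressible".
    data = bz2.compress(b"hello world " * 100) + bz2.compress(b"second stream")
    for offset, kind in find_magics(data):
        # offset % 8 != 0 means the magic is not on a byte boundary.
        print(kind, offset, "bit within byte:", offset % 8)
```

The first block of each stream follows the 4-byte stream header and so happens to be byte-aligned; later blocks and the stream footer generally are not. After STREAM_END_MAGIC come the 32-bit combined CRC and then 0 to 7 bits of STREAM_PADDING, which is why a following stream starts on a byte boundary again.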
lacos