On 03/27/09 02:21, Ilmari Karonen wrote:
> Brion Vibber wrote:
>> On 3/26/09 12:30 PM, Ilmari Karonen wrote:
>>> The Wikipedia article (what else?) on the format says the blocks are padded to byte boundaries, and some quick testing seems to support that.
>> That is a filthy lie. :)
>> There is indeed no byte padding between blocks; it made my implementation of a parallel bzip2 compressor much harder, and I never got round to finishing the decompressor.
> You're right, I misread it. Only the whole stream is padded. :(
> (In my defense, that seems like such a moronic design choice that I couldn't believe it could be true. If you're going to waste 48 bits per block on pi-in-BCD anyway, it seems silly to skimp on the 4 bits per block that would be needed on average to pad to a byte boundary.)
http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html#limits
"Much of this complexity could have been avoided if the compressed size of each block of data was recorded in the data stream."
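Because blocks are not byte-aligned and their compressed sizes are not recorded, a parallel decompressor has to locate block starts by scanning for the 48-bit block-header magic (the BCD digits of pi, 0x314159265359, i.e. the bytes "1AY&SY" when aligned) at every bit offset. The sketch below shows that scan under a few assumptions: the whole stream fits in memory, bits are read most-significant-first (as bzip2 writes them), and the function name find_block_offsets is hypothetical, not part of any bzip2 library. A match is only a candidate: the same 48-bit pattern can occur by chance inside compressed data, so a real tool would still have to validate each hit (for example by attempting to decode the block and checking its CRC).

```python
# Hypothetical helper: find candidate bzip2 block starts at arbitrary bit offsets.
BLOCK_MAGIC = 0x314159265359  # pi in BCD, the 48-bit block-header marker
MAGIC_BITS = 48
MASK = (1 << MAGIC_BITS) - 1


def find_block_offsets(data: bytes) -> list[int]:
    """Return the bit offsets (from the start of data) where the 48-bit
    block-header magic begins. Offsets need not be multiples of 8."""
    offsets = []
    window = 0      # sliding window holding the most recent 48 bits
    bit_index = 0   # number of bits consumed so far
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK
            bit_index += 1
            if bit_index >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(bit_index - MAGIC_BITS)
    return offsets
```

In a freshly compressed stream the first block header follows the 4-byte "BZh9" stream header, so the first candidate is at bit 32; subsequent blocks can start at any bit position, which is exactly why the byte-oriented search suggested by the Wikipedia article would miss them.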
lacos