On Wed, Dec 15, 2010 at 12:01 PM, Andrew Dunbar hippytrail@gmail.com wrote:
> By the way I'm keen to find something similar for .7z
I've written something similar for .xz, which uses LZMA2, same as .7z. It creates a virtual read-only filesystem using FUSE (the FUSE part is in Perl, which pipes to dd and xzcat). The only real problem is that it doesn't use a stock .xz file; it uses a specially created one that concatenates lots of smaller .xz files, each a complete stream. Currently I pack between 5 and 20 or so 900K bz2 blocks into one .xz stream; the count varies because there's a preference to split on </page><page> boundaries.
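As a rough illustration of the concatenated-stream layout described above (this is not the Perl/FUSE code mentioned here), the Python sketch below compresses each chunk as its own .xz stream, records where each stream starts, and then serves a byte range by decompressing only the streams that overlap it. The names pack_chunks and read_range, the in-memory blob, and the index tuple layout are all assumptions made for the example.

```python
import lzma


def pack_chunks(chunks):
    """Compress each chunk as a separate .xz stream and concatenate them.

    Returns (blob, index), where each index entry is
    (text_offset, text_length, file_offset, file_length).
    """
    blob = bytearray()
    index = []
    text_offset = 0
    for chunk in chunks:
        stream = lzma.compress(chunk)  # one complete .xz stream per chunk
        index.append((text_offset, len(chunk), len(blob), len(stream)))
        blob += stream
        text_offset += len(chunk)
    return bytes(blob), index


def read_range(blob, index, start, length):
    """Return uncompressed bytes [start, start + length), touching only
    the streams that overlap the requested range."""
    end = start + length
    out = bytearray()
    for text_off, text_len, file_off, file_len in index:
        if text_off + text_len <= start or text_off >= end:
            continue  # this stream lies entirely outside the range
        chunk = lzma.decompress(blob[file_off:file_off + file_len])
        out += chunk[max(start - text_off, 0):end - text_off]
    return bytes(out)
```

The concatenated result is still a valid multi-stream .xz file, so a stock decompressor can read it end to end; the index only adds the ability to jump into the middle.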
Apparently the folks at openzim have done something similar, using LZMA2.
If anyone is interested in working with me to make a package capable of being released to the public, I'd be willing to share my code. But it sounds like I'm just reinventing a wheel already invented by openzim.
> It would be incredibly useful if these indices could be created as part of the dump creation process. Should I file a feature request?
With concatenated .xz files, creating the index is *much* faster, because the .xz format puts the size information at the end of each stream, so you can walk the file backward from stream to stream without decompressing anything. Plus, with .xz all streams are broken on 4-byte boundaries, whereas with .bz2, blocks can end at any *bit* (which means you have to do painful bit shifting to create the index).
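To make the backward walk concrete, here is a minimal Python sketch, not the actual indexing code. It relies on the published .xz layout: a 12-byte stream footer ending in the magic bytes "YZ", a Backward Size field giving the size of the index, and index records listing each block's unpadded size. The function names are made up for the example, and it assumes the whole file fits in memory and that there is no stream padding between streams.

```python
import lzma
import struct

HEADER_SIZE = 12  # size of the .xz stream header
FOOTER_SIZE = 12  # size of the .xz stream footer


def read_varint(buf, pos):
    """Decode one .xz multibyte integer at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def stream_offsets(data):
    """Return (start, end) byte offsets of every stream, found by reading
    footers and indexes from the end of the file toward the start."""
    spans = []
    end = len(data)
    while end > 0:
        footer = data[end - FOOTER_SIZE:end]
        if footer[10:12] != b"YZ":
            raise ValueError("no stream footer ending at offset %d" % end)
        # Backward Size is stored as (index size / 4) - 1, little-endian.
        stored = struct.unpack("<I", footer[4:8])[0]
        index_size = (stored + 1) * 4
        index_start = end - FOOTER_SIZE - index_size
        index = data[index_start:index_start + index_size]
        # index[0] is the index indicator byte (0x00); records follow.
        count, pos = read_varint(index, 1)
        blocks_size = 0
        for _ in range(count):
            unpadded, pos = read_varint(index, pos)
            _uncompressed, pos = read_varint(index, pos)
            blocks_size += (unpadded + 3) & ~3  # blocks are padded to 4 bytes
        start = index_start - blocks_size - HEADER_SIZE
        spans.append((start, end))
        end = start
    spans.reverse()
    return spans
```

Every offset this returns is a multiple of 4, so there is no bit shifting involved. A version for real dumps would seek and read only the footer and index of each stream instead of loading the whole file.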
The file is also *much* smaller, on the order of 5-10% of the size of the bzip2 file for a full history dump.