On Thu, Dec 16, 2010 at 12:47 AM, Andrew Dunbar <hippytrail(a)gmail.com> wrote:
At the moment I'm interested in .bz2 and .7z because those are the
formats Wikimedia currently publishes data in.
I'm fairly certain the specific 7z format which Wikimedia uses doesn't
allow for random access, because the dictionary is never reset.
Have we made the case for this format to the Wikimedia people?
No, there's no off-the-shelf tool to create these files - the standard
.xz file created by xz utils puts everything in one stream, which is
basically equivalent to the .7z files already being made. I'm sure
"patches are welcome", but I don't have the time to create the patch.
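
For what it's worth, here is a rough sketch of the multi-stream idea, using Python's standard lzma module instead of a patched xz. It is not an existing tool and not anything Wikimedia uses. Each chunk is compressed as its own complete .xz stream, the streams are concatenated, and a side index of offsets makes it possible to decompress one chunk without touching the others. The function names and the in-memory index are my own assumptions; a real tool would have to store the index somewhere.

```python
import lzma


def compress_chunks(chunks):
    """Compress each chunk as an independent .xz stream.

    Returns the concatenated bytes plus an index of
    (offset, compressed_length) pairs, one per chunk.
    """
    blob = bytearray()
    index = []
    for chunk in chunks:
        stream = lzma.compress(chunk, format=lzma.FORMAT_XZ)
        index.append((len(blob), len(stream)))
        blob += stream
    return bytes(blob), index


def read_chunk(blob, index, i):
    """Decompress only chunk i, using the side index to locate it."""
    offset, length = index[i]
    return lzma.decompress(blob[offset:offset + length])
```

Concatenated .xz streams are still a valid .xz file, so a plain decompressor reads the whole thing as usual; the index is only needed for the random access. The cost is a worse compression ratio, since each stream starts with an empty dictionary.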
How is .xz for compression times?
At the default settings, it's quite slow. I believe it's pretty much
the same as 7zip with its default settings. The main reason I was
using xz instead of 7zip is that xz handles pipes better -
specifically, 7zip doesn't allow you to pipe from stdin to stdout.
(See
https://bugs.launchpad.net/ubuntu/+source/p7zip/+bug/383667 and
the response, "You should use lzma." Well, lzma utils has since been
replaced by xz utils.)
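
To illustrate the kind of stdin-to-stdout filtering I mean, here is a minimal sketch using Python's standard lzma module. It is a stand-in for the xz command line tool, not a description of how xz is implemented, and the function name and chunk size are arbitrary choices of mine.

```python
import lzma


def compress_stream(src, dst, chunk_size=64 * 1024):
    """Read binary data from src and write a single .xz stream to dst.

    Works on any pair of binary file objects, including pipes, because
    it never seeks and never needs the total input size up front.
    """
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        dst.write(compressor.compress(block))
    # flush() emits whatever is still buffered plus the stream footer.
    dst.write(compressor.flush())
```

Called as compress_stream(sys.stdin.buffer, sys.stdout.buffer), this can sit in the middle of a shell pipeline.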
For decompression, .xz is generally faster than .bz2 but slower than .gz.
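
If you want to check that ordering on your own data, something like the following sketch would do. It uses the gzip, bz2 and lzma modules from the Python standard library at their default settings, which are not necessarily the same settings as the command line tools, so treat the numbers as a rough guide only. The function name and the best-of-N timing are my own choices.

```python
import bz2
import gzip
import lzma
import time


def time_decompression(payload, repeats=3):
    """Return the best-of-N decompression time in seconds per format."""
    codecs = {"gz": gzip, "bz2": bz2, "xz": lzma}
    results = {}
    for name, module in codecs.items():
        packed = module.compress(payload)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            unpacked = module.decompress(packed)
            best = min(best, time.perf_counter() - start)
        # Sanity check: the round trip must reproduce the input exactly.
        assert unpacked == payload
        results[name] = best
    return results
```

Results will vary with the input and the machine, so it is worth running this on a slice of an actual dump instead of synthetic data.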
Would we have to worry about patent issues for LZMA?
No, it uses LZMA2.