For external uses like XML dumps, integrating the compression strategy into LZMA would, however, be very attractive. This would also benefit other users of LZMA compression, like HBase.
For dumps or other uses, 7za -mx=3 / xz -3 is your best bet.
That has a 4 MB buffer, compression ratios within 15-25% of current 7zip (or histzip), and goes at 30 MB/s on my box, which is still 8x faster than the status quo (going by a 1 GB benchmark).
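If anyone wants to check numbers like these on their own box without shelling out to xz, here is a rough sketch using Python's standard lzma module, which wraps liblzma, so preset=3 should line up with xz -3. The helper names (compress_chunks, benchmark) and the sample data are made up for illustration, and throughput on a tiny synthetic input will not match a 1 GB dump benchmark.

```python
import lzma
import time


def compress_chunks(chunks, preset=3):
    """Compress an iterable of bytes chunks into one .xz stream.

    preset=3 is intended to mirror `xz -3`; hypothetical helper, not part
    of any dump tooling.
    """
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    out = [comp.compress(chunk) for chunk in chunks]
    out.append(comp.flush())  # emit whatever the encoder is still holding
    return b"".join(out)


def benchmark(chunks, preset=3):
    """Return (raw_bytes, compressed_bytes, MB_per_second) for one run."""
    chunks = list(chunks)
    start = time.perf_counter()
    packed = compress_chunks(chunks, preset)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero
    raw = sum(len(c) for c in chunks)
    return raw, len(packed), raw / elapsed / 1e6


if __name__ == "__main__":
    # Synthetic stand-in for revision history: many near-identical chunks.
    sample = [b"<revision>some wiki text, lightly edited</revision>\n" * 2000] * 20
    raw, packed, mbps = benchmark(sample, preset=3)
    print(f"{raw} -> {packed} bytes ({packed / raw:.2%}), {mbps:.1f} MB/s")
```

To compare settings, call benchmark on the same chunks with a different preset (0 through 9) and look at how the ratio and the speed trade off.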
Trying to get quick-and-dirty long-range matching into LZMA isn't feasible for me personally and there may be inherent technical difficulties. Still, I left a note on the 7-Zip boards as folks suggested; feel free to add anything there: https://sourceforge.net/p/sevenzip/discussion/45797/thread/73ed3ad7/
Thanks for the reply, Randall
On Tue, Jan 21, 2014 at 2:19 PM, Randall Farmer <randall@wawd.com> wrote:
For external uses like XML dumps, integrating the compression strategy into LZMA would, however, be very attractive. This would also benefit other users of LZMA compression, like HBase.
For dumps or other uses, 7za -mx=3 / xz -3 is your best bet.
That has a 4 MB buffer, compression ratios within 15-25% of current 7zip (or histzip), and goes at 30 MB/s on my box, which is still 8x faster than the status quo (going by a 1 GB benchmark).
Re: trying to get long-range matching into LZMA: first, I couldn't confidently hack on liblzma. Second, Igor might not want to do anything as niche-specific as this (but who knows!). Third, even with a faster matching strategy, the LZMA *format* seems to require some intricate stuff (range coding) that could be a blocker to getting the ideal speeds (honestly not sure).
In any case, I left a note on the 7-Zip boards as folks have suggested: https://sourceforge.net/p/sevenzip/discussion/45797/thread/73ed3ad7/
Thanks for the reply, Randall
xmldatadumps-l@lists.wikimedia.org