I see you got more pointers there. :) Did you manage to explore them?

The blocker is that I didn't hear much interest from dump folks in a non-7z archive format even if it boosted compression speed a lot. Of the packers Bulat replied with (zpaq, exdupe, pcompress, his own srep), exdupe and srep explicitly promise fast deduping and give numbers, so they'd be the most obvious to look at if you've got a use case. Here are the crib notes on how I'd look at them:

Long-range compressors are often set up to find kilobytes-long repeats over distances of 100MB or more; rzip and lrzip work that way, for example. That is what you need for, e.g., deduping copies of the same large file across a backup. But when you support a much longer window than you need, you pay with some combination of RAM use, inability to stream input/output (because you need random reads from the history if it doesn't fit in RAM), or compression ratio (because you miss shorter matches). That's why the original rzip wasn't an ideal drop-in replacement for 7-Zip on Wiki full-history dumps, though it did very well on benchmarks that used small pieces of a dump.
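
To make that tradeoff concrete, here's a toy sketch of block-level long-range matching. It is not how rzip, lrzip, srep, or exdupe actually work (real tools typically use rolling hashes and their own block sizes); the function name and the 4096-byte block size are made up for illustration. It does show the three costs: the hash table grows with the history you keep, confirming a match needs a random read back into that history, and any repeat that isn't a whole aligned block is missed.

```python
import hashlib

BLOCK = 4096  # hypothetical block size, chosen only for this illustration


def find_repeated_blocks(data: bytes, block: int = BLOCK):
    """Return (offset, earlier_offset) pairs for aligned blocks seen before.

    Toy model only: fixed, aligned blocks instead of a rolling hash.
    """
    seen = {}      # digest -> first offset; grows with the history kept
    repeats = []
    for off in range(0, len(data) - block + 1, block):
        chunk = data[off:off + block]
        digest = hashlib.blake2b(chunk, digest_size=8).digest()
        earlier = seen.get(digest)
        if earlier is None:
            seen[digest] = off
        # Confirming the match is a random read into the history; this is
        # the step that breaks streaming once history no longer fits in RAM.
        elif data[earlier:earlier + block] == chunk:
            repeats.append((off, earlier))
    return repeats
```

Feeding it two copies of a block separated by other data finds the repeat, while shifting the second copy by one byte finds nothing, which is the "miss shorter or unaligned matches" cost in miniature.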

So the things I'd look at re: other long-range compressors are whether they can stream input/output (and so fit in existing dump/load flows) and whether they do well when fed an actual multi-GB chunk of dump (in output size and RAM/CPU use). Of course, you might have flexibility on some of those axes, e.g., if you have no problem dropping input/output streaming.
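
A minimal sketch of the kind of streaming measurement I mean, assuming the compressor under test exposes compress()/flush() the way Python's standard lzma and zlib objects do. The helper names are hypothetical, and it only reports sizes and wall-clock time; RAM use would have to be watched separately (e.g., from the OS), since the interesting allocations happen inside the compressor.

```python
import lzma
import time


def read_chunks(stream, size=1 << 20):
    """Yield successive chunks from a binary stream without seeking."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        yield chunk


def stream_compress_stats(chunks, make_compressor=lzma.LZMACompressor):
    """Compress an iterable of byte chunks in one pass and report basic stats."""
    comp = make_compressor()
    bytes_in = 0
    bytes_out = 0
    start = time.perf_counter()
    for chunk in chunks:
        bytes_in += len(chunk)
        bytes_out += len(comp.compress(chunk))  # output is counted, then dropped
    bytes_out += len(comp.flush())
    seconds = time.perf_counter() - start
    return {
        "bytes_in": bytes_in,
        "bytes_out": bytes_out,
        "ratio": bytes_in / bytes_out if bytes_out else 0.0,
        "seconds": seconds,
    }
```

You'd call it as stream_compress_stats(read_chunks(f)) with f being a dump file opened in binary mode, or a pipe. A tool that can't be driven this way, one pass with no seeking, is the kind that wouldn't drop into an existing dump/load flow.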

Hope this helps,
Randall


On Sat, Mar 8, 2014 at 6:53 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Randall Farmer, 21/01/2014 23:26:

Trying to get quick-and-dirty long-range matching into LZMA isn't
feasible for me personally and there may be inherent technical
difficulties. Still, I left a note on the 7-Zip boards as folks
suggested; feel free to add anything there:
https://sourceforge.net/p/sevenzip/discussion/45797/thread/73ed3ad7/

I see you got more pointers there. :) Did you manage to explore them?

Nemo