On Sat, Jan 16, 2016 at 12:17:20PM +0100, Richard Jelinek wrote:
The world has moved on since my proposal in 2012 and
maybe a look at
https://github.com/Cyan4973/lz4
would be in order.
The compression ratio is clearly worse than bzip2's:
# lz4 -9 ces-20160111.xml ces-20160111.xml.lz4
Compressed 2343742714 bytes into 818397719 bytes ==> 34.92%
compared to 557039914 bytes (about 23.77%) for bzip2:
-rw-r--r-- 1 root root 557039914 Jan 13 08:48 ces-20160111.xml.bz2
-rw-r--r-- 1 root root 818397719 Jan 16 12:31 ces-20160111.xml.lz4
However, decompression is roughly 24 times faster in wall-clock time
(on our machine):
# time bunzip2 -k ces-20160111.xml.bz2
real 1m47.853s
user 1m40.992s
sys 0m3.692s
# time lz4 -d ces-20160111.xml.lz4 ces-20160111.xml2
Successfully decoded 2343742714 bytes
real 0m4.416s
user 0m2.079s
sys 0m2.340s
So if you need decompression speed and can handle an archive that is
roughly 47% larger than the bzip2 one, you may want to use lz4. I doubt
that is the default requirement, though.
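
For anyone who wants to check the arithmetic, here is a small sketch that
derives the ratios from the numbers in the output above. The byte counts
and "real" times are copied from this mail; the constant and function
names are made up for illustration and are not part of any tool.

```python
# Figures copied from the command output quoted in this mail.
ORIGINAL_BYTES = 2_343_742_714      # uncompressed ces-20160111.xml
BZ2_BYTES = 557_039_914             # ces-20160111.xml.bz2
LZ4_BYTES = 818_397_719             # ces-20160111.xml.lz4 (lz4 -9)
BUNZIP2_REAL_S = 1 * 60 + 47.853    # "real 1m47.853s"
LZ4_REAL_S = 4.416                  # "real 0m4.416s"


def summarize():
    """Return the derived percentages and the speedup as floats."""
    return {
        # Compressed size as a percentage of the original size.
        "lz4_ratio_pct": 100.0 * LZ4_BYTES / ORIGINAL_BYTES,
        "bz2_ratio_pct": 100.0 * BZ2_BYTES / ORIGINAL_BYTES,
        # How much larger the lz4 archive is than the bzip2 archive.
        "lz4_larger_than_bz2_pct": 100.0 * (LZ4_BYTES / BZ2_BYTES - 1.0),
        # Wall-clock decompression speedup of lz4 over bunzip2.
        "decompress_speedup": BUNZIP2_REAL_S / LZ4_REAL_S,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

This is a single run on one machine, so the speedup in particular should
be read as a ballpark figure and not as a proper benchmark.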
regards,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com       Geschäftsführer: Richard Jelinek
Language Technology - We Mean IT!    Sitz der Gesellschaft: Fürth
2.58921 * 10^8 Mind Units            Registergericht: AG Fürth, HRB-9201