emijrp wrote:
Hi Frederic, thanks for your work. Have you tested 7z?
It makes no difference to me. River suggested (and installed) xz, so I used it, but 7z would have worked too.
A quick test using my biased data for one day (but it should be representative enough):
$ du -s *
1027260   7z     (1004 M, 25.27% saved)
1374804   gz     (1.4 G,   0% saved)
1020692   xz     (997 M,  25.75% saved)
The difference between xz and 7z is negligible (<1%). I haven't benchmarked anything formally, but 7z was much faster on my system, apparently mainly because it can use several cores simultaneously.
We can compress to xz while the new disks arrive. I read that it is about 24 TB, so we can revert to gzip in the future.
Is there any particular reason to use gzip? When I use these files, I mostly uncompress them on the fly from Perl, and there is a module to do this with xz too (haven't tested it, though). I am sure Python and other languages can do the same.
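For what it's worth, here is a rough Python sketch of that on-the-fly approach (the file name is just a placeholder, not one of the actual dump names); the standard gzip and lzma modules can both open compressed files transparently, so the same loop works for the old .gz files and for .xz ones:

    import gzip
    import lzma

    # Placeholder path; point this at a real dump file.
    path = "pagecounts-sample.xz"

    # Pick the opener from the extension, so old .gz and new .xz
    # files can be read with the same code.
    opener = lzma.open if path.endswith(".xz") else gzip.open

    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            # process each uncompressed line here
            print(line.rstrip("\n"))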
Even if we have plenty of space, it makes sense to use xz (or another format that offers good compression) and to benefit from the size reduction, for example if/when these files are backed up or moved around. Also, I'd like to be able to provide the files for download for those people who want local copies [several academic groups have already requested them], and the 25% size reduction is a big bonus there too.
But as I wrote earlier, these files are mostly archived on the toolserver, and I assume that most users don't dig through the older ones very often, so choosing the strongest compression should not be a problem.
A better file format (e.g. one file per day, with separate data for 24 hours, and another file with data aggregated per day) is probably what is most needed for "real uses" -- as far as I know, this is how Erik Zachte handles this data. A database would be best, of course, but requires much more work...
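Just to illustrate the per-day aggregation idea, here is a rough Python sketch. It assumes xz-compressed hourly files whose lines look like "project page_title view_count bytes"; the file names and the date below are only placeholders, not the real naming scheme:

    import glob
    import lzma
    from collections import defaultdict

    # Hypothetical layout: one compressed file per hour for a given day.
    hourly_files = sorted(glob.glob("pagecounts-20110401-*.xz"))

    daily_totals = defaultdict(int)

    for path in hourly_files:
        with lzma.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                fields = line.split()
                if len(fields) < 3:
                    continue  # skip malformed lines
                project, page, count = fields[0], fields[1], fields[2]
                daily_totals[(project, page)] += int(count)

    # Write the per-day aggregate next to the hourly data.
    with lzma.open("pagecounts-20110401-daily.xz", "wt", encoding="utf-8") as out:
        for (project, page), count in sorted(daily_totals.items()):
            out.write(f"{project} {page} {count}\n")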
As always, comments are very welcome.
Frédéric