Hi,
don't know if this issue came up already - in case it did and has been dismissed, I beg your pardon. In case it didn't...
I hereby propose, that pbzip2 (https://launchpad.net/pbzip2) is used to compress the xml dumps instead of bzip2. Why? Because its sibling (pbunzip2) has a bug bunzip2 hasn't. :-)
Strange? Read on.
A few hours ago, I filed a bug report for pbzip2 (see https://bugs.launchpad.net/pbzip2/+bug/922804) together with some test results done even some few hours before that.
The results indicate that:
bzip2 and pbzip2 are vice-versa compatible each one can create archives, the other one can read. But if it is for uncomressing, only pbzip2 compressed archives are good for pbunzip2.
I propose compressing the archives with pbzip2 for the following reasons:
1) If your archiving machines are SMP systems this could lead to a better usage of system ressources (i.e. faster compression).
2) Compression with pbzip2 is harmless for regular users of bunzip2, so everything should run for these people as usual.
3) pbzip2-compressed archives can be uncompressed with pbunzip2 with a speedup that scales nearly linearly with the number of CPUs in the host.
So to sum up: It's a no loose and two win situation if you migrate to pbzip2. And that just because pbunzip2 is slightly buggy. Isn't that interesting? :-)
cheers,