Pakaran suggested on IRC the use of 7zip's LZMA compression for data dumps, claiming really big improvements in compression over gzip. I did some test runs with the September 17 dump of es.wikipedia.org and can confirm it does make a big difference:
    10,995,508,118  pages_full.xml        1.00x  uncompressed XML
     2,320,992,228  pages_full.xml.gz     4.74x  gzipped output from mwdumper
       775,765,248  pages_full.xml.bz2   14.17x  "bzip2"
       155,983,464  pages_full.xml.7z    70.49x  "7za a -si"
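For a rough feel of the relative ratios without a 10 GB dump handy, here's a small self-contained sketch using Python's stdlib gzip, bz2, and lzma modules. (LZMA is the algorithm behind 7z; Python's module emits raw/.xz streams rather than the .7z container, but it's the same compression family. The sample data is synthetic, so don't expect the exact ratios from the table above.)

```python
import bz2
import gzip
import lzma

# Synthetic, highly repetitive "wikitext-ish" sample; real dump XML
# compresses differently, but the ordering of the ratios is typical.
data = (b"<page><title>Example</title><text>"
        + b"wiki markup sample " * 500
        + b"</text></page>") * 50

sizes = {
    "raw":  len(data),
    "gzip": len(gzip.compress(data, 9)),   # deflate, 32 KB window
    "bz2":  len(bz2.compress(data, 9)),    # block-sorting, 900 KB blocks
    "lzma": len(lzma.compress(data)),      # LZ77 variant, much larger dictionary
}

for name, n in sizes.items():
    print(f"{name:5s} {n:8d} bytes  {sizes['raw'] / n:6.2f}x")
```

The large dictionary is a big part of why LZMA wins on dumps: repeated article boilerplate that falls outside gzip's 32 KB window can still be matched.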
(gzip -9 makes a negligible difference versus the default compression level; bzip2 -9 seems to make no difference at all.)
The 7za program is a fair bit slower than gzip, but with files 10-15 times smaller I suspect many people would find the download savings worth a little extra trouble.
While LZMA isn't an official or de facto standard that we know of, the code is open source (LGPL, CPL), and a basic command-line archiver is available for most Unix-like platforms as well as Windows, so it should be free to use (in the absence of surprise patents): http://www.7-zip.org/sdk.html
I'm probably going to try to work LZMA compression into the dump process to supplement the gzipped files. Alternatively (or additionally) we could switch from gzip back to bzip2, which provides a still-respectable improvement in compression and is a bit more standard.
(We'd switched from bzip2 to gzip at some point in the SQL dump saga; I think this was when we started using gzip internally on 'old' text entries, so the extra time bzip2 spent trying to recompress that already-gzipped data in the dumps was wasted.)
-- brion vibber (brion @ pobox.com)