Jamie Morken wrote:
Hi,
Thanks for the info. While I was at it, I did some more checking of the history dump file
sizes and compression ratios (as reported by 7-Zip 9.20):
enwiki-20110115-pages-meta-history1.xml.7z 434.99x compression
enwiki-20110115-pages-meta-history2.xml.7z 289.46x compression
enwiki-20110115-pages-meta-history3.xml.7z 248.72x compression
enwiki-20110115-pages-meta-history4.xml.7z 216.29x compression
enwiki-20110115-pages-meta-history5.xml.7z 198.67x compression
enwiki-20110115-pages-meta-history6.xml.7z 176.94x compression
enwiki-20110115-pages-meta-history7.xml.7z 161.42x compression
enwiki-20110115-pages-meta-history8.xml.7z 208.59x compression
enwiki-20110115-pages-meta-history9.xml.7z 126.86x compression
enwiki-20110115-pages-meta-history10.xml.7z 112.10x compression
enwiki-20110115-pages-meta-history11.xml.7z 117.27x compression
enwiki-20110115-pages-meta-history12.xml.7z 118.88x compression
enwiki-20110115-pages-meta-history13.xml.7z 133.07x compression
enwiki-20110115-pages-meta-history14.xml.7z 107.10x compression
enwiki-20110115-pages-meta-history15.xml.7z 83.24x compression
pages-meta-history1 has the oldest articles and also the most revisions, so it has the
highest compression ratio (most revisions of established articles contain only minor
changes).
The pages-meta-history15 file contains the most recently created articles, which have the
fewest revisions but tend to have larger changes relative to the overall article size,
and thus it has the lowest 7z compression ratio.
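To illustrate the reasoning above, here is a minimal Python sketch using synthetic data,
not the real dumps. It builds two made-up "histories" in which every revision is stored
as full text: one with many revisions and small edits, one with few revisions and large
edits. The names make_history and compression_ratio and all sizes are assumptions chosen
for illustration. Python's lzma module uses LZMA-family compression, but its settings are
not those of 7-Zip 9.20, so the absolute numbers will not match the list above.

```python
import lzma
import random


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by LZMA-compressed size."""
    return len(data) / len(lzma.compress(data))


def make_history(base_words, revisions, edits_per_revision, rng):
    """Concatenate full-text revisions; each revision replaces a few
    randomly chosen words of the previous one (hypothetical edit model)."""
    words = list(base_words)
    texts = []
    for _ in range(revisions):
        for _ in range(edits_per_revision):
            words[rng.randrange(len(words))] = "%08x" % rng.getrandbits(32)
        texts.append(" ".join(words))
    return "\n".join(texts).encode("utf-8")


rng = random.Random(0)  # fixed seed so repeated runs behave the same
base = ["%08x" % rng.getrandbits(32) for _ in range(1000)]

# "Old" article: many revisions, each changing only a few words.
old_history = make_history(base, revisions=100, edits_per_revision=3, rng=rng)
# "New" article: few revisions, each changing a large share of the text.
new_history = make_history(base, revisions=10, edits_per_revision=300, rng=rng)

ratio_old = compression_ratio(old_history)
ratio_new = compression_ratio(new_history)
print("many small revisions: %.1fx" % ratio_old)
print("few large revisions:  %.1fx" % ratio_new)
```

The first ratio should come out well above the second, since nearly every revision in the
first history repeats text the compressor has already seen.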
enwiki-20110115-pages-meta-history8.xml is the clearest exception to the pattern of
decreasing compression ratios (history11 through history13 also rise, but by much less).
Maybe it contains many bot-created articles?
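For anyone who wants to check the trend mechanically, here is a small Python sketch that
takes the ratios listed above and reports every file whose ratio is higher than that of
the file before it. The helper name rises is made up for this example.

```python
# Compression ratios for history1 .. history15, copied from the list above.
RATIOS = [
    434.99, 289.46, 248.72, 216.29, 198.67,
    176.94, 161.42, 208.59, 126.86, 112.10,
    117.27, 118.88, 133.07, 107.10, 83.24,
]


def rises(ratios):
    """Return (file_number, increase) for each file whose ratio exceeds
    the ratio of the preceding file. File numbers start at 1."""
    result = []
    for index in range(1, len(ratios)):
        increase = ratios[index] - ratios[index - 1]
        if increase > 0:
            result.append((index + 1, round(increase, 2)))
    return result


for number, increase in rises(RATIOS):
    print("history%d: +%.2f over history%d" % (number, increase, number - 1))

# The file with the largest jump relative to its predecessor.
biggest = max(rises(RATIOS), key=lambda item: item[1])
print("largest rise: history%d" % biggest[0])
```

This only compares neighbouring files; it says nothing about why a given file compresses
better than expected.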
That's all I can report without actually looking inside these files! :)
cheers,
Jamie