<div><br></div>I am entirely for 7z. In fact, once the 7z dump is released, I'll be able to test the XML integrity right away, since I process the data on the fly without unpacking it first.<br><br><div class="gmail_quote"><br></div><div class="gmail_quote">
On Tue, Mar 16, 2010 at 4:45 PM, Tomasz Finc <span dir="ltr">&lt;<a href="mailto:tfinc@wikimedia.org">tfinc@wikimedia.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">Kevin Webb wrote:<br>
> I just managed to finish decompression. That took about 54 hours on an<br>
> EC2 2.5x unit CPU. The final data size is 5469GB.<br>
><br>
> As the process just finished I haven't been able to check the<br>
> integrity of the XML, however, the bzip stream itself appears to be<br>
> good.<br>
><br>
> As was mentioned previously, it would be great if you could compress<br>
> future archives using pbzip2 to allow for parallel decompression. As I<br>
> understand it, pbzip2 files are backward compatible with all<br>
> existing bzip2 utilities.<br>
<br>
</div>Looks like the trade-off is slightly larger files due to pbzip2's<br>
algorithm for individual chunking. We'd have to change the<br>
buildFilters function in <a href="http://tinyurl.com/yjun6n5" target="_blank">http://tinyurl.com/yjun6n5</a> and install the new<br>
binary. Ubuntu already has it in 8.04 LTS, making it easy.<br>
<br>
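To illustrate the chunking trade-off, here is a minimal Python sketch. It assumes that parallel compression amounts to compressing fixed-size chunks as independent bzip2 streams and concatenating them; the helper name compress_in_chunks and the chunk size are hypothetical, and this is not pbzip2's actual code.<br>

```python
import bz2


def compress_in_chunks(data: bytes, chunk_size: int) -> bytes:
    """Compress each chunk as its own bzip2 stream and concatenate the results.

    Hypothetical helper: it mimics the idea of independent chunking,
    not the real pbzip2 implementation.
    """
    streams = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Every chunk becomes a complete, self-contained bzip2 stream.
        streams.append(bz2.compress(chunk))
    return b"".join(streams)


if __name__ == "__main__":
    sample = b"<page><title>Example</title></page>\n" * 20000
    single = bz2.compress(sample)
    chunked = compress_in_chunks(sample, chunk_size=1000)

    # The standard decompressor reads concatenated streams (Python 3.3+),
    # which is why chunked output stays readable by ordinary bzip2 tools.
    assert bz2.decompress(chunked) == sample

    # Each stream carries its own header and trailer, so the chunked
    # output is expected to be somewhat larger than the single stream.
    print("single stream bytes:", len(single))
    print("chunked bytes:", len(chunked))
```

Because each chunk is independent, chunks can be compressed or decompressed on separate cores; the per-stream overhead is the price paid in file size.<br>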
Any takers for the change?<br>
<br>
I'd also like to gauge everyone's opinion on moving away from the large<br>
file sizes of bz2 and going exclusively 7z. We'd save a huge amount of<br>
space doing it at a slightly larger cost during compression.<br>
Decompression of 7z these days is wicked fast.<br>
<br>
Let me know.<br>
<div><div></div><div class="h5"><br>
--tomasz<br>
<br>
_______________________________________________<br>
Xmldatadumps-admin-l mailing list<br>
<a href="mailto:Xmldatadumps-admin-l@lists.wikimedia.org">Xmldatadumps-admin-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l" target="_blank">https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-admin-l</a><br>
</div></div></blockquote></div><br>
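For reference, checking XML integrity on the fly could look roughly like the Python sketch below. It is only an illustration under assumptions: the function name check_xml_stream, the PageCounter handler, and the idea of counting page elements are hypothetical and are not taken from any existing dump tooling.<br>

```python
import bz2
import xml.sax
from typing import BinaryIO


class PageCounter(xml.sax.ContentHandler):
    """Counts <page> elements while the document streams through."""

    def __init__(self) -> None:
        super().__init__()
        self.pages = 0

    def startElement(self, name, attrs):
        if name == "page":
            self.pages += 1


def check_xml_stream(stream: BinaryIO, block_size: int = 1024 * 1024) -> int:
    """Feed decompressed bytes to an incremental parser block by block.

    Raises xml.sax.SAXParseException if the XML is not well formed;
    otherwise returns the number of <page> elements seen.
    Nothing is written to disk and memory use stays bounded.
    """
    handler = PageCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        block = stream.read(block_size)
        if not block:
            break
        parser.feed(block)
    # close() reports an error if the document ended prematurely.
    parser.close()
    return handler.pages


if __name__ == "__main__":
    import sys

    # Usage: python check_dump.py dump.xml.bz2  (file name is an example)
    with bz2.open(sys.argv[1], "rb") as dump:
        print("pages:", check_xml_stream(dump))
```

The same function accepts any binary stream, so for a 7z archive one option is to pass the stdout pipe of a subprocess running `7z e -so` on the archive, assuming the 7z command-line tool is installed.<br>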