[Toolserver-l] Downloading 7z dump for English Wikipedia, 30 GB

emijrp emijrp at gmail.com
Sat Mar 27 10:28:31 UTC 2010


Hi;

Yesterday (2010-03-26), the 7z dump for English Wikipedia was completed.[1]
I am downloading it to the /mnt/user-store/dump directory; it is about 30 GB
and should be finished in a few hours (about 4). So, if you need it, you know
where it is, don't download it again! ; ).

A tip: in my Python scripts, I decompress it on the fly, like this:

7za e -so ourdump.xml.7z | python ourscript.py

And, inside the script, I read the data from standard input: source = sys.stdin
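
For illustration, a minimal sketch of such a script (the name countpages.py is
hypothetical, and it assumes the usual dump layout where each <page> tag sits
on its own line); it reads the decompressed XML from standard input and counts
the pages without ever holding the whole dump in memory:

import sys

# Decompressed XML arrives via the 7za pipe on standard input
source = sys.stdin
pages = 0
for line in source:
    # Count opening <page> tags; one per article in the dump
    if '<page>' in line:
        pages += 1
print('pages:', pages)

You would run it the same way as above:

7za e -so ourdump.xml.7z | python countpages.py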

Regards

[1]
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z

