[Toolserver-l] Downloading 7z dump for English Wikipedia, 30 GB
emijrp
emijrp at gmail.com
Sat Mar 27 10:28:31 UTC 2010
Hi;
Yesterday (2010-03-26), the 7z dump for English Wikipedia was completed.[1]
I am downloading it to the /mnt/user-store/dump directory. It is about 30 GB
and should finish in about 4 hours. So, if you need it, you know where it
is; don't download it again! ;)
A tip: in my Python scripts, I decompress it on the fly, like this:
7za e -so ourdump.xml.7z | python ourscript.py
And, inside the script, I read the data from standard input: source=sys.stdin
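For illustration, a minimal sketch of what ourscript.py could look like. It is not the actual script: the helper name iter_titles and the regular expression are hypothetical, and it assumes Python 3 and that each <title> element in the dump sits on a single line. It reads the stream line by line, so the decompressed XML never has to fit in memory or on disk.

```python
import re
import sys
from xml.sax.saxutils import unescape

# Assumption: each <title>...</title> element sits on one line of the dump.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def iter_titles(source):
    """Yield page titles from an iterable of dump lines, one line at a time."""
    for line in source:
        match = TITLE_RE.search(line)
        if match:
            # Undo XML escaping (&amp;, &lt;, &gt;, plus &quot;).
            yield unescape(match.group(1), {"&quot;": '"'})


if __name__ == "__main__":
    source = sys.stdin  # the decompressed XML arrives on standard input
    for title in iter_titles(source):
        print(title)
```

Run it with the same pipe as above. Any per-page processing can replace the print call, as long as it keeps consuming the input one line at a time.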
Regards
[1]
http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z