Hi there, I am stuck on an open source project that requires enwiki-latest-pages-articles.xml.bz2, while I only have enwiki-latest-pages-articles-multistream.xml.bz2. My network connection is too poor to download another large file, so I would like to know the difference between these two files. I have read the descriptions at https://dumps.wikimedia.org/, but I am confused by the phrase 'in multiple bz2 streams, 100 pages per stream'. Could anyone explain it to me? Thanks!
126
126, 12/02/2015 15:38:
I am stuck on an open source project that requires enwiki-latest-pages-articles.xml.bz2, while I only have enwiki-latest-pages-articles-multistream.xml.bz2. My network connection is too poor to download another large file, so I would like to know the difference between these two files. I have read the descriptions at https://dumps.wikimedia.org/, but I am confused by the phrase 'in multiple bz2 streams, 100 pages per stream'. Could anyone explain it to me? Thanks!
I can't check right now, but from the description you quoted it looks like a matter of compressing in parallel threads, which should not affect the result when you uncompress the file. If your software fails on the multistream file, try decompressing it and then recompressing the output with standard bzip2.
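As a rough illustration of the decompress-then-recompress idea, here is a minimal sketch, assuming Python 3.3 or later, where the standard bz2 module reads a file made of several concatenated bz2 streams as one continuous stream. The function name and the file names in the usage note are only placeholders, not anything the project or the dumps site prescribes.

```python
import bz2
import shutil


def recompress_single_stream(src_path, dst_path, chunk_size=1 << 20):
    """Read a (possibly multistream) .bz2 file and rewrite it as one bz2 stream.

    The data is copied in chunks, so the uncompressed XML is never held
    in memory all at once.
    """
    with bz2.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


# Small self-check with made-up data: several bz2 streams concatenated
# back to back decompress to the same bytes as a single stream.
pages = [b"<page>A</page>\n", b"<page>B</page>\n", b"<page>C</page>\n"]
multistream = b"".join(bz2.compress(p) for p in pages)
single = bz2.compress(b"".join(pages))
assert bz2.decompress(multistream) == bz2.decompress(single)
```

With the real dump this would be called as recompress_single_stream("enwiki-latest-pages-articles-multistream.xml.bz2", "enwiki-latest-pages-articles.xml.bz2"). Expect it to take a long time on a file of that size, and whether the project then accepts the result is something I have not tested.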
Nemo
xmldatadumps-l@lists.wikimedia.org