Hi,
I am stuck on an open source project that calls for enwiki-latest-pages-articles.xml.bz2,
while I only have enwiki-latest-pages-articles-multistream.xml.bz2. My network connection is
too poor for me to download another large file, so I wonder what the difference between these
two files is. I have read the descriptions at https://dumps.wikimedia.org/ , but I am
confused about the concept 'in multiple bz2 streams, 100 pages per stream'. Could
anyone explain it to me? Thanks!
This file contains multiple bz2 streams - this means it is actually a
concatenation of multiple bz2 compressed files. The file
enwiki-latest-pages-articles-multistream-index.txt.bz2 contains the
byte offsets of the individual streams within the big multistream file.
Just make sure you have both files for the same dump version/date.
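
To make this concrete, here is a rough Python sketch of both ways to read
such a file. It is only an illustration under a few assumptions: the function
names (iter_lines, read_stream, parse_index_line) are made up for this
example, and the index line layout "offset:page_id:title" is how I understand
the index file, so please check it against your own copy. Python's bz2 module
(3.3 and later) reads concatenated streams transparently, so the multistream
file can usually be read from start to end just like the regular one.

```python
import bz2


def iter_lines(path):
    """Read the whole file sequentially, as if it were one bz2 file.

    bz2.open handles concatenated streams, so the multistream dump
    decompresses to one continuous text.
    """
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line


def read_stream(path, offset, chunk_size=64 * 1024):
    """Decompress only the single bz2 stream starting at byte `offset`."""
    decomp = bz2.BZ2Decompressor()
    parts = []
    with open(path, "rb") as f:
        f.seek(offset)
        # Stop as soon as this stream ends; bytes belonging to the next
        # stream are left in decomp.unused_data and ignored here.
        while not decomp.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts.append(decomp.decompress(chunk))
    return b"".join(parts)


def parse_index_line(line):
    """Split one index line, assumed to look like 'offset:page_id:title'.

    Titles may contain ':' themselves, so split at most twice.
    """
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title
```

So if your project only reads the dump from beginning to end, something like
iter_lines on the multistream file may already be enough. The offsets from
the index only matter if you want to jump to a particular block of pages
without decompressing everything before it.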
Best,
Marcin Osowski