Hi,
I am stuck on an open source project that expects enwiki-latest-pages-articles.xml.bz2, while I only have enwiki-latest-pages-articles-multistream.xml.bz2. My network connection is too poor to download another large file, so I would like to know the difference between these two files. I have read the descriptions at https://dumps.wikimedia.org/ but I am confused by the phrase 'in multiple bz2 streams, 100 pages per stream'. Could anyone explain it to me? Thanks!
The multistream file contains multiple bz2 streams, which means it is actually a concatenation of many separately compressed bz2 files, each holding 100 pages. The file enwiki-latest-pages-articles-multistream-index.txt.bz2 lists the offsets of the individual streams within the big multistream file, so if you use the index, make sure both files come from the same dump version/date.
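To make the idea of concatenated streams concrete, here is a minimal Python sketch. It does not touch the real dump: it builds a tiny multistream blob in memory from made-up page strings, and the helper names (build_multistream, read_stream_at) are hypothetical, chosen only for illustration. It shows two things: decompressing the whole blob gives back all the chunks joined together, and a single stream can be decompressed on its own if you know its byte offset, which is the kind of lookup the index file is meant to support.

```python
import bz2
import io


def build_multistream(chunks):
    """Compress each chunk as its own bz2 stream and concatenate the results.

    Returns the concatenated bytes and the byte offset where each stream starts.
    """
    blob = io.BytesIO()
    offsets = []
    for chunk in chunks:
        offsets.append(blob.tell())      # where this stream begins
        blob.write(bz2.compress(chunk))  # one complete bz2 stream per chunk
    return blob.getvalue(), offsets


def read_stream_at(blob, offset):
    """Decompress only the single bz2 stream that starts at `offset`."""
    decompressor = bz2.BZ2Decompressor()
    # The decompressor stops at the end of the first stream; any bytes
    # after it are left in decompressor.unused_data.
    return decompressor.decompress(blob[offset:])


# Made-up stand-ins for groups of pages (the real dump has 100 pages per stream).
chunks = [b"<page>A</page>\n", b"<page>B</page>\n", b"<page>C</page>\n"]
blob, offsets = build_multistream(chunks)

# Reading everything: bz2.decompress handles concatenated streams (Python 3.3+).
whole = bz2.decompress(blob)

# Reading one stream only, using its recorded offset.
second = read_stream_at(blob, offsets[1])
```

For the real file, bz2.open(path, "rb") in Python 3.3 or later likewise reads across all streams in sequence, so a program that simply decompresses the dump from start to end should, as far as I can tell, see the same kind of XML it would get from the non-multistream file. Whether the project you are using accepts a different filename is a separate question that depends on that project.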
Best, Marcin Osowski