It should have the wikitext of the most recent version of all pages.
Note, however, that the dump is split into multiple parts, and you downloaded only part 1, which covers page id 10 through page id 30303 (not all page id numbers are used, which is why it starts at 10). Other pages are in the other parts. Additionally, wikitext is not HTML but a custom markup syntax that the MediaWiki parser later renders into HTML. This markup may include other pages (via {{pagename}}).
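To see for yourself what one part contains, something like the following may help. It is only a minimal sketch using Python's standard library, and it assumes the file follows the usual MediaWiki XML export layout (<page> elements holding <title>, <id>, and <revision>/<text>). The helper names iter_pages and print_first_pages are made up for this example and are not part of any Wikimedia tool.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (page_id, title, wikitext) for each <page> in an export XML stream.

    Hypothetical helper: assumes <title> and <id> are direct children of
    <page>, and that the wikitext lives in <revision>/<text>.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page_id = title = text = None
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "id":
                page_id = int(child.text)
            elif name == "revision":
                for sub in child:
                    if local_name(sub.tag) == "text":
                        text = sub.text or ""
        yield page_id, title, text
        elem.clear()  # release the finished page; the dumps are large


def print_first_pages(path, limit=5):
    """Print id, title, and wikitext length for the first few pages in a .bz2 part."""
    with bz2.open(path, "rb") as stream:
        for count, (page_id, title, text) in enumerate(iter_pages(stream)):
            if count >= limit:
                break
            print(page_id, title, len(text or ""))
```

Calling print_first_pages("enwiki-20180520-pages-meta-current1.xml-p10p30303.bz2") reads the part as a stream, so it does not need to be decompressed to disk first. The text it yields is raw wikitext, so any {{pagename}} inclusions appear unexpanded.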
On Tuesday, June 5, 2018, Nikhil Prakash wrote:
> Hi there,
> This is Nikhil, an undergraduate student from India. I'm trying to understand the Wikipedia data dumps provided by Wikimedia.
> I'm working with the 20180520 dumps. They contain many sections, each with different data, and I would like to know what each section's data represents. Although each one has a brief description, it isn't clear to me.
> For example, in the section "All pages, current versions only.", is each and every article's current version present in this data? I just downloaded "enwiki-20180520-pages-meta-current1.xml-p10p30303.bz2"; the first page's information is for "AccessibleComputing", but it does not seem to contain the complete article.
> Hoping to get a quick reply.
> Thanks.
> Nikhil