It should contain the wikitext of the most recent version of every page.
Note, however, that the dump is split into multiple parts and you only
downloaded part 1 (page id 10 to page id 30303; not all page id numbers are
used, which is why it starts at 10). Other pages are in the other parts.
Additionally, wikitext is not HTML but a custom markup syntax that the
MediaWiki parser later renders into HTML. This markup may include other
pages (via {{pagename}}).
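To check what a given part actually contains, the pages can be streamed out
of it one at a time. The following is only a minimal sketch: it assumes the
usual MediaWiki export layout (a <page> element with <title>, <id> and a
<revision> holding <text>), and the names local() and iter_pages() are made
up for this illustration, not part of any Wikimedia tool.

```python
import bz2
import xml.etree.ElementTree as ET


def local(tag):
    # ElementTree reports tags as "{namespace}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (page_id, title, wikitext) for each <page> in an export stream.

    Assumes the standard export layout; iter_pages is a hypothetical helper.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        page_id = title = text = None
        for child in elem:  # direct children only, so <id> is the page id
            name = local(child.tag)
            if name == "id":
                page_id = int(child.text)
            elif name == "title":
                title = child.text
            elif name == "revision":
                for sub in child:
                    if local(sub.tag) == "text":
                        text = sub.text or ""
        yield page_id, title, text
        elem.clear()  # drop the page's children to limit memory use
```

With the downloaded file this could be used as
iter_pages(bz2.open("enwiki-20180520-pages-meta-current1.xml-p10p30303.bz2", "rb")).
The text that comes back is raw wikitext, so a page that is only a redirect
or that pulls in other pages via {{pagename}} will look much shorter than
the rendered article.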
--
brian
On Tuesday, June 5, 2018, Nikhil Prakash <nikhil07prakash(a)gmail.com> wrote:
Hi there,
This is Nikhil, an undergraduate student from India. I'm trying to
understand the Wikipedia data dumps provided by Wikimedia.
I'm working on the 20180520 dumps. They contain many sections, each with
different data, and I would like to know what each section's data
represents. Although each is described briefly, I don't understand it
clearly.
For example, in the section "All pages, current versions only.", is the
current version of each and every article present in this data? I ask
because I just downloaded
"enwiki-20180520-pages-meta-current1.xml-p10p30303.bz2"; the first page's
information is for "AccessibleComputing", but it does not seem to contain
the complete article.
Hoping to get a quick reply.
Thanks.
Nikhil