(Forwarding to the Xmldatadumps-l@ list, https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l , and CCing wikitech-ambassadors@ just for resolution here.)

On Mon, Jun 4, 2018 at 10:55 PM, Nikhil Prakash <nikhil07prakash@gmail.com> wrote:
Hi there,

I'm Nikhil, an undergraduate student from India, and I'm trying to understand the Wikipedia data dumps provided by Wikimedia.

I'm working with the 20180520 dumps. They contain many sections, each with different data, and I would like to know what each section's data represents. Although each section has a brief description, I don't understand it clearly.

For example, in the section "All pages, current versions only": is the current version of each and every article present in this data? I ask because I just downloaded "enwiki-20180520-pages-meta-current1.xml-p10p30303.bz2". The first page in it is "AccessibleComputing", but it does not seem to contain the complete article's information.

Hoping to get a quick reply.



Nick Wilson (Quiddity)
Community Liaison, Wikimedia Foundation