Thank you so much for looking into this.

I am going to use the mw package and see why my XML decoding is not producing this result.

Best,
Behzad

On Feb 24, 2015, at 11:36 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:

Behzad,

I just ran a quick script to check the count and it comes out as expected. Here's the call.

$ bzcat <snip>/enwiki-20140707-pages-meta-history5.xml-p000183366p000184999.bz2 | python count_revs.py
183366 Dude Ranch (album) 854
183369 Scrambling 149
183370 Home cinema 821
183371 Hamilton Hume 184
183372 Gertrude Gadwall 13
183373 Critical chain project management 286
183375 Luke The Goose 6
183376 George Robertson, Baron Robertson of Port Ellen 350
183378 Lord Robertson 9
183379 List of rivers of Nova Scotia 120
183380 Scottish whisky 1
183381 Louis XV 1
183382 Talk:List of artists who died of drug-related causes 16
^C

Here's what count_revs.py looks like:

    """
    Counts the revisions per page in an XML dump and prints a nice format
    """
    import sys
    from mw import xml_dump

    dump = xml_dump.Iterator.from_file(sys.stdin)

    for page in dump:
        revisions = sum(1 for revision in page)
        print(page.id, page.title, revisions)

_______________________________________________

On Tue, Feb 24, 2015 at 3:41 PM, Behzad Tabibian <btabibian@gmail.com> wrote:

Hi Jeremy,

Thanks for the reply. I will look into that list.

Best,
Behzad

On Feb 24, 2015, at 7:49 PM, Jeremy Baron <jeremy@tuxmachine.com> wrote:

On Feb 24, 2015 1:44 PM, "Behzad Tabibian" <btabibian@gmail.com> wrote:
> I am new to working with Wikipedia dumps. I am trying to obtain the full revision history of all the articles on Wikipedia. I downloaded enwiki-20140707-pages-meta-history1.xml-*.7z from https://dumps.wikimedia.org/enwiki/20140707/. However, looking at the XML files, the revision histories of individual articles do not match the revision histories shown on the history pages of the Wikipedia website. The dump seems to contain significantly fewer revisions than can be found on Wikipedia.

This may be a decent place to ask (actually I don't read this list too much, so I'm just guessing), but xmldatadumps-l@lists.wikimedia.org is probably more relevant. FYI
-Jeremy
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
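
For anyone comparing their own XML decoding against the counts above, here is a minimal standard-library sketch of the same per-page revision count. It is not the mw package's implementation. The function names (local_name, count_revisions) and the SAMPLE document, including its namespace URI, are made up for illustration, and the sketch assumes each <page> carries its <title> and <id> ahead of its <revision> elements, as in the pages-meta-history dumps.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(stream):
    """Yield (page_id, title, revision_count) for each <page> in the stream."""
    page_id = title = None
    revisions = 0
    in_revision = False
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            if name == "page":
                page_id, title, revisions = None, None, 0
            elif name == "revision":
                in_revision = True
        elif name == "revision":
            revisions += 1
            in_revision = False
            elem.clear()  # drop revision text so memory use stays modest
        elif name == "title":
            title = elem.text
        elif name == "id" and not in_revision and page_id is None:
            # Only the page-level <id>; revision and contributor ids are skipped.
            page_id = elem.text
        elif name == "page":
            yield page_id, title, revisions
            elem.clear()


# Hypothetical two-page document; the namespace URI is illustrative only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example page</title>
    <id>1</id>
    <revision><id>10</id><contributor><id>7</id></contributor></revision>
    <revision><id>11</id></revision>
  </page>
  <page>
    <title>Another page</title>
    <id>2</id>
    <revision><id>12</id></revision>
  </page>
</mediawiki>"""

counts = list(count_revisions(io.BytesIO(SAMPLE)))
for page_id, title, revisions in counts:
    print(page_id, title, revisions)
```

To run this over a real dump, pass sys.stdin.buffer to count_revisions in place of the sample and pipe the bzcat output into the script, as in the call above. One thing worth checking when counts look too low: each pages-meta-history file covers only a range of page IDs, so a single file will not contain every article.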