http://pythonhosted.org/mediawiki-utilities/

pip install mediawiki-utilities  :) 

python 3.x only :/  
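
For example, here is a minimal sketch of reading a compressed dump file directly in Python, without an external bzcat process (this assumes Iterator.from_file accepts any file-like object, as it does with sys.stdin in the script below; the filename is a placeholder):

import bz2

from mw import xml_dump

# "pages-meta-history.xml.bz2" is a placeholder path for a real dump file.
with bz2.open("pages-meta-history.xml.bz2", mode="rt", encoding="utf-8") as f:
    dump = xml_dump.Iterator.from_file(f)
    for page in dump:
        print(page.id, page.title)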

On Tue, Feb 24, 2015 at 4:40 PM, Behzad Tabibian <btabibian@gmail.com> wrote:
Thank you so much for looking into this. 
I am going to use the mw package and see why my XML decoding is not producing this result.

Best,
Behzad

On Feb 24, 2015, at 11:36 PM, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:

Behzad, 

I just ran a quick script to check the count and it comes out as expected.  Here's the call. 

$ bzcat <snip>/enwiki-20140707-pages-meta-history5.xml-p000183366p000184999.bz2 | python count_revs.py 
183366 Dude Ranch (album) 854
183369 Scrambling 149
183370 Home cinema 821
183371 Hamilton Hume 184
183372 Gertrude Gadwall 13
183373 Critical chain project management 286
183375 Luke The Goose 6
183376 George Robertson, Baron Robertson of Port Ellen 350
183378 Lord Robertson 9
183379 List of rivers of Nova Scotia 120
183380 Scottish whisky 1
183381 Louis XV 1
183382 Talk:List of artists who died of drug-related causes 16
^C

Here's what count_revs.py looks like:

"""
Counts the revisions per page in an XML dump and prints a nice format
"""
import sys
from mw import xml_dump

dump = xml_dump.Iterator.from_file(sys.stdin)

for page in dump:
revisions = sum(1 for revision in page)
print(page.id, page.title, revisions)
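
If you have the .7z dumps from the original question rather than .bz2, the same pipeline should work with p7zip's extract-to-stdout mode (untested on these particular files; the filename is a placeholder for one of the parts):

$ 7za e -so enwiki-20140707-pages-meta-history1.xml-<part>.7z | python count_revs.py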

On Tue, Feb 24, 2015 at 3:41 PM, Behzad Tabibian <btabibian@gmail.com> wrote:
Hi Jeremy,

Thanks for the reply. I will look into that list.

Best,
Behzad

On Feb 24, 2015, at 7:49 PM, Jeremy Baron <jeremy@tuxmachine.com> wrote:

On Feb 24, 2015 1:44 PM, "Behzad Tabibian" <btabibian@gmail.com> wrote:
> I am new to working with Wikipedia dumps. I am trying to obtain the full revision history of every article on Wikipedia. I downloaded enwiki-20140707-pages-meta-history1.xml-*.7z from https://dumps.wikimedia.org/enwiki/20140707/. However, the revision histories of individual articles in the XML files do not match the histories shown on the article history pages on the Wikipedia website; the dump seems to contain significantly fewer revisions than what can be found on Wikipedia.

This may be a decent place to ask (I don't actually read this list much, so I'm just guessing), but the question is probably more relevant to xmldatadumps-l@lists.wikimedia.org. FYI

-Jeremy

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l