Re: [Pywikipedia-l] XMLreader.py

6 Oct 2010

      ...
cElementTree. What are your versions?
Python 2.6.4 (r264:75706, Jun  4 2010, 18:20:31)
(on f13 x64)
Greetings
On 06.10.2010 08:40, emijrp wrote:
...
I have tested your code with the bz2 and 7z dumps, and I get titles
with a None value. The first one is the same error that appears in my code.
Reading XML dump...
None 2004-10-10T04:24:14Z
I have the latest version of pywikipediabot and Python 2.6.5 (r265:79063,
Apr 16 2010, 13:09:56). It may be a bug in Python or
cElementTree. What are your versions?
2010/10/5 Russell Blau <russblau@hotmail.com>
"emijrp" <emijrp@gmail.com> wrote in message
news:AANLkTimu0+xJMBU1f48z8di9deBS_4_gmC_gOB6t82iJ@mail.gmail.com...

 > I think that there is an error in xmlreader.py. When parsing a full
 > revision XML (in this case[1]), using this code[2] (look at the
 > try-except; it writes output when it fails), I correctly get the
 > username, timestamp and revisionid, but sometimes the page title and
 > the page id are None or an empty string.

 > [1] http://download.wikimedia.org/kwwiki/20100926/kwwiki-20100926-pages-meta-history.xml.7z
 > [2] http://pastebin.ca/1951930
 > [3] http://pastebin.ca/1951937

I have been completely unable to replicate this supposed error.  I
downloaded the same kwwiki dump file that you referenced.  I loaded it with
xmlreader.XmlDump, ran it through the parser, and counted the number of
XMLEntry objects it generated: 4711.  Then as a test I opened the same dump
as a text file and counted the number of lines that contain the string
"<page>": 4711.  So the parser is correctly returning one object per page
item found in the file.
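
The cross-check described above can be sketched roughly as follows. This is
not xmlreader.py itself: it swaps in the standard-library ElementTree
incremental parser so the example runs on its own, and the helper names
(local_name, count_page_lines, count_page_elements) and the tiny inline
sample dump are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_page_lines(stream):
    """Count raw lines that contain the literal string '<page>'."""
    return sum(1 for line in stream if b"<page>" in line)


def count_page_elements(stream):
    """Count <page> elements reported by an incremental XML parser."""
    count = 0
    # Only "end" events are used, so each page is complete when counted.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) == "page":
            count += 1
            elem.clear()  # drop the finished page's children to save memory
    return count


# Made-up two-page sample standing in for a real dump file (assumption).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page>
    <title>Alpha</title>
    <id>1</id>
    <revision><id>10</id><timestamp>2004-10-10T04:24:14Z</timestamp></revision>
  </page>
  <page>
    <title>Beta</title>
    <id>2</id>
    <revision><id>20</id><timestamp>2004-10-11T00:00:00Z</timestamp></revision>
  </page>
</mediawiki>
"""

line_count = count_page_lines(io.BytesIO(SAMPLE))
element_count = count_page_elements(io.BytesIO(SAMPLE))
print(line_count, element_count, line_count == element_count)
```

For a real dump, any binary file object should work in place of
io.BytesIO(SAMPLE), for example one returned by bz2.open(path, "rb"). If the
two counts agree, the parser is producing one item per "<page>" line.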

Next I ran the parser again with a script that would print out a message if
any XMLEntry object had a missing title (None or empty string); no
messages.
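
A missing-title check of that kind might look like the sketch below. Again
this is only an illustration built on the standard-library ElementTree parser
rather than on xmlreader.XmlDump; the function name pages_missing_title and
both sample documents are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def pages_missing_title(stream):
    """Yield the 1-based position of each <page> whose title is None or empty."""
    position = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        position += 1
        title = None
        # Look only at direct children of <page> for the <title> element.
        for child in elem:
            if local_name(child.tag) == "title":
                title = child.text
                break
        if not title:  # covers both None and the empty string
            yield position
        elem.clear()  # free the finished page


# Made-up samples (assumption): one clean, one with two defective pages.
GOOD = b"<mediawiki><page><title>Alpha</title><id>1</id></page></mediawiki>"
BAD = (
    b"<mediawiki>"
    b"<page><title>Alpha</title><id>1</id></page>"
    b"<page><title></title><id>2</id></page>"
    b"<page><id>3</id></page>"
    b"</mediawiki>"
)

for name, data in (("GOOD", GOOD), ("BAD", BAD)):
    for position in pages_missing_title(io.BytesIO(data)):
        print("%s: page #%d has no title" % (name, position))
```

Silence on a dump corresponds to the "no messages" result reported above.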

Then I searched for the specific page entry you showed in your pastebin item
[3]. The result of this test is shown at [4]. In short, it found exactly the
page title you said was missing.

I cannot explain why your results are different than mine, unless perhaps
you have a corrupted copy of the dump file, or are not using the current
version of xmlreader.py.

Russ

[4] http://pastebin.ca/1955170

_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
