Is this a replace.py error or an xmlreader.py error or a dump error?
Reading XML dump...
Traceback (most recent call last):
  File "C:\Program Files\Pywikipedia\pagegenerators.py", line 1255, in __iter__
    for page in self.wrapped_gen:
  File "C:\Program Files\Pywikipedia\pagegenerators.py", line 1157, in DuplicateFilterPageGenerator
    for page in generator:
  File "replace.py", line 259, in __iter__
    for entry in self.parser:
  File "C:\Program Files\Pywikipedia\xmlreader.py", line 313, in new_parse
    for event, elem in context:
  File "<string>", line 68, in __iter__
SyntaxError: no element found: line 27487351, column 9
no element found: line 27487351, column 9
0 titles were saved.
I got a similar message for another search, but this time it found the first 20 instances. Maybe the previous one had fewer than 20. I had a case like this some months ago, and then it worked again (maybe after changing the dump?).
Traceback (most recent call last):
  File "C:\Program Files\Pywikipedia\pagegenerators.py", line 1255, in __iter__
    for page in self.wrapped_gen:
  File "C:\Program Files\Pywikipedia\pagegenerators.py", line 1157, in DuplicateFilterPageGenerator
    for page in generator:
  File "replace.py", line 259, in __iter__
    for entry in self.parser:
  File "C:\Program Files\Pywikipedia\xmlreader.py", line 313, in new_parse
    for event, elem in context:
  File "<string>", line 68, in __iter__
SyntaxError: no element found: line 27487351, column 9
no element found: line 27487351, column 9
20 titles were saved.
The dump must have been corrupted; I downloaded the previous dump and the bot ran successfully. But I am not sure that xmlreader.py should react to a corrupt dump like this.
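For illustration only, here is a minimal sketch of how a reader could report a damaged dump instead of letting the traceback escape. It is not the code in xmlreader.py; it uses Python 3's standard xml.etree.ElementTree, and the function name iter_titles and the tiny truncated sample are assumptions made up for this example.

```python
import io
import xml.etree.ElementTree as ET


def iter_titles(stream):
    """Yield page titles; stop with a short message if the XML is damaged.

    Hypothetical helper for illustration, not part of pywikipedia.
    """
    try:
        for event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == "title":
                yield elem.text
            # Free memory for elements that are already processed.
            elem.clear()
    except ET.ParseError as err:
        # ParseError carries the (line, column) where the parser gave up.
        line, column = err.position
        print("Dump is damaged at line %d, column %d: %s" % (line, column, err))


# A made-up dump that ends before the document is closed.
truncated = b"<mediawiki><page><title>A</title></page><page><title>B</title>"
titles = list(iter_titles(io.BytesIO(truncated)))
print(titles)
```

In this sketch the titles read before the damaged spot are still delivered, which matches the behaviour seen above where some titles were saved before the error appeared.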
Sorry. This one is suspicious: http://dumps.wikimedia.org/huwiki/latest/huwiki-latest-pages-articles.xml.bz... Its date is May 31. It should be identical to this permalink: http://download.wikimedia.org/huwiki/20110531/huwiki-20110531-pages-articles... The previous one, from May 19, works well.
2011/6/8 Marcin Cieslak saper@saper.info
The dump must have been corrupted; I downloaded the previous dump and the bot has successfully run. But I am not sure that xmlreader.py has to react to a corrupt dump like this.
Without knowing which dump you are using it is impossible to say...
//Saper
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
Again, the link is the same, but the dump changed to 14 June.
2011/6/8 Bináris wikiposta@gmail.com
Sorry. This one is suspicious:
http://dumps.wikimedia.org/huwiki/latest/huwiki-latest-pages-articles.xml.bz...
My screen is:
Traceback (most recent call last):
  File "C:\Program Files\Pywikipedia\pagegenerators.py", line 1255, in __iter__
    for page in self.wrapped_gen:
  File "C:\Program Files\Pywikipedia\pagegenerators.py", line 1113, in NamespaceFilterPageGenerator
    for page in generator:
  File "C:\Program Files\Pywikipedia\pagegenerators.py", line 1157, in DuplicateFilterPageGenerator
    for page in generator:
  File "replace.py", line 259, in __iter__
    for entry in self.parser:
  File "C:\Program Files\Pywikipedia\xmlreader.py", line 313, in new_parse
    for event, elem in context:
  File "<string>", line 68, in __iter__
SyntaxError: no element found: line 27524494, column 9
no element found: line 27524494, column 9
59 titles were saved.
I used replace.py with 3 different fixes; the result and the numbers are always the same. It stops somewhere around May 2011, but maybe it stops at the end. Is it possible that every occurrence is found and I get the error message at the end of the dump? I can only check the creation date of the last article found, which is April/May of this year, so I am not sure whether anything is left out.
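One way to check whether the error sits at the very end of the dump is to count the lines in the decompressed file and compare the total with the line number in the error message, and to look at whether the file ends with the closing tag. The following is only a sketch in Python 3 under stated assumptions: the helper name inspect_dump and the tiny demo file are made up, and the closing tag </mediawiki> is assumed to be the last thing in a complete dump.

```python
import bz2
import os
import tempfile


def inspect_dump(path, closing_tag=b"</mediawiki>"):
    """Return (line_count, ends_with_closing_tag, compressed_stream_complete).

    Hypothetical helper for illustration, not part of pywikipedia.
    """
    lines = 0
    tail = b""
    complete = True
    with bz2.open(path, "rb") as dump:
        try:
            while True:
                chunk = dump.read(1024 * 1024)
                if not chunk:
                    break
                lines += chunk.count(b"\n")
                # Keep only the last few bytes to test for the closing tag.
                tail = (tail + chunk)[-64:]
        except EOFError:
            # The .bz2 file itself was cut off (for example, a partial download).
            complete = False
    return lines, tail.rstrip().endswith(closing_tag), complete


# Tiny stand-in for a real dump, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo-pages-articles.xml.bz2")
with open(demo_path, "wb") as handle:
    handle.write(bz2.compress(b"<mediawiki>\n<page/>\n</mediawiki>\n"))

print(inspect_dump(demo_path))
```

If the line count is close to the line number reported by the parser and the closing tag is missing, the parser most likely read everything that was there and complained only when the input ran out. If the count is much larger than the reported line, part of the dump was never searched.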
Bináris wikiposta@gmail.com wrote:
http://dumps.wikimedia.org/huwiki/latest/huwiki-latest-pages-articles.xml.bz2
FilterPageGenerator
    for page in generator:
  File "replace.py", line 259, in __iter__
    for entry in self.parser:
  File "C:\Program Files\Pywikipedia\xmlreader.py", line 313, in new_parse
    for event, elem in context:
  File "<string>", line 68, in __iter__
SyntaxError: no element found: line 27524494, column 9
no element found: line 27524494, column 9
Yes, those dumps are broken indeed.
This is because you are using an experimental extension (LiquidThreads) on your wiki, and this extension is unable to properly dump some fancy characters. This page is causing trouble:
https://secure.wikimedia.org/wikipedia/hu/wiki/T%C3%A9ma:Szerkeszt%C5%91vita...)
or, to be more specific, this:
https://secure.wikimedia.org/wikipedia/hu/wiki/Speci%C3%A1lis:Lapok_export%C...)
I have filed this bug for you:
https://bugzilla.wikimedia.org/show_bug.cgi?id=29564
but it seems this is not the first time LiquidThreads has broken dumps (https://bugzilla.wikimedia.org/show_bug.cgi?id=22688).
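As a rough illustration of how such characters can be spotted, the sketch below lists characters that XML 1.0 does not allow in a document. Whether the LiquidThreads problem involves exactly this class of characters is an assumption, not something established in this thread; the helper name find_illegal_xml_chars and the sample strings are made up.

```python
import re

# Everything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, hex code point) for each character XML 1.0 forbids.

    Hypothetical helper for illustration, not part of pywikipedia.
    """
    return [(m.start(), hex(ord(m.group()))) for m in _ILLEGAL_XML.finditer(text)]


# A vertical tab (0x0B) is not allowed in XML 1.0; accented letters are fine.
print(find_illegal_xml_chars("ok\x0bbad"))
print(find_illegal_xml_chars("Szerkeszt\u0151vita"))
```

Run over page text before export, a check like this would point to the offending position rather than leaving the parser to fail millions of lines into the dump.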
//Marcin
Thank you very much! On the one hand, this is disappointing because I see no hope for a quick fix; on the other hand, I would never have found this problem, and I appreciate your help. Unfortunately, there is a popular habit on huwiki of using LT before it is ready.
Actually Brion committed one of the first fixes (for the XML output), so it has a chance to be deployed before the next dump occurs.
You may add yourself to the "CC List" of this bug:
https://bugzilla.wikimedia.org/show_bug.cgi?id=29564
and you will be automatically notified of progress on this bug.
//Marcin