Bugs item #2035835, was opened at 2008-08-02 12:02
Message generated for change (Comment added) made by darkoneko
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2035835&group_…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Purodha B Blissenbach (purodha)
Assigned to: Nobody/Anonymous (nobody)
Summary: SaxParseBug caused error invalid literal for int()
Initial Comment:
I got an error message and a trace dump from interwiki.py, which afterwards continues gracefully. Here are the messages:
python /home/purodha/pywikipedia/interwiki.py -v -initialredirect -new:3
Checked for running processes. 1 processes currently running, including the current process.
Pywikipediabot (r5776 (wikipedia.py), Aug 01 2008, 15:39:04)
Python 2.5.2 (r252:60911, May 28 2008, 19:19:25)
[GCC 4.2.4 (Debian 4.2.4-1)]
Retrieving mediawiki messages from Special:Allmessages
WARNING: No character set found.
NOTE: Number of pages queued is 0, trying to add 60 more.
Getting 3 pages from wikipedia:ksh...
-- some lines skipped --
Getting 1 pages from wikipedia:am...
ERROR: SaxParseBug caused error invalid literal for int() with base 10: 'NS_CATEGORY'. Dump SaxParseBug_wikipedia_am__Sat_Aug__2_09-54-57_2008.dump created.
Traceback (most recent call last):
File "/home/purodha/pywikipedia/pagegenerators.py", line 768, in __iter__
for loaded_page in self.preload(somePages):
File "/home/purodha/pywikipedia/pagegenerators.py", line 785, in preload
wikipedia.getall(site, pagesThisSite)
File "/home/purodha/pywikipedia/wikipedia.py", line 2950, in getall
_GetAll(site, pages, throttle, force).run()
File "/home/purodha/pywikipedia/wikipedia.py", line 2798, in run
xml.sax.parseString(data, handler)
File "/usr/lib/python2.5/site-packages/_xmlplus/sax/__init__.py", line 47, in parseString
parser.parse(inpsrc)
File "/usr/lib/python2.5/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/usr/lib/python2.5/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
File "/usr/lib/python2.5/site-packages/_xmlplus/sax/expatreader.py", line 216, in feed
self._parser.Parse(data, isFinal)
File "/usr/lib/python2.5/site-packages/_xmlplus/sax/expatreader.py", line 312, in start_element
self._cont_handler.startElement(name, AttributesImpl(attrs))
File "/home/purodha/pywikipedia/xmlreader.py", line 150, in startElement
self.namespaceid = int(attrs['key'])
ValueError: invalid literal for int() with base 10: 'NS_CATEGORY'
invalid literal for int() with base 10: 'NS_CATEGORY'
Getting page [[am:????]]
etc.
----------------------------------------------------------------------
Comment By: DarkoNeko (darkoneko)
Date: 2008-08-02 13:14
Message:
Logged In: YES
user_id=1809111
Originator: NO
Same for me
----version----
Pywikipedia [http] trunk/pywikipedia (r5781, Aug 01 2008, 21:44:26)
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)]
----trace----
Updating links on page [[scn:1054]].
No changes needed
Updating links on page [[be:1054]].
No changes needed
Getting 60 pages from wikipedia:oc...
Getting 60 pages from wikipedia:mk...
Checked for running processes. 1 processes currently running, including the current process.
Getting 60 pages from wikipedia:sw...
Getting 60 pages from wikipedia:pi...
Getting 60 pages from wikipedia:sa...
Getting 60 pages from wikipedia:am...
ERROR: SaxParseBug caused error invalid literal for int() with base 10: 'NS_CATEGORY'. Dump SaxParseBug_wikipedia_am__Sat_Aug_02_13-07-31_2008.dump created.
Traceback (most recent call last):
File "C:\Program Files\TortoiseSVN\pywikipedia\pagegenerators.py", line 762, in __iter__
for loaded_page in self.preload(somePages):
File "C:\Program Files\TortoiseSVN\pywikipedia\pagegenerators.py", line 785, in preload
wikipedia.getall(site, pagesThisSite)
File "C:\Program Files\TortoiseSVN\pywikipedia\wikipedia.py", line 2950, in getall
_GetAll(site, pages, throttle, force).run()
File "C:\Program Files\TortoiseSVN\pywikipedia\wikipedia.py", line 2798, in run
xml.sax.parseString(data, handler)
File "c:\Program Files\Python25\lib\xml\sax\__init__.py", line 49, in parseString
parser.parse(inpsrc)
File "c:\Program Files\Python25\lib\xml\sax\expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "c:\Program Files\Python25\lib\xml\sax\xmlreader.py", line 123, in parse
self.feed(buffer)
File "c:\Program Files\Python25\lib\xml\sax\expatreader.py", line 207, in feed
self._parser.Parse(data, isFinal)
File "c:\Program Files\Python25\lib\xml\sax\expatreader.py", line 301, in start_element
self._cont_handler.startElement(name, AttributesImpl(attrs))
File "C:\Program Files\TortoiseSVN\pywikipedia\xmlreader.py", line 150, in startElement
self.namespaceid = int(attrs['key'])
ValueError: invalid literal for int() with base 10: 'NS_CATEGORY'
invalid literal for int() with base 10: 'NS_CATEGORY'
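Both traces end at the same statement, `int(attrs['key'])`, which raises as soon as the exported namespace key is a symbolic name instead of a number. The following is only a minimal sketch of how such a key could be parsed more tolerantly. It is not the actual pywikipedia fix: the helper name `parse_namespace_key` and the `SYMBOLIC_IDS` table are hypothetical, and mapping 'NS_CATEGORY' to 14 assumes the wiki meant MediaWiki's standard category namespace.

```python
def parse_namespace_key(raw, symbolic_ids=None):
    """Return an integer namespace id, or None if `raw` cannot be interpreted.

    Hypothetical helper, not part of pywikipedia.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Non-numeric key such as 'NS_CATEGORY': fall back to an optional
        # lookup table instead of aborting the whole SAX parse.
        if symbolic_ids is None:
            return None
        return symbolic_ids.get(raw)


# Assumption for illustration: 14 is MediaWiki's standard Category namespace.
SYMBOLIC_IDS = {"NS_CATEGORY": 14}

print(parse_namespace_key("14"))
print(parse_namespace_key("NS_CATEGORY"))
print(parse_namespace_key("NS_CATEGORY", SYMBOLIC_IDS))
```

Returning None rather than raising would let the caller skip just the malformed namespace entry, whereas the reported exception escapes from the whole parse.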
----------------------------------------------------------------------
Bugs item #2035818, was opened at 2008-08-02 11:11
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=2035818&group_…
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: DarkoNeko (darkoneko)
Assigned to: Nobody/Anonymous (nobody)
Summary: incorrect page retrieval for zhwiki
Initial Comment:
Version used : 5781
interwiki.py tends to give errors like this when trying to access the zh Wikipedia (ever since I added it to my config yesterday). It only happens sometimes, and I haven't been able to determine a common factor for the problem yet.
No changes needed
Updating links on page [[fr:Ludwigshafen]].
No changes needed
Updating links on page [[de:Ludwigshafen am Rhein]].
No changes needed
Getting 12 pages from wikipedia:io...
======Post-processing [[ja:ruhto furitto]]======
Updating links on page [[zh:?????]].
Changes to be made: ??: [[eu:Ruud Gullit]]
+ [[eu:Ruud Gullit]]
NOTE: Updating live wiki...
Changing page [[zh:?????]]
Server error encountered; will retry in 1 minute.
Changing page [[zh:?????]]
Server error encountered; will retry in 2 minutes.
Changing page [[zh:?????]]
Server error encountered; will retry in 4 minutes.
----------------------------------------------------------------------
Revision: 5779
Author: valhallasw
Date: 2008-08-01 20:36:19 +0000 (Fri, 01 Aug 2008)
Log Message:
-----------
Updated replace.py with -requiretitle: support. Kinda hackish implementation, but a better one requires some major refactoring. Requirement is now stored in the exceptions dict and handled as such (i.e. if the requirement is not met, that page is on the exceptions list)
Modified Paths:
--------------
trunk/pywikipedia/replace.py
Modified: trunk/pywikipedia/replace.py
===================================================================
--- trunk/pywikipedia/replace.py 2008-08-01 19:29:29 UTC (rev 5778)
+++ trunk/pywikipedia/replace.py 2008-08-01 20:36:19 UTC (rev 5779)
@@ -33,6 +33,10 @@
argument is given, XYZ will be regarded as a regular
expression.
+-requiretitle:XYZ Only do pages with titles that contain XYZ. If the -regex
+ argument is given, XYZ will be regarded as a regular
+ expression.
+
-excepttext:XYZ Skip pages which contain the text XYZ. If the -regex
argument is given, XYZ will be regarded as a regular
expression.
@@ -226,6 +230,11 @@
for exc in self.exceptions['title']:
if exc.search(title):
return True
+ if self.exceptions.has_key('require-title'):
+ for req in self.exceptions['require-title']:
+ if not req.search(title): # if not all requirements are met:
+ return True
+
return False
def isTextExcepted(self, text):
@@ -298,6 +307,10 @@
for exc in self.exceptions['title']:
if exc.search(title):
return True
+ if self.exceptions.has_key('require-title'):
+ for req in self.exceptions['require-title']:
+ if not req.search(title):
+ return True
return False
def isTextExcepted(self, original_text):
@@ -454,7 +467,9 @@
'text-contains': [],
'inside': [],
'inside-tags': [],
- }
+ 'require-title': [], # using a seperate requirements dict needs some
+ } # major refactoring of code.
+
# Should the elements of 'replacements' and 'exceptions' be interpreted
# as regular expressions?
regex = False
@@ -514,6 +529,8 @@
PageTitles.append(arg[6:])
elif arg.startswith('-excepttitle:'):
exceptions['title'].append(arg[13:])
+ elif arg.startswith('-requiretitle:'):
+ exceptions['require-title'].append(arg[14:])
elif arg.startswith('-excepttext:'):
exceptions['text-contains'].append(arg[12:])
elif arg.startswith('-exceptinside:'):
@@ -627,7 +644,7 @@
oldR = re.compile(old, re.UNICODE)
replacements[i] = oldR, new
- for exceptionCategory in ['title', 'text-contains', 'inside']:
+ for exceptionCategory in ['title', 'require-title', 'text-contains', 'inside']:
if exceptions.has_key(exceptionCategory):
patterns = exceptions[exceptionCategory]
if not regex:
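The log message describes storing the -requiretitle: patterns in the exceptions dict, so that a page whose title fails any requirement is treated as excepted. A minimal standalone sketch of that check follows. It mirrors the logic added in the diff above but is not the code from replace.py: the function name `is_title_excepted` and the sample patterns are made up for illustration, and `dict.get` stands in for the `has_key` test.

```python
import re


def is_title_excepted(title, exceptions):
    """Return True if a page with this title should be skipped.

    Illustrative stand-in for the title check in the diff above.
    """
    # Classic exceptions: any matching pattern excludes the page.
    for exc in exceptions.get('title', []):
        if exc.search(title):
            return True
    # Requirements: any pattern that does NOT match excludes the page,
    # so every -requiretitle: pattern has to match.
    for req in exceptions.get('require-title', []):
        if not req.search(title):
            return True
    return False


# Hypothetical patterns, compiled as they would be with -regex given.
exceptions = {
    'title': [re.compile('Draft')],
    'require-title': [re.compile('Ludwig')],
}

for title in ['Ludwigshafen', 'Mannheim', 'Draft Ludwigshafen']:
    print(title, is_title_excepted(title, exceptions))
```

Because requirements live next to the ordinary exceptions, several -requiretitle: arguments combine with a logical AND, while several -excepttitle: arguments combine with a logical OR.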
Patches item #2033435, was opened at 2008-07-31 05:47
Message generated for change (Comment added) made by nicdumz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2033435&group_…
Category: None
Group: None
>Status: Pending
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Woo-Jin Kim (kwj2772)
Assigned to: Nobody/Anonymous (nobody)
Summary: checkimages.py - support ko.wikipedia
Initial Comment:
I've modified checkimages.py and am running it on ko.wikipedia.
With these changes the script supports Korean.
script source:
http://ko.wikipedia.org/wiki/%EC%82%AC%EC%9A%A9%EC%9E%90:%EA%B9%80%EC%9A%B0…
Thank you!
----------------------------------------------------------------------
>Comment By: NicDumZ Nicolas Dumazet (nicdumz)
Date: 2008-08-01 18:44
Message:
Logged In: YES
user_id=1963242
Originator: NO
Looks like you were using an old version? I only added the localisation
messages in r5777; let me know if something else was needed.
----------------------------------------------------------------------