[Pywikipedia-l] [ pywikipediabot-Bugs-1871586 ] interwiki crashes with: regular expression code size limit e
SourceForge.net
noreply at sourceforge.net
Wed Jan 16 08:31:49 UTC 2008
Bugs item #1871586, was opened at 2008-01-14 23:26
Message generated for change (Comment added) made by cosoleto
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1871586&group_id=93107
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki crashes with: regular expression code size limit e
Initial Comment:
The bot crashes with a "regular expression code size limit exceeded" error on many pages.
Error report:
Updating links on page [[pl:10,000 Maniacs]].
No changes needed
Getting 37 pages from wikipedia:ru...
Dump pl (wikipedia) saved
Traceback (most recent call last):
File "C:\dw\pywikipedia\interwiki.py", line 1606, in <module>
bot.run()
File "C:\dw\pywikipedia\interwiki.py", line 1381, in run
self.queryStep()
File "C:\dw\pywikipedia\interwiki.py", line 1355, in queryStep
self.oneQuery()
File "C:\dw\pywikipedia\interwiki.py", line 1351, in oneQuery
subject.workDone(self)
File "C:\dw\pywikipedia\interwiki.py", line 724, in workDone
elif page.isEmpty() and not page.isCategory():
File "C:\dw\pywikipedia\wikipedia.py", line 860, in isEmpty
txt = removeLanguageLinks(txt)
File "C:\dw\pywikipedia\wikipedia.py", line 3054, in removeLanguageLinks
% languageR, re.IGNORECASE)
File "C:\Python25\lib\re.py", line 180, in compile
return _compile(pattern, flags)
File "C:\Python25\lib\re.py", line 231, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Python25\lib\sre_compile.py", line 530, in compile
groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
----------------------------------------------------------------------
>Comment By: Francesco Cosoleto (cosoleto)
Date: 2008-01-16 09:31
Message:
Logged In: YES
user_id=181280
Originator: NO
Fixed in r4896.
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2008-01-16 09:10
Message:
Logged In: YES
user_id=181280
Originator: NO
This bug is related to the removeLanguageLinks function in the wikipedia
module: the languageR variable grows longer on each call until it causes
an overflow error in the re module.
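
The failure mode described above can be sketched roughly as follows. This
is only an illustration, not the actual code from wikipedia.py or the
change made in r4896: LANGUAGE_CODES, pattern_length_buggy and
remove_language_links are hypothetical names, and the link pattern is a
simplified assumption. The first function extends a module-level string on
every call, so the compiled pattern keeps growing; the second rebuilds the
alternation in a local variable, so its size stays constant.

```python
import re

# Hypothetical stand-in for the site's language codes; not the real list.
LANGUAGE_CODES = ["de", "en", "fr", "pl", "ru"]

# Buggy shape: module-level state that is extended on every call.
_language_r = ""


def pattern_length_buggy():
    """Compile the growing pattern and return the length of its source."""
    global _language_r
    # Each call appends the whole alternation again, so the pattern grows.
    _language_r += "|".join(LANGUAGE_CODES) + "|"
    pattern = r"\[\[(%s)\s*:[^\]]*\]\]" % _language_r.rstrip("|")
    return len(re.compile(pattern, re.IGNORECASE).pattern)


def remove_language_links(text):
    """Strip interwiki links using an alternation built locally per call."""
    language_r = "|".join(re.escape(code) for code in LANGUAGE_CODES)
    pattern = re.compile(
        r"\[\[(?:%s)\s*:[^\]]*\]\]\s*" % language_r, re.IGNORECASE
    )
    return pattern.sub("", text)
```

With the buggy shape, a long -autonomous run calls the function once per
page, which would explain why the error only appears after many pages have
been processed.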
----------------------------------------------------------------------
Comment By: André Malafaya Baptista (malafaya)
Date: 2008-01-15 23:19
Message:
Logged In: YES
user_id=1037345
Originator: NO
yep, so can I just now :(
----------------------------------------------------------------------
Comment By: masti (mstmst)
Date: 2008-01-15 23:09
Message:
Logged In: YES
user_id=1974561
Originator: NO
Updated to r4894.
Unfortunately I can still reproduce the same error.
----------------------------------------------------------------------
Comment By: André Malafaya Baptista (malafaya)
Date: 2008-01-15 22:56
Message:
Logged In: YES
user_id=1037345
Originator: NO
In fact, it was line 2836.
I just committed those changes to SVN (r4894).
This bug should be considered fixed if it does not re-occur.
----------------------------------------------------------------------
Comment By: André Malafaya Baptista (malafaya)
Date: 2008-01-15 22:44
Message:
Logged In: YES
user_id=1037345
Originator: NO
As of r4893, I believe changing wikipedia.py line 2810 to:
'source': re.compile(r'(?is)<source>.*?</source>'),
would solve the problem.
There was an unclosed '<' after 'source'.
I'm not absolutely sure about this, as this problem doesn't seem easy to
test. It has also happened to me, but I can't pinpoint under which conditions.
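
For reference, a small check of what the proposed pattern matches. The
sample text is made up for illustration; only the regular expression comes
from the comment above.

```python
import re

# The pattern proposed above: case-insensitive, and '.' also matches newlines.
source_r = re.compile(r'(?is)<source>.*?</source>')

# Made-up sample text containing a source block that should be skipped.
sample = "before <SOURCE>\n[[en:Not a link]]\n</source> after"

# Remove the whole block, including the multi-line body.
stripped = source_r.sub("", sample)
```

Note that this pattern only matches a bare <source> tag; an opening tag
with attributes would not be matched by it.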
----------------------------------------------------------------------
Comment By: masti (mstmst)
Date: 2008-01-15 20:02
Message:
Logged In: YES
user_id=1974561
Originator: NO
It looks like the error only occurs when interwiki is run with the
-autonomous switch. For example, running this command on pl.wiki:
interwiki.py -start:100BASE-FX -autonomous
causes the bot to run through multiple pages and end with the following error:
[[100 dni Napoleona]]: [[ja:????]] gives new interwiki [[he:m'h hymym]]
[[101 (liczba)]]: [[ja:101]] gives new interwiki [[ms:101 (nombor)]]
======Post-processing [[pl:10164 Akusekijima]]======
Updating links on page [[pl:10164 Akusekijima]].
No changes needed
======Post-processing [[pl:10163 Onomichi]]======
Updating links on page [[pl:10163 Onomichi]].
No changes needed
======Post-processing [[pl:10157 Asagiri]]======
Updating links on page [[pl:10157 Asagiri]].
No changes needed
======Post-processing [[pl:10143 Kamogawa]]======
Updating links on page [[pl:10143 Kamogawa]].
No changes needed
======Post-processing [[pl:10142 Sakka]]======
Updating links on page [[pl:10142 Sakka]].
No changes needed
======Post-processing [[pl:10117 Tanikawa]]======
Updating links on page [[pl:10117 Tanikawa]].
No changes needed
Getting 23 pages from wikipedia:id...
NOTE: [[id:100 (buku)]] is redirect to [[id:The 100]]
Getting 21 pages from wikipedia:uk...
Getting 18 pages from wikipedia:lt...
Getting 16 pages from wikipedia:fr...
Sleeping for 3.2 seconds, 2008-01-15 19:50:31
Getting 15 pages from wikipedia:es...
======Post-processing [[pl:100BASE-FX]]======
ERROR: Found link to [[pl:Fast Ethernet]]
[[en:Fast Ethernet]]
[[es:Fast Ethernet]]
[[fr:100BASE-T4]]
[[id:Fast Ethernet]]
[[it:Fast Ethernet]]
[[ja:100megabitto ihsanetto]]
[[lt:Fast Ethernet]]
[[pt:Fast Ethernet]]
[[uk:Fast Ethernet]]
ERROR: Found more than one link for wikipedia:es
ERROR: Found more than one link for wikipedia:fr
======Aborted processing [[pl:100BASE-FX]]======
Getting 42 pages from wikipedia:de...
Getting 31 pages from wikipedia:sv...
Getting 28 pages from wikipedia:nl...
Dump pl (wikipedia) saved
Traceback (most recent call last):
File "C:\dw\pywikipedia\interwiki.py", line 1609, in <module>
bot.run()
File "C:\dw\pywikipedia\interwiki.py", line 1384, in run
self.queryStep()
File "C:\dw\pywikipedia\interwiki.py", line 1358, in queryStep
self.oneQuery()
File "C:\dw\pywikipedia\interwiki.py", line 1354, in oneQuery
subject.workDone(self)
File "C:\dw\pywikipedia\interwiki.py", line 724, in workDone
elif page.isEmpty() and not page.isCategory():
File "C:\dw\pywikipedia\wikipedia.py", line 860, in isEmpty
txt = removeLanguageLinks(txt)
File "C:\dw\pywikipedia\wikipedia.py", line 3054, in removeLanguageLinks
% languageR, re.IGNORECASE)
File "C:\Python25\lib\re.py", line 180, in compile
return _compile(pattern, flags)
File "C:\Python25\lib\re.py", line 231, in _compile
p = sre_compile.compile(pattern, flags)
File "C:\Python25\lib\sre_compile.py", line 530, in compile
groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
The above test was done with r4893.
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2008-01-15 17:07
Message:
Logged In: YES
user_id=1327030
Originator: NO
I can't reproduce the bug. What is the exact command?
----------------------------------------------------------------------