[Pywikipedia-l] [ pywikipediabot-Bugs-1871586 ] interwiki crashes with: regular expression code size limit e

SourceForge.net noreply at sourceforge.net
Wed Jan 16 08:10:22 UTC 2008


Bugs item #1871586, was opened at 2008-01-14 23:26
Message generated for change (Comment added) made by cosoleto
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1871586&group_id=93107

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: Remind
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki crashes with: regular expression code size limit e

Initial Comment:
The bot crashes with a "regular expression code size limit exceeded" error on many pages.

Error report:


Updating links on page [[pl:10,000 Maniacs]].
No changes needed
Getting 37 pages from wikipedia:ru...
Dump pl (wikipedia) saved
Traceback (most recent call last):
  File "C:\dw\pywikipedia\interwiki.py", line 1606, in <module>
    bot.run()
  File "C:\dw\pywikipedia\interwiki.py", line 1381, in run
    self.queryStep()
  File "C:\dw\pywikipedia\interwiki.py", line 1355, in queryStep
    self.oneQuery()
  File "C:\dw\pywikipedia\interwiki.py", line 1351, in oneQuery
    subject.workDone(self)
  File "C:\dw\pywikipedia\interwiki.py", line 724, in workDone
    elif page.isEmpty() and not page.isCategory():
  File "C:\dw\pywikipedia\wikipedia.py", line 860, in isEmpty
    txt = removeLanguageLinks(txt)
  File "C:\dw\pywikipedia\wikipedia.py", line 3054, in removeLanguageLinks
    % languageR, re.IGNORECASE)
  File "C:\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python25\lib\re.py", line 231, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python25\lib\sre_compile.py", line 530, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded

----------------------------------------------------------------------

>Comment By: Francesco Cosoleto (cosoleto)
Date: 2008-01-16 09:10

Message:
Logged In: YES 
user_id=181280
Originator: NO

This bug is in the removeLanguageLinks function of the wikipedia module:
the languageR variable grows longer with each call until it triggers an
overflow error in the re module.
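
A minimal sketch of this failure mode, not the actual wikipedia.py code:
the language codes, the shared list and both function names below are
hypothetical. It shows how an alternation that is appended to on every
call keeps growing, which is the kind of pattern that can eventually hit
the limit shown in the traceback, and one possible remedy: build the
alternation once from a fresh value and reuse the compiled pattern.

```python
import re

# Hypothetical stand-in for the site's language codes (not the real data).
LANGUAGE_CODES = ["de", "en", "fr", "pl", "ru"]

# Shared state that wrongly persists between calls (illustrative only).
_accumulated = []


def build_pattern_buggy():
    """Extend the shared list on every call, so the pattern keeps growing."""
    _accumulated.extend(LANGUAGE_CODES)
    return r"\[\[(%s)\s*:[^\]]*\]\]\s*" % "|".join(_accumulated)


_compiled = None


def remove_language_links(text):
    """Build the alternation once and reuse the compiled pattern."""
    global _compiled
    if _compiled is None:
        language_r = "|".join(LANGUAGE_CODES)  # fresh local value each build
        _compiled = re.compile(r"\[\[(%s)\s*:[^\]]*\]\]\s*" % language_r,
                               re.IGNORECASE)
    return _compiled.sub("", text).strip()


# The buggy pattern gets longer with every call.
first = len(build_pattern_buggy())
last = first
for _ in range(99):
    last = len(build_pattern_buggy())
print("pattern length after 1 call:", first)
print("pattern length after 100 calls:", last)

# The fixed version strips the links and keeps a constant-size pattern.
print(remove_language_links("Text [[en:Foo]] [[de:Bar]]"))
```

Caching the compiled pattern is only one option; recomputing the
alternation from scratch on each call would also keep its size bounded.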

----------------------------------------------------------------------

Comment By: André Malafaya Baptista (malafaya)
Date: 2008-01-15 23:19

Message:
Logged In: YES 
user_id=1037345
Originator: NO

Yep, I can reproduce it too, just now :(

----------------------------------------------------------------------

Comment By: masti (mstmst)
Date: 2008-01-15 23:09

Message:
Logged In: YES 
user_id=1974561
Originator: NO

Updated to r4894.
Unfortunately, I can still reproduce the same error.

----------------------------------------------------------------------

Comment By: André Malafaya Baptista (malafaya)
Date: 2008-01-15 22:56

Message:
Logged In: YES 
user_id=1037345
Originator: NO

In fact, it was line 2836.
I just committed those changes to SVN (r4894).
This bug should be considered fixed if it does not re-occur.

----------------------------------------------------------------------

Comment By: André Malafaya Baptista (malafaya)
Date: 2008-01-15 22:44

Message:
Logged In: YES 
user_id=1037345
Originator: NO

As of r4893, I believe changing wikipedia.py line 2810 to:
        'source':      re.compile(r'(?is)<source>.*?</source>'),
would solve the problem.
There was an unclosed '<' after 'source'.
I'm not absolutely sure about this, as this problem doesn't seem easy to
test. It has also happened to me, but I can't pin down under which
conditions.
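
A small check of what the proposed pattern matches, as a sketch only: the
sample text and the variable names are made up for illustration. The (?is)
prefix makes the match case-insensitive and lets '.' span newlines, and
'.*?' stops at the first closing tag.

```python
import re

# The pattern proposed above for the 'source' entry.
source_r = re.compile(r'(?is)<source>.*?</source>')

# Hypothetical sample text with a multi-line, mixed-case source block.
text = "before <SOURCE>\nx = 1\n</source> after"

stripped = source_r.sub("", text)
print(repr(stripped))
```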

----------------------------------------------------------------------

Comment By: masti (mstmst)
Date: 2008-01-15 20:02

Message:
Logged In: YES 
user_id=1974561
Originator: NO

It looks like the error only occurs when interwiki is run with the
-autonomous switch. For example, running this command on pl.wiki:
interwiki.py -start:100BASE-FX -autonomous
causes the bot to run through multiple pages and end with the following
error:

[[100 dni Napoleona]]: [[ja:????]] gives new interwiki [[he:m'h hymym]]
[[101 (liczba)]]: [[ja:101]] gives new interwiki [[ms:101 (nombor)]]
======Post-processing [[pl:10164 Akusekijima]]======
Updating links on page [[pl:10164 Akusekijima]].
No changes needed
======Post-processing [[pl:10163 Onomichi]]======
Updating links on page [[pl:10163 Onomichi]].
No changes needed
======Post-processing [[pl:10157 Asagiri]]======
Updating links on page [[pl:10157 Asagiri]].
No changes needed
======Post-processing [[pl:10143 Kamogawa]]======
Updating links on page [[pl:10143 Kamogawa]].
No changes needed
======Post-processing [[pl:10142 Sakka]]======
Updating links on page [[pl:10142 Sakka]].
No changes needed
======Post-processing [[pl:10117 Tanikawa]]======
Updating links on page [[pl:10117 Tanikawa]].
No changes needed
Getting 23 pages from wikipedia:id...
NOTE: [[id:100 (buku)]] is redirect to [[id:The 100]]
Getting 21 pages from wikipedia:uk...
Getting 18 pages from wikipedia:lt...
Getting 16 pages from wikipedia:fr...
Sleeping for 3.2 seconds, 2008-01-15 19:50:31
Getting 15 pages from wikipedia:es...
======Post-processing [[pl:100BASE-FX]]======
ERROR: Found link to [[pl:Fast Ethernet]]
    [[en:Fast Ethernet]]
    [[es:Fast Ethernet]]
    [[fr:100BASE-T4]]
    [[id:Fast Ethernet]]
    [[it:Fast Ethernet]]
    [[ja:100megabitto ihsanetto]]
    [[lt:Fast Ethernet]]
    [[pt:Fast Ethernet]]
    [[uk:Fast Ethernet]]
ERROR: Found more than one link for wikipedia:es
ERROR: Found more than one link for wikipedia:fr
======Aborted processing [[pl:100BASE-FX]]======
Getting 42 pages from wikipedia:de...
Getting 31 pages from wikipedia:sv...
Getting 28 pages from wikipedia:nl...
Dump pl (wikipedia) saved
Traceback (most recent call last):
  File "C:\dw\pywikipedia\interwiki.py", line 1609, in <module>
    bot.run()
  File "C:\dw\pywikipedia\interwiki.py", line 1384, in run
    self.queryStep()
  File "C:\dw\pywikipedia\interwiki.py", line 1358, in queryStep
    self.oneQuery()
  File "C:\dw\pywikipedia\interwiki.py", line 1354, in oneQuery
    subject.workDone(self)
  File "C:\dw\pywikipedia\interwiki.py", line 724, in workDone
    elif page.isEmpty() and not page.isCategory():
  File "C:\dw\pywikipedia\wikipedia.py", line 860, in isEmpty
    txt = removeLanguageLinks(txt)
  File "C:\dw\pywikipedia\wikipedia.py", line 3054, in removeLanguageLinks
    % languageR, re.IGNORECASE)
  File "C:\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python25\lib\re.py", line 231, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python25\lib\sre_compile.py", line 530, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded

The above test was done with r4893.

----------------------------------------------------------------------

Comment By: Rotem Liss (rotemliss)
Date: 2008-01-15 17:07

Message:
Logged In: YES 
user_id=1327030
Originator: NO

I can't reproduce the bug. What is the exact command?

----------------------------------------------------------------------
