Bugs item #1871586, was opened at 2008-01-14 22:26
Message generated for change (Comment added) made by malafaya
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1871586...
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki crashes with: regular expression code size limit e
Initial Comment:
The bot crashes with a "regular expression code size limit exceeded" error on many pages.
Error report:
Updating links on page [[pl:10,000 Maniacs]].
No changes needed
Getting 37 pages from wikipedia:ru...
Dump pl (wikipedia) saved
Traceback (most recent call last):
  File "C:\dw\pywikipedia\interwiki.py", line 1606, in <module>
    bot.run()
  File "C:\dw\pywikipedia\interwiki.py", line 1381, in run
    self.queryStep()
  File "C:\dw\pywikipedia\interwiki.py", line 1355, in queryStep
    self.oneQuery()
  File "C:\dw\pywikipedia\interwiki.py", line 1351, in oneQuery
    subject.workDone(self)
  File "C:\dw\pywikipedia\interwiki.py", line 724, in workDone
    elif page.isEmpty() and not page.isCategory():
  File "C:\dw\pywikipedia\wikipedia.py", line 860, in isEmpty
    txt = removeLanguageLinks(txt)
  File "C:\dw\pywikipedia\wikipedia.py", line 3054, in removeLanguageLinks
    % languageR, re.IGNORECASE)
  File "C:\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python25\lib\re.py", line 231, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python25\lib\sre_compile.py", line 530, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
----------------------------------------------------------------------
Comment By: André Malafaya Baptista (malafaya)
Date: 2008-01-15 21:44
Message: Logged In: YES user_id=1037345 Originator: NO
As of r4893, I believe changing wikipedia.py line 2810 to:
    'source': re.compile(r'(?is)<source>.*?</source>'),
would solve the problem. There was an unclosed '<' after 'source'. I'm not absolutely sure about this, as the problem doesn't seem easy to test. It has also happened to me, but I can't pinpoint the conditions under which it occurs.
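To make the proposed pattern concrete, here is a minimal sketch of how it behaves on its own, written for a current Python 3 rather than the Python 2.5 in the traceback. Only the regular expression comes from the comment above; the helper name strip_source_blocks and the sample text are hypothetical, and this does not confirm that the change fixes the crash.

```python
import re

# Pattern exactly as proposed in the comment above:
#   (?i) ignores case, (?s) lets '.' match newlines,
#   and '.*?' stops at the first closing tag.
SOURCE_R = re.compile(r'(?is)<source>.*?</source>')


def strip_source_blocks(text):
    """Remove every <source>...</source> block (hypothetical helper)."""
    return SOURCE_R.sub('', text)


# Hypothetical sample text, not taken from any wiki page.
sample = "intro <source>\n[[en:Foo]]\n</source> middle <SOURCE>x</SOURCE> end"
print(repr(strip_source_blocks(sample)))
```

Note that the pattern as written requires a bare <source> tag, so an opening tag that carries attributes (for example lang="python") would not match it.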
----------------------------------------------------------------------
Comment By: masti (mstmst)
Date: 2008-01-15 19:02
Message: Logged In: YES user_id=1974561 Originator: NO
It looks like the error only occurs when interwiki is run with the -autonomous switch. For example, running this command on pl.wiki:
    interwiki.py -start:100BASE-FX -autonomous
causes the bot to run through multiple pages and eventually give the following error:
[[100 dni Napoleona]]: [[ja:????]] gives new interwiki [[he:m'h hymym]]
[[101 (liczba)]]: [[ja:101]] gives new interwiki [[ms:101 (nombor)]]
======Post-processing [[pl:10164 Akusekijima]]======
Updating links on page [[pl:10164 Akusekijima]].
No changes needed
======Post-processing [[pl:10163 Onomichi]]======
Updating links on page [[pl:10163 Onomichi]].
No changes needed
======Post-processing [[pl:10157 Asagiri]]======
Updating links on page [[pl:10157 Asagiri]].
No changes needed
======Post-processing [[pl:10143 Kamogawa]]======
Updating links on page [[pl:10143 Kamogawa]].
No changes needed
======Post-processing [[pl:10142 Sakka]]======
Updating links on page [[pl:10142 Sakka]].
No changes needed
======Post-processing [[pl:10117 Tanikawa]]======
Updating links on page [[pl:10117 Tanikawa]].
No changes needed
Getting 23 pages from wikipedia:id...
NOTE: [[id:100 (buku)]] is redirect to [[id:The 100]]
Getting 21 pages from wikipedia:uk...
Getting 18 pages from wikipedia:lt...
Getting 16 pages from wikipedia:fr...
Sleeping for 3.2 seconds, 2008-01-15 19:50:31
Getting 15 pages from wikipedia:es...
======Post-processing [[pl:100BASE-FX]]======
ERROR: Found link to [[pl:Fast Ethernet]] [[en:Fast Ethernet]] [[es:Fast Ethernet]] [[fr:100BASE-T4]] [[id:Fast Ethernet]] [[it:Fast Ethernet]] [[ja:100megabitto ihsanetto]] [[lt:Fast Ethernet]] [[pt:Fast Ethernet]] [[uk:Fast Ethernet]]
ERROR: Found more than one link for wikipedia:es
ERROR: Found more than one link for wikipedia:fr
======Aborted processing [[pl:100BASE-FX]]======
Getting 42 pages from wikipedia:de...
Getting 31 pages from wikipedia:sv...
Getting 28 pages from wikipedia:nl...
Dump pl (wikipedia) saved
Traceback (most recent call last):
  File "C:\dw\pywikipedia\interwiki.py", line 1609, in <module>
    bot.run()
  File "C:\dw\pywikipedia\interwiki.py", line 1384, in run
    self.queryStep()
  File "C:\dw\pywikipedia\interwiki.py", line 1358, in queryStep
    self.oneQuery()
  File "C:\dw\pywikipedia\interwiki.py", line 1354, in oneQuery
    subject.workDone(self)
  File "C:\dw\pywikipedia\interwiki.py", line 724, in workDone
    elif page.isEmpty() and not page.isCategory():
  File "C:\dw\pywikipedia\wikipedia.py", line 860, in isEmpty
    txt = removeLanguageLinks(txt)
  File "C:\dw\pywikipedia\wikipedia.py", line 3054, in removeLanguageLinks
    % languageR, re.IGNORECASE)
  File "C:\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python25\lib\re.py", line 231, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Python25\lib\sre_compile.py", line 530, in compile
    groupindex, indexgroup
OverflowError: regular expression code size limit exceeded
The above test was done with r4893.
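Since the traceback ends inside a re.compile call whose pattern is formatted with languageR, one way to gather more information would be to report the size of the pattern when the compile fails. The following is only a sketch, written for a current Python 3: the pattern text, the function names build_interwiki_regex and remove_language_links, and the list of codes are all assumptions made for illustration, not the actual code in wikipedia.py, and it does not diagnose the bug.

```python
import re


def build_interwiki_regex(codes):
    """Compile a pattern matching [[code:Title]] links.

    Hypothetical stand-in for the compile step seen in the traceback;
    the real pattern in wikipedia.py may differ.
    """
    language_r = '|'.join(re.escape(code) for code in codes)
    pattern = r'\[\[\s*(?:%s)\s*:[^\]]*\]\]\s*' % language_r
    try:
        return re.compile(pattern, re.IGNORECASE)
    except OverflowError as err:
        # Re-raise with the pattern size attached, to help a bug report.
        raise OverflowError('%s (pattern length: %d, codes: %d)'
                            % (err, len(pattern), len(codes)))


def remove_language_links(text, codes):
    """Strip interwiki links for the given codes (hypothetical helper)."""
    return build_interwiki_regex(codes).sub('', text)


# Hypothetical input; the link titles are borrowed from the log above.
page = "Text.\n[[en:Fast Ethernet]]\n[[fr:100BASE-T4]]\n[[Category:X]]"
print(repr(remove_language_links(page, ['en', 'fr', 'pl'])))
```

With a wrapper like this, a failing run would at least show how long the pattern was and how many codes went into it.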
----------------------------------------------------------------------
Comment By: Rotem Liss (rotemliss)
Date: 2008-01-15 16:07
Message: Logged In: YES user_id=1327030 Originator: NO
I can't reproduce the bug. What is the exact command?
----------------------------------------------------------------------
pywikipedia-l@lists.wikimedia.org