Bugs item #3020887, was opened at 2010-06-24 09:31
Message generated for change (Comment added) made by dnessett
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=3020887...

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Dan Nessett (dnessett)
Assigned to: xqt (xqt)
Summary: redirectRegex throws type error

Initial Comment: Running MW 1.13.2, the following command throws a type error:
$ python add_text.py -cat:Pages_with_too_many_expensive_parser_function_calls -text:" " -summary:"Test edit:Category jog for [[:Category:Pages with too many expensive parser function calls|Pages with too many expensive parser function calls]]"
The result is:
Getting [[Category:Pages with too many expensive parser function calls]]...
Loading 2009 White House Forum on Health Reform/Related Articles...
Do you want to accept these changes? ([y]es, [N]o, [a]ll) a
Updating page [[2009 White House Forum on Health Reform/Related Articles]] via API
Loading 2010 United Kingdom general election/Related Articles...
Traceback (most recent call last):
  File "add_text.py", line 417, in <module>
    main()
  File "add_text.py", line 413, in main
    create=talkPage)
  File "add_text.py", line 201, in add_text
    text = page.get()
  File "/usr/local/src/python/pywikipedia/local_sites/wikipedia.py", line 619, in get
    self._contents = self._getEditPage(get_redirect = get_redirect, throttle = throttle, sysop = sysop)
  File "/usr/local/src/python/pywikipedia/local_sites/wikipedia.py", line 727, in _getEditPage
    m = self.site().redirectRegex().match(pagetext)
  File "/usr/local/src/python/pywikipedia/local_sites/wikipedia.py", line 6644, in redirectRegex
    pattern = r'(?:' + '|'.join(keywords) + ')'
TypeError
version.py output is:
$ python version.py
Pywikipedia [http] trunk/pywikipedia (r8311, 2010/06/22, 13:20:10)
Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)]
config-settings:
use_api = True
use_api_login = True
This error occurs due to the following bug in the code. At line 6642 is the following code fragment:
    try:
        keywords = self.getmagicwords('redirect')
        pattern = r'(?:' + '|'.join(keywords) + ')'
    except KeyError:
        # no localized keyword for redirects
        pattern = r'#%s' % default
getmagicwords is a one-line method that simply calls siteinfo (line 5480) with the key 'magicwords'. At line 5518, siteinfo calls getData to obtain site data. When looking for magicwords, the method executes "for entry in data[key]" at line 5527. For certain versions of MW, magicwords are not returned as part of the site data, and therefore data[key] returns a null result. Eventually, this leads to the KeyError exception at line 5538.
The bug arises because siteinfo catches the KeyError exception and returns a result of "None". When the call is unwound back to line 6643 the provision for a KeyError at line 6645 is vacuous. The KeyError has already been caught by siteinfo.
Consequently, the statement at line 6644 executes. This causes a TypeError, since the keywords argument passed to .join() is None.
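
The failure mode described in this comment can be reproduced in isolation. The following is only a minimal sketch, not the pywikipedia code: BuggySite and redirect_pattern are hypothetical stand-ins, and getmagicwords is simplified so that it just forwards to siteinfo. It assumes, as the analysis above does, that siteinfo swallows the KeyError and returns None.

```python
class BuggySite:
    """Hypothetical stand-in for the site object; not the real pywikipedia class."""

    def __init__(self, data):
        self._data = data  # simulated site data, as if returned by getData

    def siteinfo(self, key):
        # Assumption taken from the analysis above: the KeyError is caught
        # here and None is returned, so callers never see the exception.
        try:
            return self._data[key]
        except KeyError:
            return None

    def getmagicwords(self, word):
        # Simplified: forwards to siteinfo and ignores the specific word.
        return self.siteinfo('magicwords')


def redirect_pattern(site, default='REDIRECT'):
    """Same shape as the fragment quoted from redirectRegex."""
    try:
        keywords = site.getmagicwords('redirect')
        # If keywords is None, str.join raises TypeError, not KeyError.
        return r'(?:' + '|'.join(keywords) + ')'
    except KeyError:
        # Never reached in this sketch: siteinfo already consumed the KeyError.
        return r'#%s' % default
```

Calling redirect_pattern(BuggySite({})) raises TypeError, because the except KeyError clause has nothing left to catch.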
----------------------------------------------------------------------
Comment By: Dan Nessett (dnessett) Date: 2010-06-25 15:02
Message: The bug fix in r8329 doesn't correct the problem. This is perhaps because I mis-analyzed the problem. In fact the try ... except block in siteinfo accomplishes nothing, since the KeyError occurs outside its scope. So, what really happens is the exception occurs and propagates. However, the value returned on an exception is None. So, it propagates through getmagicwords to redirectRegex. For some reason I don't understand, it is not caught by the except clause there before the pattern statement executes (causing the type error).
The solution (which I have tested) is to put a try ... except block in getmagicwords and return None when a KeyError occurs. This consumes the KeyError exception and allows the change in r8333 to redirectRegex to work properly. In addition, it makes no sense to have the try ... except block in siteinfo, since it isn't possible for a KeyError to occur as the result of either of the two return statements.
I will attach a patch against r8333 that fixes the problem.
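
The attached patch is not reproduced in this message, so what follows is only a rough sketch of the approach described above, under stated assumptions: PatchedSite and redirect_pattern are hypothetical names, the site data is modelled as a plain dict, and the None check in redirect_pattern is a guess at what the r8333 change to redirectRegex does.

```python
class PatchedSite:
    """Hypothetical stand-in for the site object; not the real pywikipedia class."""

    def __init__(self, data):
        self._data = data  # simulated site data

    def siteinfo(self, key):
        # No try/except here: a missing key simply raises KeyError.
        return self._data[key]

    def getmagicwords(self, word):
        # The KeyError is consumed here and reported to the caller as None.
        try:
            return self.siteinfo('magicwords')[word]
        except KeyError:
            return None


def redirect_pattern(site, default='REDIRECT'):
    """Assumed shape of the r8333 logic: fall back when no keywords exist."""
    keywords = site.getmagicwords('redirect')
    if keywords is None:
        # no localized keyword for redirects
        return r'#%s' % default
    return r'(?:' + '|'.join(keywords) + ')'
```

With this arrangement, a site that reports no magicwords yields the default pattern instead of a TypeError, and the caller no longer needs its own except KeyError clause.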
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2010-06-25 00:20
Message: fixed in r8329
----------------------------------------------------------------------
Comment By: xqt (xqt) Date: 2010-06-24 14:29
Message: Thanks a lot for analyzing it and these details. I'll fix it tomorrow.
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org