On 14 February 2012 00:46, Bináris <wikiposta@gmail.com> wrote:
I went back to this conversation with Russell and tried to use it in another way. I have console encoding problems with Cyrillic letters in this command:
replace.py -catr:Венгрия . @ -lang:ru -excepttext:"[[hu:" -save:magyarok.txt -always
One way is to URL-encode the Russian category. Another way is to put the command into a script. (DOS batch files won't work; I already tried.)
So what I did:
import replace
replace.main(u'-catr:Венгрия', '.', '@', '-lang:ru', '-excepttext:"[[hu:"', '-save:magyarok.txt')
This results in an error message:
  File "C:\Pywikipedia\replace.py", line 582, in main
    for arg in pywikibot.handleArgs(*args):
  File "C:\Pywikipedia\wikipedia.py", line 7795, in handleArgs
    arg = _decodeArg(arg)
  File "C:\Pywikipedia\wikipedia.py", line 7767, in _decodeArg
    return unicode(arg, config.console_encoding)
TypeError: decoding Unicode is not supported
If I omit the u prefix before '-catr', no error is thrown, but the name is decoded incorrectly.
Now comes the trick! I went to line 7795 of the current wikipedia.py (r9894), as shown above, and commented it out. Now my script runs perfectly! I love it!

What happens is the following. At line 7767, arg is u'-catr:Венгрия', which is already a unicode object. The line then tries to decode it, which makes no sense: only a str (a byte string) can be decoded.
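
The failure can be reproduced outside pywikipedia. The following is only a minimal sketch: try_decode is a hypothetical helper, not part of wikipedia.py, and the text_type alias is an assumption added so the same snippet runs on Python 2 (unicode) and Python 3 (str).

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3 (assumption: only so the sketch runs there too)


def try_decode(arg, encoding='utf-8'):
    """Hypothetical helper: mimic the decoding call shown in the traceback."""
    try:
        return text_type(arg, encoding)
    except TypeError as exc:
        # Raised when arg is already text rather than a byte string.
        return 'TypeError: %s' % exc


print(try_decode(b'-lang:ru'))  # byte string: decoded normally
print(try_decode(u'-lang:ru'))  # already text: the TypeError is reported
```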

The sensible solution would be to add a check, for instance something like

return arg if isinstance(arg, unicode) else unicode(arg, config.console_encoding)

(Conditional expressions were only added in Python 2.5, so this would not work on Python 2.4; a normal if/else might be preferable.)
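
A rough sketch of the plain if/else form, which avoids the conditional expression. The names decode_arg and CONSOLE_ENCODING are hypothetical stand-ins for _decodeArg and config.console_encoding, and the sketch calls arg.decode(...) instead of unicode(arg, ...) purely so that it also runs on Python 3; the actual patch to wikipedia.py may well look different.

```python
# -*- coding: utf-8 -*-
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3 (assumption: only so the sketch runs there too)

CONSOLE_ENCODING = 'utf-8'  # hypothetical stand-in for config.console_encoding


def decode_arg(arg):
    """Hypothetical variant of _decodeArg with a plain if/else type check."""
    if isinstance(arg, text_type):
        # Already decoded text: return it untouched.
        return arg
    else:
        # Byte string coming from the console: decode it.
        return arg.decode(CONSOLE_ENCODING)


name = u'-catr:Венгрия'
print(decode_arg(name) == name)                           # text passes through
print(decode_arg(name.encode(CONSOLE_ENCODING)) == name)  # bytes are decoded
```

With a check like this, a script can pass either unicode literals or console byte strings to main() without touching line 7795.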

Merlijn