On 14 February 2012 00:46, Bináris <wikiposta(a)gmail.com> wrote:
I went back to this conversation with Russell and tried to use it in
another way. I have console encoding problems with this command, which
contains Cyrillic letters:
replace.py -catr:Венгрия . @ -lang:ru -excepttext:"[[hu:" -save:magyarok.txt -always
One way is to urlencode the Russian category name. The other way is to put
the call into a Python script. (DOS batch files won't work; I already tried.)
So here is what I did:
# -*- coding: utf-8 -*-
import replace
# Pass the arguments exactly as they would be typed on the command line.
replace.main(u'-catr:Венгрия', '.', '@', '-lang:ru',
             '-excepttext:"[[hu:"', '-save:magyarok.txt')
This results in an error message:
  File "C:\Pywikipedia\replace.py", line 582, in main
    for arg in pywikibot.handleArgs(*args):
  File "C:\Pywikipedia\wikipedia.py", line 7795, in handleArgs
    arg = _decodeArg(arg)
  File "C:\Pywikipedia\wikipedia.py", line 7767, in _decodeArg
    return unicode(arg, config.console_encoding)
TypeError: decoding Unicode is not supported
If I omit the u prefix before '-catr', no error is raised, but the category
name is decoded incorrectly.
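The mis-decoding without the u prefix can be sketched as follows. This is a minimal illustration under two assumptions not stated in the message: the script file is saved as UTF-8, and config.console_encoding is cp852 (a Central European DOS code page). The name CONSOLE_ENCODING is a hypothetical stand-in for that setting.

```python
# -*- coding: utf-8 -*-
# Hypothetical setup: the script is saved as UTF-8 and
# config.console_encoding is cp852 (a Central European DOS code page).
CONSOLE_ENCODING = 'cp852'

category = u'Венгрия'
# Without the u prefix, Python 2 keeps the literal as raw UTF-8 bytes.
source_bytes = category.encode('utf-8')
# _decodeArg then decodes those bytes with the console encoding, not UTF-8,
# so every two-byte Cyrillic letter turns into two unrelated characters.
garbled = source_bytes.decode(CONSOLE_ENCODING, 'replace')

print(garbled == category)  # False: the category name is mangled
```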
Now comes the trick! I went to line 7795 of the current wikipedia.py (r9894),
shown above, and commented it out. Now my script runs perfectly! I love it!
What happens is the following. In the context of line 7767,
arg = u'-catr:Венгрия' (of type unicode). That line then tries to *decode*
a unicode string, which makes no sense: you can only decode an encoded str.
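A small sketch of that failure, written so it runs on both Python 2 and Python 3 (the text type is unicode on 2 and str on 3). The helper try_decode is hypothetical; it only mimics the unicode(arg, encoding) call made in _decodeArg.

```python
# -*- coding: utf-8 -*-
try:
    text_type = unicode  # Python 2: unicode is the text type
except NameError:
    text_type = str      # Python 3: str is the text type


def try_decode(value, encoding='utf-8'):
    """Call the text constructor with an encoding, as _decodeArg does.

    Returns the decoded text, or the TypeError if value is already text.
    """
    try:
        return text_type(value, encoding)
    except TypeError as exc:
        # Python 2: "decoding Unicode is not supported"
        # Python 3: "decoding str is not supported"
        return exc


ok = try_decode(u'-catr:Венгрия'.encode('utf-8'))  # encoded bytes: decodes fine
failed = try_decode(u'-catr:Венгрия')              # already text: TypeError
print(isinstance(failed, TypeError))  # True
```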
The sensible solution would be to add a check, for instance something like

    return arg if isinstance(arg, unicode) else unicode(arg, config.console_encoding)

(which will not work on Python 2.4, since conditional expressions were only
added in Python 2.5, so a normal if/else might be preferable).
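A sketch of that check with a plain if/else, as suggested for Python 2.4 compatibility. decode_arg and CONSOLE_ENCODING are hypothetical stand-ins for _decodeArg and config.console_encoding, not the actual wikipedia.py code; the text_type shim only exists so the sketch also runs on Python 3.

```python
# -*- coding: utf-8 -*-
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3

# Hypothetical stand-in for config.console_encoding.
CONSOLE_ENCODING = 'utf-8'


def decode_arg(arg, encoding=CONSOLE_ENCODING):
    """Return arg as text, decoding it only if it is still an encoded string."""
    if isinstance(arg, text_type):
        # Already text (e.g. u'-catr:...' passed to replace.main): nothing to do.
        return arg
    else:
        # Encoded string from the real command line: decode it.
        return text_type(arg, encoding)


text_arg = decode_arg(u'-catr:Венгрия')
byte_arg = decode_arg(u'-catr:Венгрия'.encode('utf-8'))
```

With this check, handleArgs can keep calling the decoder unconditionally, so arguments coming from the console are still decoded while arguments passed as unicode from a script pass through unchanged.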
Merlijn