Was: Parameters of handleArgs
I went back to this conversation with Russell and tried to use his suggestion in another way. I have console encoding problems with Cyrillic letters in this command:

    replace.py -catr:Венгрия . @ -lang:ru -excepttext:"[[hu:" -save:magyarok.txt -always

One way around it is to URL-encode the Russian category name. Another is to put the call into a script. (DOS batch files won't work, I already tried.) So this is what I did:

    import replace
    replace.main(u'-catr:Венгрия', '.', '@', '-lang:ru', '-excepttext:"[[hu:"', '-save:magyarok.txt')

This results in an error message:

    File "C:\Pywikipedia\replace.py", line 582, in main
      for arg in pywikibot.handleArgs(*args):
    File "C:\Pywikipedia\wikipedia.py", line 7795, in handleArgs
      arg = _decodeArg(arg)
    File "C:\Pywikipedia\wikipedia.py", line 7767, in _decodeArg
      return unicode(arg, config.console_encoding)
    TypeError: decoding Unicode is not supported

If I omit the u before -catr, no error is thrown, but the name is decoded incorrectly. Now comes the trick! I went to line 7795 of the current wikipedia.py (r9894), shown above, and commented it out. Now my script runs perfectly! I love it!
I don't want to spoil handleArgs() and I know this is an unusual use of it. But is it possible in some way to pass a parameter to it that tells _decodeArg to shut up? Or is there another correct way of passing Unicode parameters from within a script?
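The two symptoms described above can be reproduced outside pywikipedia. The following is only a sketch: `decode_arg_old` is a hypothetical stand-in for the unguarded call at line 7767, `cp852` is an assumed console code page, and the byte-string case assumes the calling script is saved as UTF-8.

```python
# -*- coding: utf-8 -*-
try:
    text_type = unicode  # Python 2: separate text type
except NameError:
    text_type = str      # Python 3: str is already text

CONSOLE_ENCODING = "cp852"  # assumption: stands in for config.console_encoding


def decode_arg_old(arg):
    """Hypothetical stand-in for the unguarded _decodeArg call."""
    return text_type(arg, CONSOLE_ENCODING)


name = u"-catr:Венгрия"

# Case 1: the argument is already text, so decoding it raises TypeError.
try:
    decode_arg_old(name)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Case 2: UTF-8 bytes (what a UTF-8 script passes without the u prefix
# on Python 2) are decoded with the console code page and come out garbled.
garbled = decode_arg_old(name.encode("utf-8"))
```

In both cases the argument never reaches the bot in its intended form: the first fails outright, the second silently produces a different category name.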
2011/5/18 Russell Blau russblau@imapmail.org
Bináris said:
I see in a couple of bots this construction:
    def main(*args):
        for arg in pywikibot.handleArgs(*args):
            etc.

Now, if I write this instead:

    def main():
        for arg in pywikibot.handleArgs():
            etc.

the result seems to be just the same. I tried with valid global parameters and with unique parameters as well. So, what is the difference? I know the theory that * means a variable-length argument list, but if I omit it, the behaviour does not change.
The behavior is the same if you run the script from the command line.
However, using (*args) also allows you the option of running the script from inside the Python interactive interpreter; for example, if you were running "replace.py Foo Bar -start:!", then you could "import replace" in the interpreter and run replace.main("Foo", "Bar", "-start:!"). This can be useful for debugging, among other things.
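The difference Russell describes can be sketched without pywikipedia installed. Here `handle_args` is a hypothetical stand-in for `pywikibot.handleArgs()`, written under the assumption that it falls back to the command line when it receives no arguments; the real function also processes global options.

```python
import sys


def handle_args(*args):
    """Hypothetical stand-in for pywikibot.handleArgs()."""
    if not args:
        # Assumed behaviour: no explicit arguments means "use the command line".
        args = sys.argv[1:]
    return list(args)


def main(*args):
    """Collect the arguments the bot would act on."""
    collected = []
    for arg in handle_args(*args):
        collected.append(arg)
    return collected


if __name__ == "__main__":
    # Command-line use: main() takes its arguments from sys.argv.
    print(main())
```

From the interpreter, `main("Foo", "Bar", "-start:!")` receives the same three arguments that `replace.py Foo Bar -start:!` would supply on the command line, which is why both forms behave identically when the script is run directly.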
Hi guys, can anyone comment on this, please?
I want to write scripts that import and call other scripts, with command-line Unicode parameters passed to them as normal parameters, but _decodeArg in handleArgs() stops me. More details are in my previous message.
Again the question: "any unicode-expert out there?"
Meanwhile, I can only tell you how I handle that: I use a script that sets (sys.argv[1:]) to the value I want. This can be considered VERY hacky (and bad style), but it works. I assume it should also work with Cyrillic Unicode, but I am not sure... Another question that occurs to me (when looking at '_decodeArg'): have you tried to add another encoding there? It seems to me that your console has to use another one in order to handle Cyrillic letters. I think we have to fix '_decodeArg'.
(Sorry, these are just questions and no real help.)
Greetings DrTrigon
On 18.02.2012 12:42, Bináris wrote:
Hi guys, can anyone comment on this, please?
I want to write scripts that import and call other scripts, with command-line Unicode parameters passed to them as normal parameters, but _decodeArg in handleArgs() stops me. More details are in my previous message.
_______________________________________________ Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
2012/2/18 Dr. Trigon dr.trigon@surfeu.ch
Meanwhile I just can tell you how I handle that - I use a script that sets (sys.argv[1:]) to the value I want.
Could you show me an example of what you mean?
Another question that occurs to me (when looking at '_decodeArg'): have you tried to add another encoding there? It seems to me that your console has to use another one in order to handle Cyrillic letters. I think we have to fix '_decodeArg'.
Well, this is not about Cyrillic; Cyrillic was just the first reason to start dealing with the question. I don't know how to set the console to Cyrillic under a Hungarian Windows 7 while keeping my own characters, too. Is that possible at all?
The trick I want to do is to write a Python script that uses other scripts. All my Python scripts are encoded as Unicode without BOM, that's safe and uniform. From that point I have to pass Hungarian (or even German) letters as Unicode.
Is it perhaps possible to encode my Unicode parameters with some function in such a way that _decodeArg gives them back to handleArgs in their original state? This would be the simplest solution.
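One way to read this question is as an encode/decode round trip: encode each parameter with the console encoding before passing it, so that decoding restores it. The sketch below uses hypothetical helper names and assumes `cp852` as the console code page; it also shows the limit of the approach, since a character the code page cannot represent cannot be encoded in the first place.

```python
# -*- coding: utf-8 -*-
CONSOLE_ENCODING = "cp852"  # assumption: stands in for config.console_encoding


def encode_for_handle_args(text, encoding=CONSOLE_ENCODING):
    """Hypothetical helper: turn text into bytes the decoder can restore."""
    return text.encode(encoding)


def decode_like_decode_arg(raw, encoding=CONSOLE_ENCODING):
    """Hypothetical stand-in for what _decodeArg does with a byte string."""
    return raw.decode(encoding)


hungarian = u"-cat:Erd\u0151"  # "Erdő", representable in cp852
restored = decode_like_decode_arg(encode_for_handle_args(hungarian))

# Cyrillic letters are not part of cp852, so the round trip cannot start.
try:
    encode_for_handle_args(u"-catr:Венгрия")
    cyrillic_encodable = True
except UnicodeEncodeError:
    cyrillic_encodable = False
```

So the round trip would cover Hungarian or German letters under this assumed code page, but not the Cyrillic category name that started the thread.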
On 18.02.2012 14:00, Bináris wrote:
2012/2/18 Dr. Trigon dr.trigon@surfeu.ch
Meanwhile I just can tell you how I handle that - I use a script that sets (sys.argv[1:]) to the value I want.
Could you show me an example of what you mean?
https://fisheye.toolserver.org/browse/drtrigon/pywikipedia/bot_control.py?hb...
and then later in the used bot scripts 'pywikibot.handleArgs()' is called to extract them.
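The approach described here can be sketched as a small context manager that swaps `sys.argv` for the duration of a call. This is not taken from bot_control.py; `patched_argv` and `read_args` are hypothetical names, and `read_args` only stands in for a bot that calls `pywikibot.handleArgs()` without arguments.

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(*args):
    """Temporarily replace the command-line arguments seen by called code."""
    saved = sys.argv
    # Keep the program name in position 0, as on a real command line.
    sys.argv = [saved[0] if saved else "bot"] + list(args)
    try:
        yield
    finally:
        sys.argv = saved  # always restore the original arguments


def read_args():
    """Hypothetical stand-in for a bot reading its command line."""
    return sys.argv[1:]
```

A controlling script could then wrap each bot call in `with patched_argv("-lang:ru", "-always"):`. Restoring `sys.argv` in the `finally` clause keeps one bot's arguments from leaking into the next.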
Another question that occurs to me (when looking at '_decodeArg'): have you tried to add another encoding there? It seems to me that your console has to use another one in order to handle Cyrillic letters. I think we have to fix '_decodeArg'.
Well, this is not about Cyrillic; Cyrillic was just the first reason to start dealing with the question. I don't know how to set the console to Cyrillic under a Hungarian Windows 7 while keeping my own characters, too. Is that possible at all?
The trick I want to do is to write a Python script that uses other scripts. All my Python scripts are encoded as Unicode without BOM, that's safe and uniform. From that point I have to pass Hungarian (or even German) letters as Unicode.
The script mentioned above, 'bot_control.py', does exactly that. It is the script that gets called from e.g. a cron job and then executes the given bot scripts in order. I wrote it basically because of timing, since I wanted to be sure the first bot was done before running the second. But be warned, the code is somewhat ugly... ;) This is also because it implements additional things, like the 'logging' module for logs and others, which should be implemented in pywikipedia instead of being patched together with a "dirty" script... (as mentioned earlier on this list by others). But I would be happy if we could adapt it to fit your needs too.
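Running bots strictly one after another, with logging, might look roughly like the following. This is an illustrative sketch only and does not reproduce bot_control.py; the logger name, `run_in_order`, and the job tuple layout are all assumptions.

```python
import logging

log = logging.getLogger("bot_runner")  # hypothetical logger name


def run_in_order(jobs):
    """Run (name, callable, args) jobs sequentially.

    Each job starts only after the previous one has returned or failed.
    Returns a list of (name, succeeded) pairs in execution order.
    """
    results = []
    for name, func, args in jobs:
        log.info("starting %s", name)
        try:
            func(*args)
            results.append((name, True))
        except Exception:
            # Log the traceback and continue with the next job.
            log.exception("%s failed", name)
            results.append((name, False))
    return results
```

Because each entry is a plain callable plus its arguments, a bot's `main(*args)` function could be listed directly, for example `("replace", replace.main, ("-lang:ru", "-always"))`.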
On 14 February 2012 00:46, Bináris wikiposta@gmail.com wrote:
[...] This results in an error message:

    File "C:\Pywikipedia\replace.py", line 582, in main
      for arg in pywikibot.handleArgs(*args):
    File "C:\Pywikipedia\wikipedia.py", line 7795, in handleArgs
      arg = _decodeArg(arg)
    File "C:\Pywikipedia\wikipedia.py", line 7767, in _decodeArg
      return unicode(arg, config.console_encoding)
    TypeError: decoding Unicode is not supported

[...]
What happens is the following. In the context of line 7767, arg = u'-catr:Венгрия' (type unicode). The line then tries to *decode* a unicode string, which makes no sense: you can only decode a str (byte string) representation.
The sensible solution would be to add a check, for instance something like
    return arg if isinstance(arg, unicode) else unicode(arg, config.console_encoding)
(which will not work on Python 2.4, where conditional expressions are not available, so a normal if/else might be preferable).
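Written with a plain if/else, the proposed check could look like the sketch below. It is not the code committed to wikipedia.py: `decode_arg` and `CONSOLE_ENCODING` are hypothetical stand-ins for `_decodeArg` and `config.console_encoding`, and the `text_type` shim exists only so that the example runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    text_type = unicode  # Python 2: separate text type
except NameError:
    text_type = str      # Python 3: str is already text

CONSOLE_ENCODING = "cp1251"  # assumption: stands in for config.console_encoding


def decode_arg(arg, encoding=CONSOLE_ENCODING):
    """Return arg as text, decoding only when it is a byte string."""
    if isinstance(arg, text_type):
        # Already text (e.g. passed from another script): leave it alone.
        return arg
    else:
        # Byte string from the console: decode with the console encoding.
        return arg.decode(encoding)
```

With this guard, text passed from another script goes through unchanged, while byte strings arriving from the console are still decoded as before.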
Merlijn
Thanks to both of you! I committed Merlijn's proposal in r9914, and it seems to work nicely. New horizons open in front of me. :-)
http://www.mediawiki.org/wiki/Special:Code/pywikipedia/9914
Thank you and greetings!