Nicolas Dumazet wrote:
"Hopefully, most of the readers here understand that text should be decoded first before passing it to put"
That's the basics of Python Unicode programming. But which encoding should the text be decoded from?
If a user starts searching for UnicodeDecodeError, s/he should be enlightened :)
Here is the idea, considering the line : str.encode(self.site().encoding())
- If str is unicode, that line can fail with UnicodeEncodeError (e.g. u'人物'.encode('latin-1')). I want to raise a user-friendly error if that happens.
UnicodeEncodeError is precise enough for this. It means "I can't convert your string to the encoding your MediaWiki installation uses".
- If str is a string, this line is exactly equivalent to
str.decode('ascii').encode(self.site().encoding()), or shorter, str.decode().encode(self.site().encoding())
And we don't need to play with ascii strings at all. "encode" on this will do no harm.
** str.decode() can fail with UnicodeDecodeError (e.g. 'é'.decode()); that's what is being tested at line 1290, "arg.decode()". I want to raise another error here, different from the first one.
This is the case where UnicodeDecodeError should be raised. UnicodeDecodeError != UnicodeEncodeError so they are different.
** encode() cannot fail with UnicodeEncodeError, since if we call it, it means that decode succeeded, i.e. we are encoding ASCII characters.
That is redundant.
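To make the two failure modes above concrete, here is a minimal sketch. It is written in Python 3 syntax so it runs today, which means the implicit ASCII decode that Python 2 performs on byte strings is spelled out by hand. SITE_ENCODING, to_site_bytes and classify are names made up for this illustration; they are not part of the framework.

```python
# Python 3 sketch of the Python 2 behaviour discussed above.
# SITE_ENCODING is a hypothetical stand-in for self.site().encoding().
SITE_ENCODING = "latin-1"


def to_site_bytes(text):
    """Encode text for the wiki, decoding byte strings as ASCII first."""
    if isinstance(text, bytes):
        # Mirrors Python 2's implicit str.decode(); may raise UnicodeDecodeError.
        text = text.decode("ascii")
    # May raise UnicodeEncodeError if the site encoding lacks a character.
    return text.encode(SITE_ENCODING)


def classify(text):
    """Report which of the two failures, if any, the input triggers."""
    try:
        to_site_bytes(text)
    except UnicodeDecodeError:
        return "decode error"
    except UnicodeEncodeError:
        return "encode error"
    return "ok"
```

Under these assumptions, classify("人物") reports an encode error, while a non-ASCII byte string such as "é".encode("utf-8") reports a decode error. A pure-ASCII byte string passes both steps.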
My implementation was wrong and confused, true. But is it such a bad idea to throw nice errors that explain in one line what's happening ? I really don't like the idea of having our framework users forced to dig into OUR code to understand what was wrong with THEIR code...
How is r5807 ?
Fine, except for catching "UnicodeDecodeError" and providing a "user friendly" text message. We don't need this; see below.
If you are more interested in learning about exception theory, please search for "checked exceptions", for example here:
http://radio.weblogs.com/0122027/stories/2003/04/01/JavasCheckedExceptionsWe...
http://web.archive.org/web/20080205032558/http://www.mindview.net/Etc/Discus...
What you are trying to do is to "tunnel" a real exception in a PageNotSaved exception - this is just not necessary in Python.
Well, in the same function, _putPage, we nearly __only__ throw these errors: PageNotSaved several times, and the derived SpamfilterError, EditConflict and LongPageError. The only non-PageNotSaved is LockedPage. Note that I really don't mind what you use as an Error, as long as the message is self-explanatory :)
This is not Java, and we don't do checked exceptions. Nothing forbids _putPage from failing with any exception. Indeed, it should fail with UnicodeDecodeError or UnicodeEncodeError.
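As a rough illustration of that point, the sketch below (Python 3 syntax) shows an exception passing through two layers that neither catch nor declare it. The functions _put_page, put, async_put and attempt are simplified, hypothetical stand-ins, not the real framework code.

```python
def _put_page(text, encoding="latin-1"):
    # Hypothetical stand-in for _putPage: no try/except anywhere.
    return text.encode(encoding)


def put(text):
    # Neither catches nor declares anything; errors simply pass through.
    return _put_page(text)


def async_put(text):
    # Same inner function, same behaviour, no duplicated handling.
    return _put_page(text)


def attempt(save, text):
    """Caller-side handling: report the exception type that reached us."""
    try:
        save(text)
    except UnicodeEncodeError as err:
        return type(err).__name__
    return "saved"
```

Calling attempt(put, "人物") or attempt(async_put, "人物") sees the original UnicodeEncodeError in both cases, without any handling code in the intermediate functions.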
The only good system is the old system. If needed, catch the UnicodeDecodeError and raise a PageNotSaved exception in the /save/ function.
Not sure; to me _putPage is our only save function, called from both put and async_put. Do you mean catching the errors above in put and async_put? If so, I would be quite reluctant to duplicate the exception handling when catching them once and for all in _putPage solves the problem.
You don't need to catch or declare exceptions! Suppose I have my program:
    try:
        page.put(somestring)
    except UnicodeDecodeError, e:
        if my_application_default_encoding:
            # Decode the byte string to unicode before retrying.
            page.put(somestring.decode(my_application_default_encoding))
        else:
            raise
    except UnicodeEncodeError, e:
        # Localized (Polish) message: "Your MediaWiki installation does not use utf-8!"
        print u"""Twoja instalacja MediaWiki nie korzysta z utf-8!""".encode("utf-8")
I know _precisely_ what happened. What if I got PageNotSaved? I need to analyze some English-language string instead. What if I want to be user friendly and deliver nicely formatted localized error messages to the user? I can't really precisely differentiate between those situations (unicode error or something completely unrelated).
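The difference can be sketched as follows, again in Python 3 syntax. PageNotSaved here is a locally defined stand-in for the framework exception, and put_direct, put_tunnelled and handle are hypothetical names used only for this comparison.

```python
class PageNotSaved(Exception):
    """Hypothetical stand-in for the framework's PageNotSaved."""


def put_direct(text, encoding="latin-1"):
    # Lets UnicodeEncodeError reach the caller unchanged.
    return text.encode(encoding)


def put_tunnelled(text, encoding="latin-1"):
    # Wraps the real exception in a generic one with an English message.
    try:
        return text.encode(encoding)
    except UnicodeError as err:
        raise PageNotSaved("Could not save page: %s" % err)


def handle(save, text):
    """Caller that wants to react differently to encoding problems."""
    try:
        save(text)
    except UnicodeEncodeError:
        return "encoding problem"
    except PageNotSaved:
        # Only the message text is left to tell one cause from another.
        return "page not saved, cause unclear"
    return "saved"
```

With put_direct the caller lands in the UnicodeEncodeError branch and can print whatever localized text it likes. With put_tunnelled the same input ends up in the generic branch, where the cause has to be recovered from the message string.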
Actually, when I first started using UTF-8 with pywikipedia, I got "UnicodeDecodeError" in one of my first attempts. Then I knew _exactly_ what the problem was: I had tried ASCII strings instead of Unicode. I had to dig a bit to learn about Unicode in Python, of course.
The text message might be more readable, but if I google for "UnicodeDecodeError" it will be much more helpful than using PageNotSaved with a message.
--Marcin