Nicolas Dumazet wrote:
"Hopefully, most of the readers here understand
that text should be
decoded first before passing it to put"
Those are the basics of Python Unicode programming. But from which
encoding should the text be decoded?
If a user starts searching for UnicodeDecodeError, s/he should be
enlightened :)
Here is the idea, considering the line :
str.encode(self.site().encoding())
* If str is unicode, that line can fail with UnicodeEncodeError
(e.g. u'人物'.encode('latin-1')). I want to raise a user-friendly error
if that happens.
UnicodeEncodeError is precise enough for this. It means "I can't convert
your string to the encoding your MediaWiki installation uses".
* If str is a string, this line is exactly equivalent to
str.decode('ascii').encode(self.site().encoding()), or shorter,
str.decode().encode(self.site().encoding())
And we don't need to play with ascii strings at all. "encode" on this
will do no harm.
** str.decode() can fail with UnicodeDecodeError (e.g. 'é'.decode());
that's what is being tested at line 1290, "arg.decode()". I want to
raise another error here, different from the first one.
This is the case where UnicodeDecodeError should be raised.
UnicodeDecodeError != UnicodeEncodeError so they are different.
** encode() cannot fail with UnicodeEncodeError, since if we call it,
it means that decode succeeded, i.e. we are encoding ASCII characters.
That is redundant.
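To make the two failure modes above concrete, here is a minimal sketch. It is written for Python 3, where the ASCII decode that Python 2 performs implicitly on byte strings has to be spelled out. The names to_site_encoding and classify are hypothetical helpers for illustration, not part of pywikipedia.

```python
def to_site_encoding(text, site_encoding):
    """Return `text` as bytes in `site_encoding` (hypothetical helper)."""
    if isinstance(text, bytes):
        # Python 2 performed this step implicitly when encode() was called
        # on a byte string; it raises UnicodeDecodeError for any
        # non-ASCII byte.
        text = text.decode("ascii")
    # Raises UnicodeEncodeError if the site encoding lacks a character.
    return text.encode(site_encoding)


def classify(text, site_encoding):
    """Report which of the two failure modes, if any, occurs."""
    try:
        to_site_encoding(text, site_encoding)
    except UnicodeDecodeError:
        return "decode error"
    except UnicodeEncodeError:
        return "encode error"
    return "ok"
```

Under these assumptions, classify("人物", "latin-1") reports an encode error, while a non-ASCII byte string such as "é".encode("utf-8") reports a decode error before encode() is ever reached. Pure ASCII bytes pass through both steps.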
My implementation was wrong and confused, true. But is it such a bad
idea to throw nice errors that explain in one line what's happening?
I really don't like the idea of having our framework users forced to
dig into OUR code to understand what was wrong with THEIR code...
How is r5807 ?
Fine, except for catching "UnicodeDecodeError" and providing a "user
friendly" text message. We don't need this - see below
If you are more interested in learning about exception theory, please
search for "checked exceptions", for example here:
http://radio.weblogs.com/0122027/stories/2003/04/01/JavasCheckedExceptionsW…
http://web.archive.org/web/20080205032558/http://www.mindview.net/Etc/Discu…
What you are trying to do is to "tunnel" a real exception in a
PageNotSaved exception - this is just not necessary in Python.
Well, in the same function, _putPage, we nearly __only__ throw these
errors: PageNotSaved several times, and the derived SpamfilterError,
EditConflict and LongPageError. The only non-PageNotSaved is
LockedPage.
Note that I really don't mind what you use as an Error, as long as the
message is self-explanatory :)
This is not Java, and we don't do checked exceptions. Nobody forbids
_putPage to fail with any exception. Indeed, it should fail with
UnicodeDecodeError or UnicodeEncodeError.
The only good system is the old system. If needed, catch the
UnicodeDecodeError and raise a PageNotSaved exception in the /save/
function.
Not sure, to me _putPage is our only save function, called from both
put and async_put. Do you mean catching the errors above, in put and
async_put ?
If so, I would be quite reluctant to duplicate the exception handling
when catching the errors once and for all in _putPage solves the
problem.
You don't need to catch or declare exceptions! Suppose I have my program:
try:
    page.put(somestring)
except UnicodeDecodeError:
    # somestring was a non-ASCII byte string: decode it explicitly
    if my_application_default_encoding:
        page.put(somestring.decode(my_application_default_encoding))
    else:
        raise
except UnicodeEncodeError:
    # "Your MediaWiki installation does not use utf-8!"
    print u"Twoja instalacja MediaWiki nie korzysta z utf-8!".encode("utf-8")
I know _precisely_ what happened. What if I got PageNotSaved? I need to
analyze some English-language string instead. What if I want to be user
friendly and deliver nicely formatted localized error messages to the
user? I can't really precisely differentiate between those situations
(unicode error or something completely unrelated).
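The difference can be sketched from the caller's side. This is only an illustration in Python 3 syntax: PageNotSaved here is a local stand-in class, and put_propagating, put_wrapping and error_key are hypothetical names, not the framework's API.

```python
class PageNotSaved(Exception):
    """Local stand-in for the framework's PageNotSaved (assumption)."""


def put_propagating(text, site_encoding):
    # Lets UnicodeEncodeError escape unchanged.
    return text.encode(site_encoding)


def put_wrapping(text, site_encoding):
    # Tunnels the Unicode error inside PageNotSaved with an English message.
    try:
        return text.encode(site_encoding)
    except UnicodeError as err:
        raise PageNotSaved("Could not save page: %s" % err)


def error_key(put, text, site_encoding):
    """Map the exception *type* to a key for a localized message table."""
    try:
        put(text, site_encoding)
    except UnicodeEncodeError:
        return "site-encoding-mismatch"
    except PageNotSaved:
        # Spam filter? Edit conflict? Encoding? Only the English text
        # inside the exception could tell, so the caller cannot know.
        return "unknown-failure"
    return "ok"
```

With the propagating variant, error_key(put_propagating, "人物", "latin-1") gives the precise key. With the wrapping variant the same input can only be reported as an unknown failure, unless the caller parses the English message.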
Actually, when I first started using UTF-8 with pywikipedia, I got
"UnicodeDecodeError" in one of my first attempts. Then I knew _exactly_
what the problem was - I had tried ASCII strings instead of Unicode.
I had to dig a bit to learn about Unicode in Python of course.
The text message might be more readable, but if I google for
"UnicodeDecodeError" it will be much more helpful than using PageNotSaved
with a message.
--Marcin