Re: [Pywikipedia-l] SVN: [5802] trunk/pywikipedia/wikipedia.py

18 Aug 2008

      Nicolas Dumazet wrote:
...
"Hopefully, most of the readers here understand that text should be
decoded first before passing it to put"
That's the basics of Python Unicode programming. The text should be
decoded from which encoding?
If a user starts searching for UnicodeDecodeError, s/he should be 
enlightened :)
...
Here is the idea, considering the line :
str.encode(self.site().encoding())

If str is unicode, that line can fail with UnicodeEncodeError (e.g.
u'人物'.encode('latin-1')). I want to raise a user-friendly error
if that happens.
UnicodeEncodeError is precise enough for this. It means "I can't convert
your string to the encoding your MediaWiki installation uses".
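
A minimal sketch of the encode failure discussed above. It is written in Python 3 syntax, where every str is Unicode (the thread itself concerns Python 2, where the text would be a u'...' literal). The helper name encode_for_site and the hard-coded 'latin-1' are assumptions for illustration only; they stand in for str.encode(self.site().encoding()).

```python
def encode_for_site(text, site_encoding):
    """Encode text for the wiki and let UnicodeEncodeError propagate as-is."""
    return text.encode(site_encoding)

caught = None
try:
    # These characters have no representation in latin-1.
    encode_for_site('人物', 'latin-1')
except UnicodeEncodeError as e:
    caught = e

# The exception carries the offending text and the position of the failure,
# so the caller can build any message it likes from it.
print(type(caught).__name__, repr(caught.object), caught.start, caught.end)
```

The exception type alone tells the caller which direction of conversion failed, without parsing a message string.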
...

If str is a string, this line is exactly equivalent to

str.decode('ascii').encode(self.site().encoding()), or shorter,
str.decode().encode(self.site().encoding())
And we don't need to play with ascii strings at all. "encode" on this 
will do no harm.
...
** str.decode() can fail with UnicodeDecodeError (e.g. 'é'.decode())
- that's what is being tested at line 1290, "arg.decode()". I want to
raise another error here, different from the first one.
This is the case where UnicodeDecodeError should be raised. 
UnicodeDecodeError != UnicodeEncodeError so they are different.
...
** encode( ) cannot fail with UnicodeEncodeError since if we call it,
it means that decode succeeded, i.e. we are encoding ascii characters.
That is redundant.
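
A small sketch of the two points above, again in Python 3 syntax where the byte string has to be spelled out explicitly. It assumes the site encoding is ASCII-compatible (utf-8, latin-1 and cp1250 are used as examples; they are not taken from the thread).

```python
# Non-ASCII bytes, comparable to a Python 2 'é' str from a UTF-8 source file.
raw = 'é'.encode('utf-8')

decode_error = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as e:
    decode_error = e

# If the ASCII decode succeeds, re-encoding to an ASCII-compatible
# encoding cannot fail and gives back the same bytes.
ascii_bytes = b'Main Page'
round_trips = [ascii_bytes.decode('ascii').encode(enc)
               for enc in ('utf-8', 'latin-1', 'cp1250')]

print(type(decode_error).__name__, round_trips)
```

So once the decode step has passed, a separate check around the encode step adds nothing for byte-string input.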
...
My implementation was wrong and confused, true. But is it such a bad
idea to throw nice errors that explain in one line what's happening ?
I really don't like the idea of having our framework users forced to
dig into OUR code to understand what was wrong with THEIR code...
How is r5807 ?
Fine, except for catching "UnicodeDecodeError" and providing a "user
friendly" text message. We don't need this - see below.
If you are more interested in learning about exception theory, please 
search for "checked exceptions", for example here:
http://radio.weblogs.com/0122027/stories/2003/04/01/JavasCheckedExceptionsWe...
http://web.archive.org/web/20080205032558/http://www.mindview.net/Etc/Discus...
What you are trying to do is to "tunnel" a real exception in a 
PageNotSaved exception - this is just not necessary in Python.
...
Well, in the same function, _putPage, we nearly __only__ throw these
errors: PageNotSaved several times, and the derived SpamfilterError,
EditConflict and LongPageError. The only non-PageNotSaved is
LockedPage.
Note that I really don't mind what you use as an Error, as long as the
message is self-explanatory :)
This is not Java, and we don't do checked exceptions. Nobody forbids 
_putPage to fail with any exception. Indeed, it should fail with 
UnicodeDecodeError or UnicodeEncodeError.
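
To make the "tunnelling" point concrete, here is a hedged sketch comparing the two styles. PageNotSaved below is a local stand-in class, and put_tunnelled / put_plain are hypothetical names; none of this is the actual wikipedia.py code. It uses Python 3 syntax (Python 3 keeps the original error in __context__, which Python 2 did not).

```python
class PageNotSaved(Exception):
    """Stand-in for the framework exception, defined only for this sketch."""

def put_tunnelled(text, encoding):
    # Wraps the real cause in a generic exception with an English message.
    try:
        return text.encode(encoding)
    except UnicodeError:
        raise PageNotSaved("text could not be encoded for this wiki")

def put_plain(text, encoding):
    # Lets UnicodeEncodeError reach the caller untouched.
    return text.encode(encoding)

def kind_of_failure(put):
    try:
        put('人物', 'latin-1')
    except UnicodeEncodeError:
        return 'encode error'
    except PageNotSaved:
        return 'page not saved, cause only visible in the message'
    return 'ok'

print(kind_of_failure(put_plain))
print(kind_of_failure(put_tunnelled))
```

With the plain version the caller can branch on the exception type; with the tunnelled version it has to inspect the message text.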
...
...
the only good system is the old system. If needed, catch the
UnicodeDecodeError and raise a PageNotSaved exception in the /save/
function.
Not sure, to me _putPage is our only save function, called from both
put and async_put. Do you mean catching the errors above, in put and
async_put ?
If so, I would be quite reluctant to duplicate the exception
handling when catching them once and for all in _putPage solves
the problem.
You don't need to catch or declare exceptions! Suppose I have my program:
try:
   page.put(somestring)
except UnicodeDecodeError, e:
   if my_application_default_encoding:
      page.put(somestring.decode(my_application_default_encoding))
   else:
      raise
except UnicodeEncodeError, e:
   print u"Twoja instalacja MediaWiki nie korzysta z utf-8!".encode("utf-8")
I know _precisely_ what happened. What if I got PageNotSaved? I need to 
analyze some English-language string instead. What if I want to be user 
friendly and deliver nicely formatted localized error messages to the 
user? I can't really precisely differentiate between those situations 
(unicode error or something completely unrelated).
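
A runnable approximation of the pattern above, in Python 3 syntax. FakePage is a hypothetical stand-in, not the pywikipedia Page class; its put() decodes byte strings as ASCII only to imitate Python 2's implicit conversion. The save() helper and its return strings are likewise assumptions made for the sketch.

```python
class FakePage:
    """Hypothetical stand-in for a wiki page; not the pywikipedia class."""
    def __init__(self, site_encoding):
        self.site_encoding = site_encoding
        self.saved = None

    def put(self, text):
        if isinstance(text, bytes):
            # Imitate Python 2's implicit ASCII decode of byte strings.
            text = text.decode('ascii')
        self.saved = text.encode(self.site_encoding)

def save(page, data, default_encoding=None):
    try:
        page.put(data)
        return 'saved'
    except UnicodeDecodeError:
        if default_encoding is None:
            raise
        # The application knows its own encoding, so it can retry.
        page.put(data.decode(default_encoding))
        return 'saved after decoding'
    except UnicodeEncodeError:
        # The application can show its own localized message here.
        return 'wiki encoding cannot represent this text'

page = FakePage('utf-8')
print(save(page, 'é'.encode('utf-8'), default_encoding='utf-8'))
print(save(FakePage('latin-1'), '人物'))
```

Each branch corresponds to one precise cause, which is what a single PageNotSaved with a message string would hide.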
Actually, when I first started using UTF-8 with pywikipedia, I got
"UnicodeDecodeError" in one of my first attempts. Then I knew _exactly_
what the problem was - I tried ASCII strings instead of Unicode.
I had to dig a bit to learn about Unicode in Python of course.
The text message might be more readable, but if I google for
"UnicodeDecodeError" it will be much more helpful than using PageNotSaved
with a message.
--Marcin
