Nicolas Dumazet wrote:
yep, thanks; it should be better by r5804 :)
Well, but I don't understand the _need_ for this change at all:
--- trunk/pywikipedia/wikipedia.py      2008-08-17 14:39:50 UTC (rev 5803)
+++ trunk/pywikipedia/wikipedia.py      2008-08-17 15:14:02 UTC (rev 5804)
@@ -1294,14 +1294,16 @@
         host = self.site().hostname()
         # Get the address of the page on that host.
         address = self.site().put_address(self.urlname())
-        if not isinstance(comment, unicode):
-            raise ValueError("An unicode edit comment is expected as an argument")
         # Use the proper encoding for the comment
-        encodedComment = comment.encode(self.site().encoding())
-        if not isinstance(text, unicode):
-            raise ValueError("An unicode wikitext is expected as an argument")
+        try:
+            encodedComment = comment.encode(self.site().encoding())
+        except UnicodeDecodeError:
+            raise ValueError("An ascii string or unicode edit comment is expected as an argument")
         # Encode the text into the right encoding for the wiki
-        encodedText = text.encode(self.site().encoding())
+        try:
+            encodedText = text.encode(self.site().encoding())
+        except UnicodeDecodeError:
+            raise ValueError("An ascii string or unicode wikitext is expected as an argument")
         predata = {
             'wpSave': '1',
             'wpSummary': encodedComment,
Why assume that the string given is provided in the site's encoding?
The "site" encoding (the encoding of the MediaWiki site you talk to over HTTP) is different from your Python script's encoding.
If I have input received from an external source (file, database, HTTP), I decode it manually. Say, in this example, I got something in UCS-2:
my_string = received_string.decode("ucs-2")  # my_string is now a unicode string
mypage.put(my_string)                        # works
Why silently assume that all strings provided by the script author are in MediaWiki site encoding?
Plain ASCII strings should be passed through unconverted, and all non-ASCII strings should be Python "unicode" objects.
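A minimal sketch of the validation I mean (the real code is Python 2; this is translated into Python 3 terms, and `encode_for_site` is a hypothetical helper name, not a pywikipedia function): accept unicode text, pass plain-ASCII byte strings through unconverted, and reject non-ASCII byte strings instead of guessing what encoding they are in.

```python
def encode_for_site(value, site_encoding):
    """Prepare an edit comment or wikitext for the wiki.

    Unicode text is encoded into the site's encoding; plain-ASCII
    byte strings are passed through unchanged, since ASCII bytes are
    unambiguous in any MediaWiki site encoding; non-ASCII byte
    strings are rejected, because we cannot know their encoding.
    """
    if isinstance(value, str):            # unicode text: safe to encode
        return value.encode(site_encoding)
    if isinstance(value, bytes):
        try:
            value.decode("ascii")         # plain ASCII is unambiguous
        except UnicodeDecodeError:
            raise ValueError(
                "a unicode string or plain ASCII bytes is expected")
        return value
    raise TypeError("expected str or bytes, got %r" % type(value))
```

The point of the explicit check is that the caller, not the library, decides how external bytes are decoded; the library only ever encodes.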
--Marcin