Nicolas Dumazet wrote:
yep, thanks; it should be better by r5804 :)
Well, but I don't understand the _need_ for this change at all:
--- trunk/pywikipedia/wikipedia.py      2008-08-17 14:39:50 UTC (rev 5803)
+++ trunk/pywikipedia/wikipedia.py      2008-08-17 15:14:02 UTC (rev 5804)
@@ -1294,14 +1294,16 @@
         host = self.site().hostname()
         # Get the address of the page on that host.
         address = self.site().put_address(self.urlname())
-        if not isinstance(comment, unicode):
-            raise ValueError("An unicode edit comment is expected as an argument")
         # Use the proper encoding for the comment
-        encodedComment = comment.encode(self.site().encoding())
-        if not isinstance(text, unicode):
-            raise ValueError("An unicode wikitext is expected as an argument")
+        try:
+            encodedComment = comment.encode(self.site().encoding())
+        except UnicodeDecodeError:
+            raise ValueError("An ascii string or unicode edit comment is expected as an argument")
         # Encode the text into the right encoding for the wiki
-        encodedText = text.encode(self.site().encoding())
+        try:
+            encodedText = text.encode(self.site().encoding())
+        except UnicodeDecodeError:
+            raise ValueError("An ascii string or unicode wikitext is expected as an argument")
         predata = {
             'wpSave': '1',
             'wpSummary': encodedComment,
Why assume that the string given is provided in the site's encoding?
The "site" encoding (the encoding of the MediaWiki site you talk to over HTTP) is different from your Python script's encoding.
If I have input received from an external source (file, database, HTTP), I decode it manually. Say, in this example, I got something in UCS-2:
my_string = received_string.decode("ucs-2")  # my_string is now a unicode string
mypage.put(my_string)                        # works
Why silently assume that all strings provided by the script author are in MediaWiki site encoding?
Plain ASCII strings should be passed through unconverted, and all non-ASCII strings should be Python "unicode" objects.
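A minimal sketch of the validation I mean (the real code is Python 2; this is translated into Python 3 terms, and `encode_for_site` is a hypothetical helper name, not a pywikipedia function): accept unicode text, pass plain-ASCII byte strings through unconverted, and reject non-ASCII byte strings instead of guessing what encoding they are in.

```python
def encode_for_site(value, site_encoding):
    """Prepare an edit comment or wikitext for the wiki.

    Unicode text is encoded into the site's encoding; plain-ASCII
    byte strings are passed through unchanged, since ASCII bytes are
    unambiguous in any MediaWiki site encoding; non-ASCII byte
    strings are rejected, because we cannot know their encoding.
    """
    if isinstance(value, str):            # unicode text: safe to encode
        return value.encode(site_encoding)
    if isinstance(value, bytes):
        try:
            value.decode("ascii")         # plain ASCII is unambiguous
        except UnicodeDecodeError:
            raise ValueError(
                "a unicode string or plain ASCII bytes is expected")
        return value
    raise TypeError("expected str or bytes, got %r" % type(value))
```

The point of the explicit check is that the caller, not the library, decides how external bytes are decoded; the library only ever encodes.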
--Marcin