The reason we encode our strings/unicode objects to the site's encoding is obvious: we don't trust ourselves to remember to encode each string properly before each put, and failing to encode a string properly would result in a garbage write. It has always worked this way; my recent changes did not introduce that behavior.
I introduced the unicode check because a user did not understand why a UnicodeDecodeError was triggered by the put in:

    text = open('file_in_utf8_with_non-ascii_chars').read()
    page.put(text)

Throwing a UnicodeDecodeError is not user-friendly at all. Hopefully, most of the readers here understand that text should be decoded before being passed to put; but why not throw a nice error instead of this UnicodeDecodeError, forcing the average user to dig into our code to understand what's happening?
That's the reason for that change.
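To make the pitfall concrete, here is a minimal Python 3 sketch of the same situation. The `put` function below is a hypothetical stand-in for `page.put`, not the real implementation: it just shows why handing raw bytes to text-oriented code fails, and why decoding first fixes it.

```python
def put(text):
    # Hypothetical stand-in for page.put: the library ultimately encodes
    # to the site's encoding, and only real text can be encoded reliably.
    if not isinstance(text, str):
        raise TypeError("put() expects decoded text, got %r" % type(text))
    return text.encode("utf-8")

# Bytes as read from a UTF-8 file opened in binary mode.
raw = "caf\u00e9".encode("utf-8")

try:
    put(raw)                      # wrong: raw bytes, not decoded text
except TypeError as err:
    print("friendly error:", err)

put(raw.decode("utf-8"))          # right: decode first, then put
```

The point of the check is exactly this: turn the confusing traceback into an explicit, early error message.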
    my_string = received_string.decode("ucs-2")  # my_string is now a unicode string
    mypage.put(my_string)  # works
Not always: it triggers a UnicodeDecodeError if the unicode string cannot be converted to the MediaWiki site's encoding.
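A short sketch of that failure mode, with an assumed site encoding for illustration: even properly decoded text can fail at encode time when the site's charset cannot represent it. (Under Python 3 this surfaces as a UnicodeEncodeError; the UnicodeDecodeError seen in the thread comes from Python 2's implicit ascii-decode step.)

```python
# Assumed site encoding for this example only.
site_encoding = "iso-8859-1"

# Perfectly valid text, but Cyrillic is outside Latin-1's repertoire.
text = "\u0421\u0430\u0445\u0430"  # "Saha" in Cyrillic

try:
    text.encode(site_encoding)
except UnicodeEncodeError as err:
    print("cannot encode to %s: %s" % (site_encoding, err))
```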
Why silently assume that all strings provided by the script author are in MediaWiki site encoding?
We're not assuming this; we've always encoded the strings to the MediaWiki site encoding... Oo
Now look at revisions 5805-5806, and tell me what you think of the implemented behavior:
* accept only plain strings we can reliably decode (i.e. ASCII strings)
* accept unicode strings
* catch UnicodeDecodeErrors if the unicode string can't be encoded to the site's encoding, and warn the user in a friendly fashion
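The three rules above can be sketched as follows. This is a hedged illustration, not the code from r5805-5806; `safe_put` and its error messages are hypothetical names, and the site encoding is a parameter only for the sake of the example.

```python
def safe_put(text, site_encoding="utf-8"):
    # Rule 1: accept plain byte strings only when they are unambiguous,
    # i.e. pure ASCII; anything else must be decoded by the caller.
    if isinstance(text, bytes):
        try:
            text = text.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError(
                "put() received non-ASCII bytes; decode them to text "
                "with the correct codec before calling put()")
    # Rule 2: real text is always accepted.
    # Rule 3: turn an encoding failure into a friendly, explicit error
    # instead of letting the raw traceback reach the user.
    try:
        return text.encode(site_encoding)
    except UnicodeEncodeError as err:
        raise ValueError(
            "text cannot be encoded to the site's %s encoding: %s"
            % (site_encoding, err))
```

The design choice is to fail loudly at the API boundary, where the user can act on the message, rather than deep inside the library.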