I'm using the Zend Framework HTTP client library to make an edit request to the MediaWiki API, as follows:
<?php
// Instantiate the client object ($apiUrl is the URL of the wiki's api.php).
require_once 'Zend/Http/Client.php';
$client = new Zend_Http_Client($apiUrl);

// Get necessary information.
$title = 'Testpage';
$text = 'Æneas Mackintosh';
$basetimestamp = '2009-09-15T15:45:50Z';
// Edit tokens end in "+\", which must be written as +\\ inside single quotes.
$token = '6c9600319ea3a1188d4542cd3e1443c7+\\';

// Edit the page.
$client->setParameterPost('action', 'edit');
$client->setParameterPost('title', $title);
$client->setParameterPost('text', $text);
$client->setParameterPost('basetimestamp', $basetimestamp);
$client->setParameterPost('token', $token);

// Make the request.
$client->request('POST');
?>
After editing, the resulting wiki page should contain "Æneas Mackintosh" (note the AE ligature); instead it contains "�neas Mackintosh". I suspect this is a MediaWiki API bug, since the POST request appears to be correctly formatted:
action=edit&title=Testpage&text=%C6neas+Mackintosh&basetimestamp=2009-09-15T15%3A45%3A50Z&token=6c9600319ea3a1188d4542cd3e1443c7%2B%5C
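One way to test the "correctly formatted" assumption is to decode that body and check whether each value is well-formed UTF-8. This is a minimal sketch assuming PHP 7 or later; isValidUtf8 is a hypothetical helper built on the common preg_match('//u', ...) idiom, which fails on malformed UTF-8.

```php
<?php
// Sketch (assumes PHP 7+): decode the POST body quoted above and report
// whether each parameter value is well-formed UTF-8.

// Hypothetical helper: with the /u modifier, preg_match() returns false
// (instead of 1) when the subject is not valid UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$body = 'action=edit&title=Testpage&text=%C6neas+Mackintosh'
      . '&basetimestamp=2009-09-15T15%3A45%3A50Z'
      . '&token=6c9600319ea3a1188d4542cd3e1443c7%2B%5C';

// parse_str() undoes the %XX escapes and turns '+' back into a space.
parse_str($body, $params);

foreach ($params as $name => $value) {
    printf("%-14s %s\n", $name, isValidUtf8($value) ? 'valid UTF-8' : 'NOT valid UTF-8');
}
```

Only text fails: %C6 decodes to the single byte 0xC6, which in UTF-8 must be followed by a continuation byte (0x80 to 0xBF), not by "n".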
Has anyone else had issues with special characters? Any solutions?
My configuration:
MediaWiki 1.15.1
Zend Framework 1.9.2
PHP 5.2.6-3ubuntu4.2
Thanks, Jim
2009/9/15 Beau beau@adres.pl:
Hello.
James M Safley wrote:
Has anyone else had issues with special characters? Any solutions?
What encoding did you set up for the wiki?
The 'Æ' character should be encoded in UTF-8 as %C3%86; %C6 is its ISO-8859-1 encoding.
Looks like the Zend framework may be misencoding stuff here.
Also, please try the same request on Wikipedia and see if it gets it right.
Roan Kattouw (Catrope)
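Both encodings mentioned above can be derived from the code point U+00C6: ISO-8859-1 stores it as the single byte 0xC6, while UTF-8 splits its bits 00011 000110 into 110xxxxx 10xxxxxx, giving 0xC3 0x86. A minimal sketch assuming PHP 7 or later; utf8FromCodePoint is a hypothetical helper, and the bytes are built numerically so the result does not depend on how the script file is saved.

```php
<?php
// Sketch (assumes PHP 7+): derive the two byte sequences for U+00C6 'Æ'
// by hand, so the result does not depend on how this file is saved.

// Hypothetical helper: UTF-8 encoder for a single code point
// (no checks for surrogates or values above U+10FFFF).
function utf8FromCodePoint(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                        // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))           // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));        // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

$cp = 0xC6;                      // U+00C6 LATIN CAPITAL LETTER AE
$utf8 = utf8FromCodePoint($cp);  // two bytes: 0xC3 0x86
$latin1 = chr($cp);              // ISO-8859-1 stores code points 0-255 as one byte

echo 'UTF-8:      ', urlencode($utf8), "\n";
echo 'ISO-8859-1: ', urlencode($latin1), "\n";
```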
Roan Kattouw wrote:
Looks like the Zend framework may be misencoding stuff here.
I think the file was saved as ISO-8859-1 instead of UTF-8. To fix it, choose a different encoding option when saving, or use another editor.
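If the script was saved as ISO-8859-1, the literal 'Æ' is stored as the single byte 0xC6, which is exactly what reached the API. Re-saving in the editor is the simplest fix; as an alternative, here is a sketch that converts a file in place. It assumes PHP 7 or later and a file that really is ISO-8859-1 (Windows-1252 differs in bytes 0x80 to 0x9F); the function names and the temporary demo file are hypothetical.

```php
<?php
// Sketch (assumes PHP 7+ and an ISO-8859-1 input file): re-encode to UTF-8.

// With the /u modifier, preg_match() returns false on malformed UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: ISO-8859-1 byte values equal their Unicode code
// points, so bytes 0x80-0xFF become two-byte UTF-8 sequences and ASCII
// bytes are copied unchanged.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        $out .= ($b < 0x80)
            ? $bytes[$i]
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical helper: convert $path only if it is not already valid UTF-8,
// so running it twice does not double-encode. Returns true if rewritten.
function convertFileToUtf8(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false || isValidUtf8($bytes)) {
        return false;
    }
    return file_put_contents($path, latin1ToUtf8($bytes)) !== false;
}

// Demo on a temporary file holding the Latin-1 form of the original line.
$path = tempnam(sys_get_temp_dir(), 'enc');
file_put_contents($path, "\$text = '\xC6neas Mackintosh';\n");
var_dump(convertFileToUtf8($path));  // first run rewrites the file
var_dump(convertFileToUtf8($path));  // second run leaves it alone
echo bin2hex(substr(file_get_contents($path), 9, 2)), "\n";
```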
Beau wrote:
What encoding did you set up for the wiki?
The database character set is MySQL 4.1/5.0 binary.
The 'Æ' character should be encoded in UTF-8 as %C3%86; %C6 is its ISO-8859-1 encoding.
Then it seems that Zend Framework's HTTP client library or PHP's http_build_query() function is incorrectly encoding the string, since the 'Æ' I'm passing is indeed UTF-8.
It is apparent that this is NOT a MediaWiki API bug, but any further suggestions are appreciated.
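Whether http_build_query() alters the bytes can be checked directly: it percent-encodes whatever bytes it receives and does no charset conversion, so %C6 in a body means the input string held 0xC6. A minimal sketch assuming PHP 7 or later; it calls http_build_query() itself rather than Zend_Http_Client, and the strings are written as escapes so the check does not depend on the file's own encoding.

```php
<?php
// Sketch (assumes PHP 7+): http_build_query() percent-encodes the bytes it
// receives without converting between character sets.

$utf8Text   = "\xC3\x86neas Mackintosh"; // 'Æneas Mackintosh' as UTF-8
$latin1Text = "\xC6neas Mackintosh";     // the same text as ISO-8859-1

// The default encoding (PHP_QUERY_RFC1738) writes spaces as '+', as in the
// body quoted at the start of the thread.
echo http_build_query(['action' => 'edit', 'text' => $utf8Text]), "\n";
echo http_build_query(['action' => 'edit', 'text' => $latin1Text]), "\n";
```

Because the output only mirrors the input, the %C6 in the original request points back at the string being passed in, which fits the conclusion that this is not a MediaWiki API bug.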
Perhaps MySQL is defaulting to Latin1? It supports Unicode, but it isn't quite as easy as one might hope. I found this to be helpful:
http://www.saiweb.co.uk/mysql/mysql-forcing-utf-8-compliance-for-all-connect...
Eric W. Brown ericb@metacafe.com [[User:Eric94043]]
-----Original Message----- From: mediawiki-api-bounces@lists.wikimedia.org [mailto:mediawiki-api-bounces@lists.wikimedia.org] On Behalf Of James M Safley Sent: Tuesday, September 15, 2009 1:07 PM To: MediaWiki API announcements & discussion Subject: Re: [Mediawiki-api] Problem with special characters upon edit.
_______________________________________________ Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
James M Safley wrote:
The 'Æ' character should be encoded in UTF-8 as %C3%86; %C6 is its ISO-8859-1 encoding.
Then it seems that Zend Framework's HTTP client library or PHP's http_build_query() function is incorrectly encoding the string, since the 'Æ' I'm passing is indeed UTF-8.
Are you sure? What's the output of base64_encode('Æ')?
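The base64 check works because base64_encode() exposes the raw bytes of the literal: a UTF-8 'Æ' (0xC3 0x86) encodes as w4Y=, while an ISO-8859-1 'Æ' (0xC6) encodes as xg==. A small sketch assuming PHP 7 or later; describeAe is a hypothetical helper.

```php
<?php
// Sketch (assumes PHP 7+): interpret the bytes behind a literal 'Æ'.

// Hypothetical helper: classify a one-character string by its bytes.
function describeAe(string $s): string
{
    switch (bin2hex($s)) {
        case 'c386':
            return 'UTF-8';
        case 'c6':
            return 'ISO-8859-1 (or Windows-1252)';
        default:
            return 'something else: ' . bin2hex($s);
    }
}

foreach (["\xC3\x86", "\xC6"] as $bytes) {
    printf("%-5s %s\n", base64_encode($bytes), describeAe($bytes));
}

// In the script under suspicion, the literal itself is what matters:
//     echo base64_encode('Æ'), ' ', describeAe('Æ'), "\n";
```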
Eric Brown wrote:
Perhaps MySQL is defaulting to Latin1? It supports Unicode, but it isn't quite as easy as one might hope.
MySQL is not involved in this. He's interfacing with MediaWiki over HTTP.
Platonides wrote:
since the 'Æ' I'm passing is indeed UTF-8.
Are you sure?
Oops, this was an oversight on my part. In the interest of helping others with a similar problem, here's the plain-and-simple solution:
Always remember to set the appropriate Content-Type header on your response. In this case:
header('Content-Type: text/html; charset=utf-8');
What an embarrassing oversight! Thank you all for your help.
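The thread doesn't say where the text came from, but the fix suggests it was typed into an HTML form served by the same script. A browser normally submits form data in the page's character encoding, so a page with no declared charset can send ISO-8859-1 bytes. Here is a sketch built on that assumption (PHP 7 or later). renderForm and isValidUtf8 are hypothetical helpers, and accept-charset is an extra hint to the browser that was not part of the original fix.

```php
<?php
// Sketch (assumes PHP 7+ and that the text comes from a form served by
// this script; the thread does not say so explicitly).

// The fix from the thread: declare UTF-8 before any output is sent,
// so the browser also submits the form back as UTF-8.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical helper: the form, with accept-charset as an extra hint.
function renderForm(): string
{
    return '<form method="post" accept-charset="UTF-8">'
         . '<textarea name="text"></textarea>'
         . '<input type="submit" value="Save">'
         . '</form>';
}

// Hypothetical helper: preg_match() with /u returns false on malformed UTF-8.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

if (($_SERVER['REQUEST_METHOD'] ?? 'GET') === 'POST') {
    $text = $_POST['text'] ?? '';
    if (!is_string($text) || !isValidUtf8($text)) {
        // Refuse to forward mis-encoded input to the wiki.
        http_response_code(400);
        exit('Submitted text is not valid UTF-8.');
    }
    // Here $text would be passed to setParameterPost('text', $text)
    // as in the Zend_Http_Client code at the top of the thread.
} else {
    echo renderForm(), "\n";
}
```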