[Toolserver-l] Character encoding after selecting from database

Maciej Jaros egil at wp.pl
Fri Nov 26 03:21:16 UTC 2010


@2010-11-26 04:00, Maciej Jaros:
> @2010-11-26 03:33, MZMcBride:
>> Maciej Jaros wrote:
>>>    There seems to be a problem with character encoding in (at least) the Polish
>>> Wikipedia database. At first I thought it was a problem with my script, but
>>> phpMyAdmin and even the shell mysql client also show weird characters.
>>>
>>>    See for example:
>>>    SELECT page_title FROM page WHERE page_id = 2117937
>> Looks fine to me:
>>
>> mzmcbride at nightshade:~$ mysql -hsql-s2 -e 'SELECT page_title FROM page WHERE
>> page_id = 2117937;' plwiki_p;
>> +---------------------+
>> | page_title          |
>> +---------------------+
>> | Vladimír_Železný |
>> +---------------------+
>>
>>>    Or a lot more here:
>>>    http://toolserver.org/~eccenux/dna/index.php?D=2010-10-10
>> The database is set to use latin-1 encoding / collation, but the text of
>> page titles is stored in the database as byte strings. In this specific
>> case, it looks like your tool is mishandling the data.
>>
>> In general, you want to make sure that the web server is outputting
>> "Content-Type: text/html;charset=utf-8" in its headers. You also want to
>> make sure that your browser is set to use UTF-8 encoding when viewing pages
>> (which it will usually properly auto-detect if the headers are correct) and
>> that the tool you've written properly encodes the byte strings as UTF-8.
>>
>> When it's a choice between the database being corrupt and user error, the
>> odds favor user error. ;-)
> Database corruption can be a user error too ;-).
>
> I'm seeing the same result in three tools, so this is not just something in
> my script.
>
> PuTTY on Windows 7 gives me:
>
> eccenux at nightshade:~$ mysql -hsql-s2 -e 'SELECT page_title FROM page WHERE
>   >  page_id = 2117937;' plwiki_p;
> +---------------------+
> | page_title          |
> +---------------------+
> | Vladimír_Železný |
> +---------------------+
>
> And the same with my script and with phpMyAdmin, which is running with default
> settings... Is this something with my profile settings or what?
>

Strange. I wasn't able to make phpMyAdmin behave as expected other than by
casting page_title as binary. The shell acts the same for me, but I guess that
might be because my system doesn't use latin1.

I guess using this in my script was NOT a good idea:
PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"
:-).
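
For anyone who wants to see the likely mechanism, here is a minimal sketch. It is written in Python purely for illustration (nothing in this thread uses Python), and it assumes the explanation given above: the titles are stored as UTF-8 bytes in a column declared latin1, and "SET NAMES utf8" makes the server convert those bytes from latin1 to utf8 a second time. The function names are hypothetical, and cp1252 stands in for MySQL's latin1.

```python
# Illustration only: simulates the suspected double encoding in plain Python.
# Assumption: the column is declared latin1 but holds raw UTF-8 bytes.

TITLE = "Vladimír_Železný"


def stored_bytes(title: str) -> bytes:
    """Bytes as they sit in the table: plain UTF-8."""
    return title.encode("utf-8")


def read_with_set_names_utf8(raw: bytes) -> bytes:
    """Assumed effect of SET NAMES utf8: every stored byte is treated as
    one latin1 (cp1252) character and re-encoded as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


def read_as_binary(raw: bytes) -> bytes:
    """Casting the column as binary: bytes pass through unchanged."""
    return raw


def repair(double_encoded: bytes) -> str:
    """Undo one round of the latin1 -> utf8 conversion."""
    return double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")


raw = stored_bytes(TITLE)
garbled = read_with_set_names_utf8(raw)

print(len(raw), len(garbled))               # byte lengths before and after
print(garbled.decode("utf-8"))              # the "weird characters"
print(read_as_binary(raw).decode("utf-8"))  # the intact title
print(repair(garbled))                      # the title recovered from garbled data
```

Under these assumptions, the fix on the client side is to not request a conversion at all (drop the SET NAMES utf8 init command, or cast the column as binary) and to treat the returned bytes as UTF-8.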

Hope it helps someone else.

Best,
Nux.



