[Toolserver-l] Character encoding after selecting from database

Jeremy Baron jeremy at tuxmachine.com
Fri Nov 26 03:19:35 UTC 2010


2010/11/25 Maciej Jaros <egil at wp.pl>:
> @2010-11-26 03:33, MZMcBride:
>> Maciej Jaros wrote:
>>>   There seems to be a problem with character encoding in (at least) the Polish
>>> Wikipedia database. At first I thought it was a problem with my script, but
>>> phpMyAdmin and even the mysql shell client also show weird characters.
>>>
>>>   See for example:
>>>   SELECT page_title FROM page WHERE page_id = 2117937
>> Looks fine to me:
>>
>> mzmcbride at nightshade:~$ mysql -hsql-s2 -e 'SELECT page_title FROM page WHERE
>> page_id = 2117937;' plwiki_p;
>> +---------------------+
>> | page_title          |
>> +---------------------+
>> | Vladimír_Železný |
>> +---------------------+
>>
>>>   Or a lot more here:
>>>   http://toolserver.org/~eccenux/dna/index.php?D=2010-10-10
>> The database is set to use latin-1 encoding / collation, but the text of
>> page titles is stored in the database as byte strings. In this specific
>> case, it looks like your tool is mishandling the data.
>>
>> In general, you want to make sure that the web server is outputting
>> "Content-Type: text/html;charset=utf-8" in its headers. You also want to
>> make sure that your browser is set to use UTF-8 encoding when viewing pages
>> (which it will usually properly auto-detect if the headers are correct) and
>> that the tool you've written properly encodes the byte strings as UTF-8.
>>
>> When it's a choice between the database being corrupt and user error, the
>> odds favor user error. ;-)
>
> Database corruption can be a user error too ;-).
>
> I'm seeing the same result in three tools, so this is not just something in
> my script.
>
> PuTTY on Windows 7 gives me:
>
> eccenux at nightshade:~$ mysql -hsql-s2 -e 'SELECT page_title FROM page WHERE
> page_id = 2117937;' plwiki_p;
> +---------------------+
> | page_title          |
> +---------------------+
> | Vladimír_Železný |
> +---------------------+
>
> And I get the same with my script and with phpMyAdmin running on default
> settings... Is this something in my profile settings, or what?
>
> Best,
> Nux.
>

Have you set PuTTY or your web browser (when viewing phpMyAdmin) to use UTF-8?

-Jeremy
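
For anyone reading this thread later, here is a minimal sketch of the effect being discussed. It assumes the title bytes in the database are UTF-8 and that the display side (terminal, browser, or script) reads them as latin-1. The function names are hypothetical and chosen only for illustration; nothing here is taken from the tool in question.

```python
# Sketch only: assumes the stored bytes are UTF-8 and the reader treats
# them as latin-1. All names below are hypothetical.

TITLE = "Vladimír_Železný"


def as_stored(title: str) -> bytes:
    """Return the title as a byte string (assumed to be UTF-8)."""
    return title.encode("utf-8")


def misread_as_latin1(raw: bytes) -> str:
    """Decode the bytes with the wrong charset, as a misconfigured client would."""
    return raw.decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the wrong decode: back to the original bytes, then read as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


raw = as_stored(TITLE)
garbled = misread_as_latin1(raw)

print(len(TITLE), len(raw))     # 16 characters, 19 bytes
print(garbled != TITLE)         # True: the wrong decode changes the text
print(repair(garbled) == TITLE) # True: the bytes themselves were never damaged
```

The point of the sketch is that the wrong decode is reversible, so garbled output like this usually means the data is intact and only the display charset is off. The 19-byte length would also be consistent with the 21-character table border in the mysql output above, if the client pads columns by byte count.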


