[Mediawiki-l] Character encoding problems

wiki at avenarius.sk
Fri Aug 4 09:43:52 UTC 2006


On Friday, 4th August 2006 at 02:11:57 (GMT -0700), Webmaster Comunitatea Româna wrote:

> See my email attached, I run into similar issue. It's likely that your
> character set is latin1 and that you need to use  mysqldump using
> --default-character-set=latin1 options when creating the SQL dump. I also
> assumed that my charset is UTF-8 just to discover that it's not true.

It's certainly weird. I'm on Windows using the very fine EditPlus text
editor (makes converting files from/to UTF-8 a snap), and when I load
a MediaWiki SQL dump into EditPlus, it shows this:

http://avenarius.sk/misc/sql-dump.gif

Note the file claims to be encoded in UTF-8, but the characters evidently
are not. See several occurrences of "u8c2b4" that apparently represent
the curly apostrophe in ISO-8859-1 encoded texts.
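
One way a file can be valid UTF-8 and still show the wrong characters is
double encoding: UTF-8 bytes get read as latin1 somewhere along the way and
are then converted to UTF-8 a second time. Whether that is what happened to
this dump is only an assumption on my part. Here is a minimal Python sketch
of the effect; the function names are made up for illustration and are not
part of MediaWiki or MySQL.

```python
def double_encode(text: str) -> bytes:
    """Simulate UTF-8 bytes being misread as latin1 and encoded to UTF-8 again."""
    utf8_bytes = text.encode("utf-8")        # what the application stored
    misread = utf8_bytes.decode("latin-1")   # each byte taken as one latin1 character
    return misread.encode("utf-8")           # what ends up in the dump file


def repair(dump_bytes: bytes) -> str:
    """Undo one round of double encoding (assumes the input really is double encoded)."""
    return dump_bytes.decode("utf-8").encode("latin-1").decode("utf-8")


original = "It\u2019s a caf\u00e9"            # curly apostrophe and an accented letter
damaged = double_encode(original)

print(damaged)          # still valid UTF-8, but with extra bytes per character
print(repair(damaged))  # prints the original text again
```

Note that MySQL's "latin1" is closer to Windows-1252 than to strict
ISO-8859-1, so a real dump may need "cp1252" instead of "latin-1" in the
repair step, and a few byte values may not round-trip cleanly.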

In contrast, when I load an SQL dump produced by phpBB's built-in backup
manager into EditPlus, the curly apostrophes display as curly apostrophes,
all accented letters are WYSIWYG, and so on, even though the status
line shows the same "Unix,U8" flag as it does for MediaWiki dumps.

(PS: The email attached to your message must have been stripped by Mailman.)
 
-- 
Yours,
Alex.
[processed by "The Bat!", Version 3.80.06]



