See my email attached; I ran into a similar issue. It's likely that your character set is latin1 and that you need to run mysqldump with the --default-character-set=latin1 option when creating the SQL dump. I also assumed that my charset was UTF-8, only to discover that it was not.
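To illustrate the kind of mismatch I mean, here is a minimal Python sketch (not anything MediaWiki or MySQL does itself, and the function names are made up for the example). It assumes the text is really UTF-8 but gets read as latin1 somewhere along the way, which is one common way accented characters turn into gibberish.

```python
def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being interpreted as latin1 (hypothetical helper)."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Undo the misreading: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "café"
garbled = misread_as_latin1(original)
print(garbled)                   # cafÃ©
print(repair_mojibake(garbled))  # café
```

If your gibberish looks like the first printed line (two odd characters for each accented letter), the data has probably been through this kind of double interpretation. If it looks different, the assumption above may not apply to your case.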
-----Original Message-----
From: mediawiki-l-bounces@Wikimedia.org [mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of neil@nwjones.demon.co.uk
Sent: Friday, August 04, 2006 1:54 AM
To: mediawiki-l@Wikimedia.org
Subject: [Mediawiki-l] Character encoding problems
I have a problem that relates to character encodings. Certain things relating to this problem are beyond my control, so I need to know how to fix the end result, not how to do it better from the start.
I work on a Linux system on my own desktop and load a wiki onto an online system that also runs Linux.
The processing involves inserting text directly into the middle of the MediaWiki dumps and then loading the dumps using importDump.php. This works fine except for accented characters in foreign words (and there are rather a lot of these in total), which appear as gibberish. The insertion method is beyond my personal control, so I am stuck with it. It is obviously a character encoding problem, but I have tried using iconv with no success.
I know this is an odd way to do things, and I would ideally not do things like this, but I have no choice.
First of all, am I correct in assuming that character encodings should be in UTF-8? If not, what should they be in?
A site giving the details of how the different character encodings work would also be a start, if someone knows of one. As a last resort I can write something to change the encodings myself.
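A last-resort converter of the kind mentioned above could be sketched roughly as follows in Python. This is only a sketch under an assumption: that any bytes which are not valid UTF-8 are latin1. The function name to_utf8 is hypothetical, and the guess will be wrong for text in other legacy encodings.

```python
def to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8 bytes, assuming non-UTF-8 input is latin1."""
    try:
        # Already valid UTF-8: leave it unchanged.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Assumption: the input is latin1, so re-encode it as UTF-8.
        return data.decode("latin-1").encode("utf-8")


print(to_utf8(b"caf\xe9"))      # b'caf\xc3\xa9'
print(to_utf8(b"caf\xc3\xa9"))  # b'caf\xc3\xa9'
```

Applying this to each inserted fragment separately, rather than to the whole dump at once, would avoid failing on a file that mixes UTF-8 and latin1 text.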
Neil Jones Neil@nwjones.demon.co.uk
_______________________________________________ MediaWiki-l mailing list MediaWiki-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/mediawiki-l