[Mediawiki-l] Character encoding problems

neil at nwjones.demon.co.uk neil at nwjones.demon.co.uk
Fri Aug 4 08:54:23 UTC 2006


I have a problem with character encodings. Some parts of the process are beyond my control, so I need to know how to fix the end result, not how to do it better from the start.


I work on a Linux system on my own desktop and load a wiki onto an online system that also runs Linux.

The processing involves inserting text directly into the middle of the MediaWiki dumps and then loading the dumps using importDump.php. This works fine except for accented characters in foreign words (and there are rather a lot of these in total). These appear as gibberish. The insertion method is beyond my personal control, so I am stuck with it.
It is obviously a character encoding problem, but I have tried using iconv with no success.
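
A minimal Python sketch of one way this symptom can arise. It assumes (this is not confirmed anywhere above) that the inserted text is ISO-8859-1 while the rest of the dump is UTF-8. Under that assumption the file holds two encodings at once, so converting the whole file in one pass repairs one part and damages the other. The sample word and the helper name is_valid_utf8 are illustrative only.

```python
# Assumption: dump lines are UTF-8, inserted lines are ISO-8859-1.
text = "café"
utf8_bytes = text.encode("utf-8")         # two bytes for the accented letter
latin1_bytes = text.encode("iso-8859-1")  # one byte for the accented letter


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A file mixing both encodings is not valid UTF-8 as a whole.
mixed = utf8_bytes + b"\n" + latin1_bytes

# Treating the whole file as ISO-8859-1 and converting it to UTF-8
# fixes the inserted line but double-encodes the original line.
whole_file_converted = mixed.decode("iso-8859-1").encode("utf-8")
first_line, second_line = whole_file_converted.decode("utf-8").split("\n")
```

Here second_line comes out correct while first_line turns into the familiar two-character gibberish, which would be consistent with a whole-file conversion appearing to fail.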

I know this is an odd way to do things, and I would ideally not do things like this, but I have no choice.

First of all, am I correct in assuming that the character encoding should be UTF-8? If not, what should it be?

A site explaining how the different character encodings work would also be a start, if someone knows of one. As a last resort I can write something to change the encodings myself.
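
A rough sketch of what such a converter might look like in Python, working line by line so that text already in UTF-8 is left alone. The fallback encoding (ISO-8859-1) is an assumption, as are the function names to_utf8 and convert_dump; the real source encoding would need to be checked first. Note that a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so this heuristic can miss some lines.

```python
def to_utf8(line: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return the line as UTF-8 bytes.

    Lines that already decode as UTF-8 are returned unchanged; anything
    else is assumed to be in the fallback encoding and is re-encoded.
    """
    try:
        line.decode("utf-8")
        return line
    except UnicodeDecodeError:
        return line.decode(fallback).encode("utf-8")


def convert_dump(src_path: str, dst_path: str,
                 fallback: str = "iso-8859-1") -> None:
    """Copy a dump file, converting non-UTF-8 lines to UTF-8."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # binary mode splits on b"\n" only
            dst.write(to_utf8(line, fallback))
```

The converted file, rather than the original, would then be passed to the import script.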


Neil Jones
Neil at nwjones.demon.co.uk





