I have a problem with character encodings. Certain aspects of it are beyond my control, so
I need to know how to fix the end result, not how to do it better from the start.
I work on a Linux system on my own desktop, loading a wiki onto an online system that also
runs Linux.
The processing involves inserting text directly into the middle of the Wikimedia dumps and
then loading the dumps using importDump.php. This works fine except for accented
characters in foreign words (and there are rather a lot of these in total). These appear
as gibberish. The insertion method is beyond my personal control, so I am stuck with it.
It is obviously a character encoding problem, but I have tried using iconv with no
success.
I know this is an odd way to do things, and I would ideally not do things like this, but I
have no choice.
First of all, am I correct in assuming that the character encoding should be UTF-8? If
not, what should it be?
A site giving the details of how the different character encodings work would also be a
start, if someone knows of one. As a last resort I can write something to change the
encodings myself.
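As a rough sketch of what such a last-resort script might look like: this assumes (and it
is only a guess) that the inserted text is ISO-8859-1 while the rest of the dump is
already UTF-8, which would explain why running a converter over the whole file does not
help. The function name to_utf8 and the fallback encoding are made up for illustration.

```python
def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return raw with every line that is not valid UTF-8 re-encoded as UTF-8.

    Assumption: lines that fail to decode as UTF-8 are in the fallback
    encoding (ISO-8859-1 here, which is a guess about the inserted text).
    """
    fixed = []
    for line in raw.split(b"\n"):
        try:
            # Line is already valid UTF-8: keep the bytes untouched.
            line.decode("utf-8")
            fixed.append(line)
        except UnicodeDecodeError:
            # Line is not valid UTF-8: reinterpret it in the fallback
            # encoding and write it back out as UTF-8.
            fixed.append(line.decode(fallback).encode("utf-8"))
    return b"\n".join(fixed)


if __name__ == "__main__":
    # One line already in UTF-8, one line in ISO-8859-1, as in a mixed dump.
    mixed = "café\n".encode("utf-8") + "naïve\n".encode("iso-8859-1")
    print(to_utf8(mixed).decode("utf-8"))
```

Working line by line means text that is already correct is left alone and only the
inserted pieces are converted. A line that mixes both encodings would still come out
wrong, so this is a starting point rather than a complete fix.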
Neil Jones
Neil(a)nwjones.demon.co.uk