I have a problem that relates to character encodings. Certain things relating to this problem are beyond my control so I need to know how to change the end situation not how to do it better from the start.
I work on a Linux system on my own desktop, loading a wiki onto an online system that also runs Linux.
The processing involves inserting text directly into the middle of the wikimedia dumps and then loading the dumps using ImportDump.php. This works fine except for accented characters in foreign words (and there are rather a lot of these in total). These appear as gibberish. The insertion method is beyond my personal control, so I am stuck with it. It is obviously a character encoding problem, but I have tried using iconv with no success.
I know this is an odd way to do things, and I would ideally not do things like this, but I have no choice.
First of all, am I correct in assuming that the character encoding should be UTF-8? If not, what should it be?
A site giving the details of how the different character encodings work would also be a start, if someone knows of one. As a last resort, I can write something to change the encodings myself.
Neil Jones Neil@nwjones.demon.co.uk
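One possible reason iconv fails here is that the dump ends up with mixed encodings: the original dump text in UTF-8 and the inserted text in something else. That is only an assumption about the cause, and Latin-1 as the encoding of the inserted text is a guess too. Under those assumptions, a minimal Python sketch that normalises such a file line by line might look like this (the function names are made up for illustration):

```python
def line_to_utf8(raw_line: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw_line as UTF-8 bytes.

    Lines that already decode as UTF-8 are returned unchanged; any other
    line is assumed (not known!) to be in the fallback encoding.
    """
    try:
        raw_line.decode("utf-8")
        return raw_line
    except UnicodeDecodeError:
        return raw_line.decode(fallback).encode("utf-8")


def normalize_dump(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Apply line_to_utf8 to every line, keeping the line endings."""
    lines = raw.splitlines(keepends=True)
    return b"".join(line_to_utf8(line, fallback) for line in lines)


# First line is valid UTF-8, second line holds the same word in Latin-1.
mixed = b"caf\xc3\xa9\n" + b"caf\xe9\n"
fixed = normalize_dump(mixed)
print(fixed.decode("utf-8"))
```

A line that mixes both encodings would be decoded entirely with the fallback, so its UTF-8 parts would come out garbled; a real dump may need a finer split than whole lines.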
On Friday, 4th August 2006 at 09:54:23 (GMT +0100), neil@nwjones.demon.co.uk wrote:
The processing involves inserting text directly into the middle of the wikimedia dumps
It matters which text editor you use to insert that text. Some text editors can display UTF-8 encoded text but are unable to save it properly.
I regularly use the method you describe to manually edit and re-upload phpBB SQL backups (produced by phpBB's own built-in backup facility), and everything works fine, including all accented characters and Russian, Arabic or Chinese sentences. (And this even though phpBB's default distribution encoding is iso-8859-1 and we had to convert all configuration and language files into UTF-8 manually.)
See my email attached; I ran into a similar issue. It's likely that your character set is latin1 and that you need to run mysqldump with the --default-character-set=latin1 option when creating the SQL dump. I also assumed that my charset was UTF-8, only to discover that it wasn't.
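If the dump really was written through a latin1 connection, the typical symptom is double encoding: UTF-8 bytes are read as Latin-1 characters and then encoded as UTF-8 a second time. The Python sketch below shows that symptom and one way to undo it. It assumes plain Latin-1 as the wrong codec (MySQL's latin1 is closer to Windows-1252, so "cp1252" may be the better choice for real data), and the function name is invented for this example.

```python
def undo_double_encoding(text: str, wrong_codec: str = "latin-1") -> str:
    """Reverse one round of 'UTF-8 bytes misread as wrong_codec'.

    If the text does not look double-encoded, it is returned unchanged.
    """
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the damage: UTF-8 bytes interpreted as Latin-1 characters.
garbled = "café".encode("utf-8").decode("latin-1")
print(garbled)                        # cafÃ©
print(undo_double_encoding(garbled))  # café
```

Text that is already correct is left alone, because re-reading its Latin-1 bytes as UTF-8 fails and the function falls back to the input.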
-----Original Message----- From: mediawiki-l-bounces@Wikimedia.org [mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of neil@nwjones.demon.co.uk Sent: Friday, August 04, 2006 1:54 AM To: mediawiki-l@Wikimedia.org Subject: [Mediawiki-l] Character encoding problems
_______________________________________________ MediaWiki-l mailing list MediaWiki-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/mediawiki-l
webmaster@comunitatea-romana.com wrote:
See my email attached; I ran into a similar issue. It's likely that your character set is latin1 and that you need to run mysqldump with the --default-character-set=latin1 option when creating the SQL dump. I also assumed that my charset was UTF-8, only to discover that it wasn't.
I am sorry but I cannot see any attachment that contains any further information. I am temporarily forced to work through a webmail interface that may be causing problems.
Neil Jones Neil@nwjones.demon.co.uk
On Friday, 4th August 2006 at 02:11:57 (GMT -0700), Webmaster Comunitatea Româna wrote:
See my email attached; I ran into a similar issue. It's likely that your character set is latin1 and that you need to run mysqldump with the --default-character-set=latin1 option when creating the SQL dump. I also assumed that my charset was UTF-8, only to discover that it wasn't.
It's certainly weird. I'm on Windows using the very fine EditPlus text editor (makes converting files from/to UTF-8 a snap), and when I load a MediaWiki SQL dump into EditPlus, it shows this:
http://avenarius.sk/misc/sql-dump.gif
Note the file claims to be encoded in UTF-8, but the characters evidently are not. See several occurrences of "u8c2b4" that apparently represent the curly apostrophe in ISO-8859-1 encoded texts.
In contrast, when loading an SQL dump produced by phpBB's built-in backup manager in EditPlus, the curly apostrophes display as curly apostrophes, all accented letters are WYSIWYG, and so on, even though the status line shows the same "Unix,U8" flag as it does for MediaWiki dumps.
(PS: The email attached to your message must have been stripped by Mailman.)
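For reference, here is how the curly apostrophe (U+2019) is represented in the encodings mentioned in this thread, as a short Python sketch. It does not try to interpret the sequences visible in the screenshot; it only shows the byte values involved. Note that strict ISO-8859-1 has no curly apostrophe at all; the single-byte form comes from Windows-1252, which is often loosely called Latin-1.

```python
apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK

utf8_bytes = apostrophe.encode("utf-8")
cp1252_bytes = apostrophe.encode("cp1252")
print(utf8_bytes)    # b'\xe2\x80\x99'
print(cp1252_bytes)  # b'\x92'

# Strict ISO-8859-1 cannot represent the character.
try:
    apostrophe.encode("iso-8859-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False
print(in_latin1)     # False

# What the UTF-8 bytes look like when misread as Windows-1252.
misread = utf8_bytes.decode("cp1252")
print(misread)       # â€™
```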
One of the things I have found useful for small amounts of funny-char data is copying directly from one table in one db to another table in another db using SQLyog (Table->copy table to different host/db), even when one db is local and the target is remote.
Hugh http://chainki.org = dmoz + wiki
neil@nwjones.demon.co.uk wrote in message news:E1G8vRn-0003cq-0H@pr-webmail-2.demon.net...
I have a problem that relates to character encodings. Certain things relating to this problem are beyond my control so I need to know how to change the end situation not how to do it better from the start.
Dear all
I just installed MediaWiki on my server, and nothing indicated that the installation had gone wrong. Unfortunately, when I go to the page /index.php, I see the content of the file LocalSettings.php :-( What went wrong? Any help on this is appreciated.
Regards, Albert
mediawiki-l@lists.wikimedia.org