Hello.
I have tried for several hours to move my MediaWiki to another machine, or rather to move the MediaWiki data to another MySQL database.
server_old: 5.0.32-Debian_7etch1-log Mediawiki 1.6.10 MySQL charset: UTF-8 Unicode (utf8) MySQL connection collation: utf8_general_ci
server_new: MySQL 5.0.38-Ubuntu_0ubuntu1-log rest same as above
When I look at the data directly in MySQL, my German special characters look a bit cryptic (like Ãœ for Ü). But this seems to be OK, because MediaWiki stores everything in UTF-8 in the database regardless of what MySQL uses (as I read here somewhere). MediaWiki on server_old also has no problem with this at all.
But the problems begin when I get a dump through the phpMyAdmin export (I can't run mysqldump on the console on server_old; my old provider does not permit this). When I look at the dumped SQL file, everything still seems to be OK: the same cryptic characters for my German special characters as in the server_old database. But when I import the SQL dump into the new database, the cryptic characters change (for example, the two characters Ãœ that stood for Ü turn into an even longer garbled sequence), and the special characters are corrupt when using the new MediaWiki.
file -i dump.sql gives me: text/x-c; charset=utf-8
The only thing I can think of is that, by importing the file into the new database, the content of the dump file is encoded again. And that's why those two cryptic characters are now four: Ü (1 character) -> Ãœ (2 characters) -> 4 characters.
Any thoughts?
Best regards, Kai
Oops, the last one is 5 cryptic characters ... it was late ;-) So I am unsure whether my theory is correct.
Best regards, Kai
Yes, your theory is correct. mysqldump will be too intelligent. See http://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki
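The re-encoding theory can be checked with a few lines of Python. This is only a sketch, and it assumes that MySQL's latin1 behaves like Windows-1252 (Python's "cp1252" codec) for the characters involved; the function name is made up for illustration. It may also explain the 4-versus-5 confusion: under this assumption Ü ends up as 4 characters after two rounds, while Ö ends up as 5.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Windows-1252.

    Assumption: this mirrors what happens when UTF-8 bytes are treated
    as latin1 data and converted to UTF-8 once more.
    """
    return text.encode("utf-8").decode("cp1252")


# Character counts after one and two rounds of wrong decoding.
lengths = {}
for original in ("Ü", "Ö"):
    once = misread_utf8_as_cp1252(original)
    twice = misread_utf8_as_cp1252(once)
    lengths[original] = (len(original), len(once), len(twice))

# Ü: 1 -> 2 -> 4 characters
# Ö: 1 -> 2 -> 5 characters
```

Each round turns every non-ASCII character into two or more characters, so a dump that already looks "cryptic" gets longer again on every extra conversion.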
Thanks for the help. I think you are pointing to the "Character set" section (http://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Character_set).
My problem is that my old provider does not provide the console program "mysqldump". The only way for me is to export via phpMyAdmin, and there I can't use the option "--default-character-set=latin1".
Is there a way to convert the dump after export?
Hm, still having problems. Is there perhaps a way to stop MySQL from re-encoding the data when importing an SQL dump, so that Ãœ stays as it is and is not encoded a second time, even though it comes from a UTF-8 file and goes into a latin1 table? That would solve this tricky problem.
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
On 09/09/2007, Kai Schlamp schlamp@gmx.de wrote:
Hm, still having problems. Is there perhaps a way to stop MySQL from re-encoding the data when importing an SQL dump, so that Ãœ stays as it is and is not encoded a second time, even though it comes from a UTF-8 file and goes into a latin1 table? That would solve this tricky problem.
From your first post, I understood that both MySQL servers belong to you, right? I mean, you have access to them in whatever way you want. If that's the case, try installing MySQL Administrator (now part of the GUI Tools package). MySQL Administrator has a backup function that has been working very well for me. The backup file is an SQL file in which, in my case, I can see that the charset of every table is specified explicitly (as UTF-8).
Restoring back is also very simple. Try it.
From your first post, I understood that both MySQL belong to you, right?
No, I only have full access to the server I want to restore MediaWiki to. The server I need to back up the database from has only phpMyAdmin for exporting. That's the main problem. I solved it by asking my provider to make a mysqldump (with that latin1 option) by hand, and luckily they did. But I think this problem should be mentioned on an official "How to back up" page in the MediaWiki documentation, because surely a lot of people back up MediaWiki by exporting via phpMyAdmin and are not aware that trouble is bound to follow if they ever have to restore the database.
On 11/09/2007, Kai Schlamp schlamp@medihack.org wrote:
From your first post, I understood that both MySQL belong to you, right?
No, I just have full access to the server I want to restore mediawiki to.
Oh, sorry to hear that.
The server to back up the database has only phpmyadmin for exporting.
That's the main problem. I solved it by telling my provider to make a mysqldump (with that latin1 option) by hand (and they luckily do so).
Glad to know that you've solved your problem, but certainly a lot of users aren't as lucky as you are. Personally, I don't use phpMyAdmin, but since it's possible to invoke mysqldump from within the web server using PHP, I think it's possible to write our own web page that executes mysqldump (using PHP). It would be nice to teach users how to do so.
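The idea of calling mysqldump from a script can be sketched as follows. This is written in Python rather than the PHP mentioned above, and it is only an illustration: the function names, the database name "wikidb", the user "wikiuser" and the output path are all hypothetical. It also assumes that mysqldump is installed on the host and that the password comes from a MySQL option file, neither of which is guaranteed on a shared provider.

```python
import subprocess


def build_dump_command(database, user, outfile, charset="latin1"):
    """Build the mysqldump argument list.

    --default-character-set=latin1 is the option from the "Character set"
    section of the backup manual; it keeps mysqldump from converting the
    stored bytes on export.
    """
    return [
        "mysqldump",
        f"--default-character-set={charset}",
        f"--user={user}",
        f"--result-file={outfile}",
        database,
    ]


def run_dump(database, user, outfile):
    """Run mysqldump and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_dump_command(database, user, outfile), check=True)


if __name__ == "__main__":
    # Hypothetical names; replace them with your own before running.
    run_dump("wikidb", "wikiuser", "wiki-backup.sql")
```

Passing the arguments as a list avoids shell quoting problems, and leaving the password out of the command line keeps it out of the process list.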
On 09/09/2007, Kai Schlamp kschlamp@roborg.com wrote:
Hm, still having problems. Is there perhaps a way to stop MySQL from re-encoding the data when importing an SQL dump, so that Ãœ stays as it is and is not encoded a second time, even though it comes from a UTF-8 file and goes into a latin1 table? That would solve this tricky problem.
You normally need to instruct mysqldump not to bother messing about with the character set during export; see http://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Character_set.
You may also be able to run the dump file through iconv or similar if it's been corrupted.
Rob Church
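For the question of converting the dump after export, a conversion in the spirit of the iconv suggestion might look like the sketch below. It assumes the situation described in the first post: the dump file is valid UTF-8, but each special character has been encoded one time too many (Ãœ where Ü was meant). It further assumes that Windows-1252 ("cp1252" in Python) matches MySQL's latin1 for the characters in the wiki. The function names are hypothetical. Roughly the same effect should be achievable with iconv by converting from UTF-8 to CP1252.

```python
def undo_one_encoding_layer(data: bytes) -> bytes:
    """Remove one surplus layer of UTF-8 encoding from the dump contents.

    Decoding as UTF-8 yields the "cryptic" text; encoding that text as
    Windows-1252 gives back the original UTF-8 bytes.
    Raises UnicodeEncodeError for characters that cp1252 cannot
    represent, which would mean the assumption does not hold for this file.
    """
    return data.decode("utf-8").encode("cp1252")


def convert_dump(src_path: str, dst_path: str) -> None:
    """Read a damaged dump and write the repaired bytes to a new file."""
    with open(src_path, "rb") as src:
        damaged = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(undo_one_encoding_layer(damaged))
```

Always write to a new file and keep the original dump, since a failed or wrongly assumed conversion cannot be undone otherwise. A quick check is to open the converted file in a UTF-8 editor and look for readable umlauts.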