Hi all,
We hired an external service to upgrade our system from 1.11 to 1.14. After many delays (which explains why 1.14 and not 1.15), it's now a mess: in many places accented characters appear as UTF-8 byte sequences that were never decoded (i.e., ó is displayed not as ó but as the two characters Ã and ³, one for each byte of its UTF-8 encoding). Examples are:
http://www.wikilengua.org/index.php/Propiedad:Norma_UNE_(Terminesp)
http://www.wikilengua.org/index.php/Special:UnusedImages
Mainly so that I can complain: any idea what caused this mess? Is there a way to fix it?
(Semantic MW has stopped working properly, too :-().
Thanx
Javier Bezos
--------------------------------------
http://www.wikilengua.org/
Hi Javier (translated from Spanish),
MediaWiki *always* stores text as UTF-8, but MySQL sometimes believes the text is in a different encoding. That is not a problem for MediaWiki, which knows how to handle it, but it can cause trouble if you edit a table directly or make backups with mysqldump. From what you describe, it looks as if the database encoding was converted to UTF-8 without telling MediaWiki. Try adding $wgDBmysql5 = false; to LocalSettings.php. If changing the value of $wgDBmysql5 does not fix it, you will probably have to make a dump, re-encode it, and import it into MySQL again.
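The symptom described in this thread can be reproduced in a few lines, which may help when deciding whether a dump needs re-encoding. The sketch below is only an illustration in Python, not something MediaWiki or MySQL provides; the helper names garble and repair are made up, and it assumes the text went through exactly one round of UTF-8 bytes being read as Latin-1.

```python
def garble(text: str) -> str:
    """Simulate the fault: UTF-8 bytes read back as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo one round of the fault; return the input unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters outside Latin-1, or its bytes are
        # not valid UTF-8, so it was not garbled in the way assumed here.
        return text


print(garble("ó"))                   # prints Ã³
print(repair(garble("Definición")))  # prints Definición
```

Text that is already correct (for example "Definición") falls through the except branch and is returned untouched, so repair is safe to apply once. Applying it blindly to a whole database is a different matter and would need testing on a copy first.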
I think it is a good idea to keep posts to this list in English. Since the overwhelming majority of posts are in English, I assume that all subscribers can read it and manage to write at least a little of it, but the same is not true of the many other languages spoken around the world. If someone posts in another language, the conversation is automatically restricted: people who may know the answer won't be able to help, and the chances of getting your problem fixed are reduced.
Back on topic:
In my experience, there are many different ways to export and import data from and to MySQL databases, and many, many of them are broken when it comes to binary data or non-ASCII text. Many hosting providers use phpMyAdmin or some variant to export MySQL databases for backup. Do not use that!
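The secure way to back up databases and reimport them somewhere else is to use the command-line tools.
To export: mysqldump -uUSER -p DATABASE > FILENAME.sql
To import: mysql -uUSER -p DATABASE < FILENAME.sql
(The database must be named on import as well, because a dump of a single database made this way does not contain a USE statement.)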
I also recommend checking that the terminal locales on both systems are compatible, using the 'locale' (Linux) or 'env' (other) commands. If in doubt, add "LANG=C LC_ALL=C" before each command to force a common locale on both systems.
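For scripted backups, the same export can be run from a small wrapper that forces the locale. This is a rough sketch under assumptions: the function names are made up for illustration, mysqldump is assumed to be on the PATH, and -p still prompts for the password interactively.

```python
import os
import subprocess


def c_locale_env():
    """Copy the current environment, forcing LANG and LC_ALL to C."""
    env = dict(os.environ)
    env.update(LANG="C", LC_ALL="C")
    return env


def dump_command(user, database):
    """Build the argument list for the mysqldump export shown above."""
    return ["mysqldump", f"-u{user}", "-p", database]


def dump_database(user, database, filename):
    """Run the export, writing the dump to filename in binary mode."""
    with open(filename, "wb") as out:
        subprocess.run(dump_command(user, database), stdout=out,
                       env=c_locale_env(), check=True)
```

A call would look like dump_database("USER", "DATABASE", "FILENAME.sql"). Opening the output file in binary mode keeps Python from re-encoding anything on the way to disk.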
Also, use md5sum or sha1sum to check that the SQL file wasn't damaged in transit. When transferring the file, use binary (image) mode and don't let the FTP software (if you are using any) decide that it "looks like text". Gzipping the file before the transfer is a good way to avoid this problem.
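Where md5sum or sha1sum is not available on one of the machines, the same check can be sketched with Python's standard hashlib module. The function name file_checksum is hypothetical; the hex digest it returns should match the first field printed by sha1sum (or md5sum, with algorithm="md5") for the same file.

```python
import hashlib


def file_checksum(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, read in binary mode chunk by chunk."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat even for multi-gigabyte dumps.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it on both ends, for example file_checksum("FILENAME.sql"), and compare the two strings before importing.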
Do not try to edit the SQL file between export and import, especially if your editor thinks it knows how to handle files with mixed binary/text data. If you still want to edit the SQL file, do not touch the /*!...*/ comments near the beginning and the end of the file; those comments tell the importer how character data is to be handled. This is precisely where phpMyAdmin and other similar tools fail to produce usable backups.
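A quick way to see what those comments declare, without opening the dump in an editor, is to search for the SET NAMES statement that mysqldump usually writes near the top. The following is a minimal sketch: the function name declared_charsets is made up, the sample header is illustrative rather than taken from a real backup, and a dump produced by another tool may not contain such a line at all.

```python
import re

# Matches versioned comments such as /*!40101 SET NAMES utf8 */
SET_NAMES = re.compile(r"/\*!\d+\s+SET\s+NAMES\s+(\w+)\s*\*/", re.IGNORECASE)


def declared_charsets(dump_text):
    """Return every character set named in a /*!NNNNN SET NAMES x */ comment."""
    return SET_NAMES.findall(dump_text)


sample = (
    "-- MySQL dump (illustrative header, not from a real backup)\n"
    "/*!40101 SET NAMES utf8 */;\n"
    "CREATE TABLE t (id INT);\n"
)
print(declared_charsets(sample))  # prints ['utf8']
```

An empty result suggests the comments were stripped somewhere along the way, which is the failure described above.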
Regards, Juliano.
Juliano F. Ravasi writes:
In my experience, there are many different ways to export and import data from and to MySQL databases, and many, many of them are broken when it comes to binary data or non-ASCII text. Many hosting providers use phpMyAdmin or some variant to export MySQL databases for backup. Do not use that!
The secure way to back up databases and reimport them somewhere else is to use the command-line tools.
To export: mysqldump -uUSER -p DATABASE > FILENAME.sql
To import: mysql -uUSER -p DATABASE < FILENAME.sql
I also recommend checking that the terminal locales on both systems are compatible, using the 'locale' (Linux) or 'env' (other) commands. If in doubt, add "LANG=C LC_ALL=C" before each command to force a common locale on both systems.
That's not always secure for MediaWiki. See http://www.mediawiki.org/wiki/Manual:Backup#Character_set
mediawiki-l@lists.wikimedia.org