[Mediawiki-l] Database import / export breaks page titles with non-ascii characters, but not page contents

Sulka Haro sulka at sulka.net
Mon Nov 22 22:29:17 UTC 2010


I'm trying to move a MediaWiki 1.16 instance from one hosting provider to
another, and the move is breaking all page titles that contain non-US-ASCII
characters, in what looks like a UTF-8 decoding problem. Strangely, the
page contents are fine after the import, and I can create new pages with
Scandinavian characters on the new provider without any issues, so I'm
slightly baffled as to where the problem is.

I'm doing the move with mysqldump and a plain SQL import, because
MediaWiki's importDump.php barfs on the new server when trying to import an
XML dump ("XML-tuonti epäonnistui jäsennysvirheen takia. rivillä 1,
sarakkeessa 1 (tavu 3; "<mediawiki"): Empty document", which is Finnish for
roughly "XML import failed due to a parse error. At line 1, column 1
(byte 3; "<mediawiki"): Empty document" - gotta love localized error
messages).
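
Since the parser complains at the very start of the file, one thing I can
check is what the first bytes of the XML dump actually are. Below is a small
Python sketch for that. It is only a diagnostic under the assumption that
something (a byte order mark, stray whitespace, or a wrong encoding) sits in
front of the "<mediawiki" tag; I have not confirmed that this is the cause.
The function names are made up for illustration.

```python
import codecs


def describe_dump_head(head: bytes) -> str:
    """Classify the first bytes of an XML dump (hypothetical helper)."""
    if not head:
        return "empty file"
    if head.startswith(codecs.BOM_UTF8):
        return "UTF-8 byte order mark before the XML"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16 byte order mark; file is not UTF-8"
    if head[:1].isspace():
        return "leading whitespace before the XML"
    if head.startswith((b"<?xml", b"<mediawiki")):
        return "starts with XML as expected"
    return "unexpected leading bytes: %r" % head[:16]


def describe_dump_file(path: str) -> str:
    """Read the first 64 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return describe_dump_head(f.read(64))
```

Calling describe_dump_file() with the path to the dump prints nothing by
itself; it returns a short description that at least narrows down whether
the file on the new server looks like the one that was exported.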

I tried reading the SQL dump, and it looks to me like MediaWiki encodes the
database contents into some "safe" format, so the problem should not be
with MySQL?
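
To show the kind of breakage I mean, here is a minimal Python sketch of one
possible mechanism. It assumes that the title column is treated as Latin-1
text while it really holds UTF-8 bytes, so it gets converted a second time
during the dump or import, whereas the page text is stored as binary data
and passes through untouched. That is a guess on my part, not something I
have verified against the schema, and the function names are hypothetical.
Python's "latin-1" codec is used as a stand-in for MySQL's latin1, which is
not byte-for-byte identical.

```python
def double_encode(title: str) -> str:
    """Simulate UTF-8 bytes being misread as Latin-1 text.

    The returned string is what a browser shows once the misread
    characters have been written out as UTF-8 again.
    """
    raw = title.encode("utf-8")      # bytes actually stored in the column
    return raw.decode("latin-1")     # each byte misread as one character


def repair(mangled: str) -> str:
    """Undo double_encode, assuming no bytes were lost along the way."""
    return mangled.encode("latin-1").decode("utf-8")


def copy_binary(content: bytes) -> bytes:
    """A binary column is copied byte for byte, with no conversion."""
    return bytes(content)


title = "Jäsennys"
mangled = double_encode(title)           # "JÃ¤sennys"
text = "Jäsennys".encode("utf-8")
assert copy_binary(text) == text         # page text survives unchanged
assert repair(mangled) == title          # the damage is reversible here
```

If the broken titles on the new server look like "JÃ¤sennys" instead of
"Jäsennys", that would match this pattern; if they look different, this
sketch does not apply.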

I couldn't find anything relevant in the documentation I read through. The
encoding issues I did find are about wiki-wide encoding problems, not
problems limited to page titles.

Any pointers on where to look?

Other system details: Apache 2, Ubuntu, PHP 5.2.3.

Thanks,

sulka

