Hi,
I am upgrading my non-profit's website, which currently runs on MediaWiki v1.16.5, to 1.25.3.
The current site is http://www.hindupedia.com
The new site will be http://www.hindupedia.org
I installed the new version of MediaWiki (tarball unpacked manually into the old MediaWiki directory) and ran update.php. After commenting out the incompatible skins/extensions, I got the site up and running.
However, articles with multi-byte titles didn't transfer over or were corrupted.
For example, the current site has a list of articles, many of which have multi-byte characters in their titles (e.g., diacritical marks): http://www.hindupedia.com/en/Category:Concise_Encyclopedia_of_Hinduism
The same page in the new site shows the article names are corrupted: http://www.hindupedia.org/en/Category:Concise_Encyclopedia_of_Hinduism
An example is: Abhiniveśa http://www.hindupedia.com/en/Abhinive%C5%9Ba became AbhiniveÅ›a http://www.hindupedia.org/en/Abhinive%C3%85%E2%80%BAa
Multi-byte characters within articles themselves seem to be ok.
Does anyone know what could be causing this and/or how to fix it?
Best Regards,
Krishna
--------------------------------------------------------------------------------
Krishna Maheshwari
Hindupedia, the Hindu Encyclopedia (www.hindupedia.com)
--------------------------------------------------------------------------------
I think something went wrong when you exported and imported the database. (I assume that's how you got the two sites up at the same time?) MediaWiki always stores all text in the UTF-8 encoding, and in the database, the encoding of the fields is marked as either "binary" or "utf8" (the former is recommended due to problems with older MySQL versions).
My guess is that in the `page` table, the `page_title` field was marked as being in some different encoding. This does not normally cause problems (MediaWiki doesn't use any of MySQL's encoding conversion functions), but when you exported it, MySQL interpreted the UTF-8 bytes as bytes in that other encoding and converted them to UTF-8 a second time. By the time you imported, the data was already damaged.
If that's indeed the case (I'm just guessing), change the encoding of the field in the original table (do not convert or re-encode the data; it's correct, just marked wrong), then re-export and re-import that table.
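To illustrate the kind of damage I'm guessing at, here is a small Python sketch. It is not MediaWiki or MySQL code, just a simulation. It assumes the wrong encoding was cp1252 (which, as far as I know, is what MySQL calls "latin1"), and the function name is made up for the example. It takes the correct title, reads its UTF-8 bytes as cp1252, and URL-encodes the result the way a browser would.

```python
from urllib.parse import quote


def misread_as_cp1252(title: str) -> str:
    """Simulate the suspected export damage: the title's UTF-8 bytes
    are interpreted as cp1252 characters (assumed wrong encoding)."""
    return title.encode("utf-8").decode("cp1252")


original = "Abhinive\u015ba"  # the correct title, with U+015B (s with acute)
damaged = misread_as_cp1252(original)

# The damaged title, as it would appear on the new site
print(damaged)

# URL-encoding the damaged title (quote() uses UTF-8 by default)
print(quote(damaged))  # Abhinive%C3%85%E2%80%BAa

# Reversing the misreading recovers the original, which suggests the
# bytes were re-encoded once rather than lost
print(damaged.encode("cp1252").decode("utf-8") == original)  # True
```

The URL-encoded output matches the broken URL on the new site, which is consistent with the double-encoding guess. I'd still treat the round trip only as a diagnostic and fix the field's encoding on the original table rather than patching titles one by one.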
mediawiki-l@lists.wikimedia.org