Hi Jeremy, Thanks. A list of pages that need fixing is not a problem: it's pretty much a one-man wiki at the moment, so most of the content will need to be converted.
To add a bit of confusion to the issue, however, I've noticed that the system messages are also encoded as ISO-8859-1 and thus display badly in UTF-8. They haven't even been customized through the wiki, and I've tried clearing the l10n_cache table. I'm not sure where it's getting the non-UTF-8 versions from. Any ideas how I can go about fixing that? When I switch the page encoding to ISO-8859-1, the text displays correctly...
Thanks Andru
On 12/11/2013, at 13:00, mediawiki-l-request@lists.wikimedia.org wrote:
From: Andru Vallance andru@tinymighty.com
Subject: [MediaWiki-l] Character set problem
Date: 11 November 2013 17:17:07 GMT+01:00
To: "mediawiki-l@lists.wikimedia.org" mediawiki-l@lists.wikimedia.org
Reply-To: MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org
I'm setting up a new wiki installation and running into some problems with garbage characters showing up due to mismatched character sets. The wiki in question is here: http://wikiausland.de/bookshop/Hauptseite
New articles written in the wiki are fine and display in UTF-8 as expected, but the owner has copied over some content, presumably from an old wiki or MS Word, and it seems to be in ISO-8859-1 and thus shows a heap of question marks for all the umlauts etc. Does anyone know how I can convert a page from ISO-8859-1 to UTF-8 easily enough?
I've tried setting $wgLegacyEncoding to 'ISO-8859-1' [1] in the hope it might do the conversion for me on article save, but no joy. Are there any other options?
Any tips would be greatly appreciated!
Andru
[1] https://www.mediawiki.org/wiki/Manual:$wgLegacyEncoding
From: Jeremy Baron jeremy@tuxmachine.com
Subject: Re: [MediaWiki-l] Character set problem
Date: 11 November 2013 17:38:33 GMT+01:00
To: MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org
Reply-To: MediaWiki announcements and site admin list mediawiki-l@lists.wikimedia.org
On Mon, Nov 11, 2013 at 4:17 PM, Andru Vallance andru@tinymighty.com wrote:
> I'm setting up a new wiki installation and running into some problems with garbage characters showing up due to mismatched character sets. The wiki in question is here: http://wikiausland.de/bookshop/Hauptseite
> New articles written in the wiki are fine and display in UTF-8 as expected, but the owner has copied over some content, presumably from an old wiki or MS Word, and it seems to be in ISO-8859-1 and thus shows a heap of question marks for all the umlauts etc. Does anyone know how I can convert a page from ISO-8859-1 to UTF-8 easily enough?
> I've tried setting $wgLegacyEncoding to 'ISO-8859-1' [1] in the hope it might do the conversion for me on article save, but no joy. Are there any other options?
I guess he copied the content into a wiki that was already UTF-8, so the row was marked as UTF-8 when it was saved.
$wgLegacyEncoding should do nothing if the row is already marked as UTF-8. You could fix this with a bot or possibly by changing the flag in the DB (I don't know how safe that is...).
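For reference, the byte-level conversion such a bot would have to perform can be sketched in Python. This is only a minimal sketch, assuming the bot can get at the raw stored bytes of a page and that anything which is not valid UTF-8 is ISO-8859-1. The function name is hypothetical, and fetching and saving the pages is left out entirely.

```python
def latin1_bytes_to_utf8(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8.

    Assumption: input that is not valid UTF-8 is ISO-8859-1.
    Bytes that already decode as UTF-8 are returned unchanged,
    so running the function twice does no further damage.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave it alone
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("iso-8859-1").encode("utf-8")


# "München" with the u-umlaut stored as the single ISO-8859-1 byte 0xFC
legacy = b"M\xfcnchen"
fixed = latin1_bytes_to_utf8(legacy)
print(fixed.decode("utf-8"))
```

If the content came from MS Word it may really be Windows-1252 rather than ISO-8859-1, in which case the fallback codec would need to be "cp1252" instead.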
But the very first thing you need is a list of pages that need fixing. Maybe that's as simple as listing that particular user's contribs.
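If the contribs list turns out not to be enough, one rough way to build the list is to check which pages contain bytes that are not valid UTF-8. The sketch below assumes the raw page bytes have already been collected into a dict keyed by title; the helper name and the sample titles are hypothetical, and how the bytes are exported from the wiki is not covered.

```python
def pages_needing_fix(pages: dict[str, bytes]) -> list[str]:
    """Return the titles whose raw bytes are not valid UTF-8, sorted."""
    broken = []
    for title, raw in pages.items():
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            # Invalid as UTF-8, so probably pasted in a legacy encoding.
            broken.append(title)
    return sorted(broken)


# Hypothetical sample data: one legacy page, one UTF-8 page, one ASCII page
sample = {
    "Old page": b"B\xfccher",
    "New page": "Bücher".encode("utf-8"),
    "Plain page": b"Books",
}
print(pages_needing_fix(sample))
```

Pure ASCII pages pass the check, which is fine because ASCII is byte-identical in both encodings and needs no conversion.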
-Jeremy