Hi Jeremy,
Thanks. Getting a list of pages that need fixing is not a problem: it's pretty much a
one-man wiki at the moment, so most of the content will need to be converted.
To add a bit of confusion to the issue, however, I've noticed that the system messages
are also encoded as ISO-8859-1 and thus display badly in UTF-8. They haven't even
been customized through the wiki, and I've tried clearing the l10n_cache table.
I'm not sure where it's getting non-UTF-8 versions from. Any ideas on how I can go
about fixing that? When I switch the page encoding to ISO-8859-1 the text displays
correctly...
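For what it's worth, the symptom can be reproduced in a few lines of Python. This is
only a sketch to illustrate the diagnosis: the sample string and the helper name
is_valid_utf8 are made up, and nothing here touches the wiki itself.

```python
# Made-up sample text standing in for a system message (assumption).
sample = "Übersicht für Bücher"

# Simulate what the server appears to send: ISO-8859-1 bytes.
raw = sample.encode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Read as UTF-8, each lone umlaut byte is invalid and becomes U+FFFD,
# which browsers typically draw as a question mark.
as_utf8 = raw.decode("utf-8", errors="replace")

# Read as ISO-8859-1, the same bytes give back the original text.
as_latin1 = raw.decode("iso-8859-1")

print(is_valid_utf8(raw))       # False
print("\ufffd" in as_utf8)      # True
print(as_latin1 == sample)      # True
```

If the text only reads correctly once the browser is forced to ISO-8859-1, the bytes
being served are most likely ISO-8859-1, whatever the page header claims.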
Thanks
Andru
On 12/11/2013, at 13:00, mediawiki-l-request(a)lists.wikimedia.org wrote:
From: Andru Vallance <andru(a)tinymighty.com>
Subject: [MediaWiki-l] Character set problem
Date: 11 November 2013 17:17:07 GMT+01:00
To: "mediawiki-l(a)lists.wikimedia.org" <mediawiki-l(a)lists.wikimedia.org>
Reply-To: MediaWiki announcements and site admin list
<mediawiki-l(a)lists.wikimedia.org>
I'm setting up a new wiki installation and running into some problems with garbage
characters showing up due to mismatched character sets. The wiki in question is here:
http://wikiausland.de/bookshop/Hauptseite
New articles written in the wiki are fine and display in UTF-8 as expected, but the
owner has copied over some content, presumably from an old wiki or MS Word, and it
seems to be in ISO-8859-1, so all the umlauts etc. show up as a heap of question marks.
Does anyone know how I can go about converting a page from ISO-8859-1 to UTF-8 easily
enough?
I've tried setting $wgLegacyEncoding to 'ISO-8859-1' [1] in the hope it might
do the conversion for me on article save, but no joy. Are there any other options?
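The conversion itself can be sketched in Python as follows. This is a minimal
illustration, not a MediaWiki tool: the function name latin1_to_utf8 is hypothetical,
and it assumes you already have the raw bytes of the page text in hand (for example
from a dump).

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8, leaving valid UTF-8 untouched."""
    try:
        # Already valid UTF-8 (plain ASCII included): return it unchanged
        # so that running the conversion twice does no harm.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Every byte value is defined in ISO-8859-1, so this decode
        # cannot fail; the result is then written back out as UTF-8.
        return raw.decode("iso-8859-1").encode("utf-8")


legacy = "Bücher".encode("iso-8859-1")
fixed = latin1_to_utf8(legacy)

print(fixed == "Bücher".encode("utf-8"))   # True
print(latin1_to_utf8(fixed) == fixed)      # True
```

Two caveats: a short ISO-8859-1 byte sequence can occasionally happen to be valid
UTF-8, in which case this check leaves it alone, and text pasted from MS Word may
really be Windows-1252, where "cp1252" would be the codec to try instead.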
Any tips would be greatly appreciated!
Andru
[1] https://www.mediawiki.org/wiki/Manual:$wgLegacyEncoding
From: Jeremy Baron <jeremy(a)tuxmachine.com>
Subject: Re: [MediaWiki-l] Character set problem
Date: 11 November 2013 17:38:33 GMT+01:00
To: MediaWiki announcements and site admin list <mediawiki-l(a)lists.wikimedia.org>
Reply-To: MediaWiki announcements and site admin list
<mediawiki-l(a)lists.wikimedia.org>
I guess he copied the content over into a wiki that was already UTF-8, and so the
row was marked as being UTF-8 already when saved.
$wgLegacyEncoding should do nothing if the row is already marked as UTF-8. You
could fix this with a bot or possibly by changing the flag in the DB
(I don't know how safe that is...).
But the very first thing you need is a list of pages that need fixing.
Maybe that's just as simple as listing that particular user's
contribs.
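As a rough sketch of the "list that user's contribs" step, the MediaWiki API's
list=usercontribs query returns the pages a user has edited. The helper names, the
api.php URL and the user name below are placeholders, and the sample response is
hand-written to show the shape of the JSON rather than real output from any wiki.

```python
from typing import List
from urllib.parse import urlencode


def contribs_url(api_url: str, user: str, limit: int = 500) -> str:
    """Build a list=usercontribs query URL for the given user."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,
        "uclimit": limit,
        "ucprop": "title",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def titles_from_response(data: dict) -> List[str]:
    """Collect distinct page titles from a parsed usercontribs response."""
    titles: List[str] = []
    for item in data.get("query", {}).get("usercontribs", []):
        if item["title"] not in titles:
            titles.append(item["title"])
    return titles


# Placeholder endpoint and user name (assumptions, not the real wiki).
url = contribs_url("https://example.org/w/api.php", "Example User")

# Hand-written sample of the response shape, not real data.
sample_response = {
    "query": {
        "usercontribs": [
            {"title": "Hauptseite"},
            {"title": "Beispielseite"},
            {"title": "Hauptseite"},
        ]
    }
}
pages = titles_from_response(sample_response)
print(len(pages))   # 2
```

A real run would fetch the URL, parse the JSON, and follow the API's continuation
values if the user has more edits than one request returns; that part is left out here.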
-Jeremy