Hello.
I have 2 questions about UTF-8 encoding in the wiki.
1. How is UTF-8 text encoded and decoded to and from other encodings? Where in the sources can I find this? And what additional libraries or software (besides PHP, Apache, etc.) do I need in order to encode/decode UTF-8?
2. How can I define (in the code) what encoding my wiki uses? I mean, what variable contains information about the encoding?
Thank you.
Best regards, Alexander
Alexander Prudnikov wrote:
Hello.
I have 2 questions about UTF-8 encoding in the wiki.
- How is UTF-8 text encoded and decoded to and from other encodings? Where in the sources can I find this? And what additional libraries or software (besides PHP, Apache, etc.) do I need in order to encode/decode UTF-8?
- How can I define (in the code) what encoding my wiki uses? I mean, what variable contains information about the encoding?
Hello,
1/ It doesn't make sense to decode UTF-8 encoded text into ASCII, for example: UTF-8 offers many more characters that you would not be able to translate correctly. The same goes for ISO-8859-1 (which doesn't have the &oelig; character).
I don't think you need any specific library for PHP / Apache. The only thing needed is to output an HTTP header saying which encoding is used, so the browser decodes the text correctly.
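For example, a minimal hand-rolled sketch of that header in plain PHP (MediaWiki emits an equivalent header itself based on its output-encoding setting; the charset value here is only illustrative):

    <?php
    // Tell the browser which character encoding the response body uses,
    // so that non-ASCII characters such as the oe ligature are interpreted correctly.
    header( 'Content-Type: text/html; charset=UTF-8' );
    echo "<p>\xC5\x93 should now render as the oe ligature.</p>";
    ?>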
2/ The default MediaWiki encoding is set to ISO-8859-1 through the $wgInputEncoding and $wgOutputEncoding variables in ./includes/DefaultSettings.php.
When you configure the language to be used in LocalSettings.php (for example: $wgLanguageCode = "fr";), the software includes a language-specific script from ./languages. The French one in turn loads LanguageUtf8.php, which sets the encoding options.
So basically: set $wgInputEncoding and $wgOutputEncoding in your language file and it should work :o) I recommend using UTF-8.
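As an illustration only (the variable names are the real MediaWiki globals described above; the language code and file locations are just an example), the relevant lines would look something like this:

    # LocalSettings.php
    $wgLanguageCode = "fr";            # pulls in the language script from ./languages

    # in the language file (or LocalSettings.php, if you want to override it)
    $wgInputEncoding  = "UTF-8";
    $wgOutputEncoding = "UTF-8";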
Alexander Prudnikov wrote:
I have 2 questions about UTF-8 encoding in the wiki.
- How is UTF-8 text encoded and decoded to and from other encodings? Where in the sources can I find this? And what additional libraries or software (besides PHP, Apache, etc.) do I need in order to encode/decode UTF-8?
Your PHP must have the XML module installed (it is installed by default), which provides the utf8_encode and utf8_decode functions. If you have iconv support compiled in, that will be used instead; it may be necessary for URL-encoding compatibility conversion for non-Western languages.
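For what it's worth, a rough sketch of those conversions in plain PHP (utf8_encode/utf8_decode only handle the Latin-1 <-> UTF-8 case; iconv can convert between arbitrary encodings when it is compiled in):

    <?php
    $latin1_text = "caf\xE9";                  // "café" in ISO-8859-1

    // XML module helpers: Latin-1 <-> UTF-8 only
    $utf8   = utf8_encode( $latin1_text );     // ISO-8859-1 -> UTF-8
    $latin1 = utf8_decode( $utf8 );            // UTF-8 -> ISO-8859-1 (lossy for non-Latin-1 chars)

    // iconv: arbitrary source/target encodings
    $utf8_again = iconv( 'ISO-8859-1', 'UTF-8', $latin1_text );
    ?>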
- How can I define (in the code) what encoding my wiki uses? I mean, what variable contains information about the encoding?
As of 1.3 the default encoding for all languages is UTF-8.
Latin-1 compatibility mode is enabled by setting $wgUseLatin1 = true; in LocalSettings.php; this will downconvert the UTF-8 text in the language file to Latin-1 as needed and mark the pages with the encoding marker for ISO-8859-1 instead of UTF-8.
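A minimal sketch of that setting in LocalSettings.php (the flag is the one named above; nothing else needs to change):

    # LocalSettings.php, MediaWiki 1.3
    $wgUseLatin1 = true;   # down-convert language-file text and label pages as ISO-8859-1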
-- brion vibber (brion @ pobox.com)
Alexander Prudnikov wrote:
- How is UTF-8 text encoded and decoded to and from other encodings? Where in the sources can I find this? And what additional libraries or software (besides PHP, Apache, etc.) do I need in order to encode/decode UTF-8?
I am not aware that MediaWiki currently does any encoding or decoding at all. All our wikis use either ISO-8859-1 or UTF-8, and each of them uses that one encoding for input, database storage, output and everything else. The only conversion we ever need to do is when switching a wiki to a different encoding, but that is a one-time thing per wiki, and if you use UTF-8 from the start, you never need to bother with it.
- How can I define (in the code) what encoding my wiki uses? I mean, what variable contains information about the encoding?
$wgInputEncoding and $wgOutputEncoding. I don't know what happens if you specify different encodings for the two; just use UTF-8 for both of them and you'll be fine.
Timwi
wikitech-l@lists.wikimedia.org