Re: [Wikitech-l] Multilingual interface

8 May 2004

Nikola Smolenski wrote:
...
 Since that message, I've noticed some bugs and had no time to fix them, but
 now I think I can present a beta version. Note I worked against 1.2.0rc3 and 
 not against the CVS; this because if I did something in the wrong way, and a 
 rewrite is needed, I didn't want to adapt to CVS changes at the same time. I 
 hope this will be no problem. I have to say that I have a SF account and know 
 how to work with the CVS, so I can submit the code changes directly to it, if 
 needed. Differences follow:

In general I would highly recommend *against* doing significant 
development on the stable branch. Forward porting things is sometimes 
more work than expected -- particularly when there are major changes -- 
or just doesn't get done and your neat new thing gets lost; if you put 
it in the development version, you just have to wait and it'll become 
the stable version. ;)

...
 I think that this is quite self-explanatory. If $wgLanguageCode is a string,
 then there is old-fashioned monolingual interface. If it is an array, then a 
 multilingual interface.

 LocalSettings.php:
 $wgLanguageCode = array("en","de","sr");

Neat! But there are some problems that need to be overcome.

First, it won't interact at all well with $wgUseDatabaseMessages, since 
we have no way to distinguish which language a defined message is 
intended to reach. Now, it's possible to just make the two modes 
mutually exclusive, so you get the hardcoded messages whenever a 
non-default language is selected. That's probably ok.

Second, a number of language-specific options will affect how things are 
parsed and stored. Namespace interpretation will be different; I notice 
this is partially taken care of in your patch by run-time patching to 
hardcode the English names, but some languages define additional aliased 
names which would break under another class, and it would be preferable 
to always use the content language's names rather than English.

Some languages are still using Latin-1 charset, and you really can't mix 
Latin-1 with UTF-8. Besides the encoding of the messages themselves, the 
fulltext search index is treated very differently between Latin-1 and 
UTF-8, and some languages such as Chinese and Japanese do some different 
work to insert simulated word spaces. At some point (hopefully for 1.4) 
we'll want to add language-specific sorting for displays of 
titles/usernames etc, which will require generating and storing indexes 
which depend on the content language.

As for the Latin-1/UTF-8 thing, I'd like to just transition everything to 
UTF-8. Hopefully we'll do that soon, as we've been moving tentatively in 
that direction, and it'll end up a non-problem.

Most of the other stuff could be dealt with by defining a 'master 
language' which controls the content encoding, namespace definitions, 
logpage names, logpage content, material used for {{transposed}} and 
substituted messages in content, etc, and a 'display language' which can 
be selected by the user which will determine the language used for user 
interface messages. This probably requires more work on the language 
classes and messages to separate out links.

Also I'm not keen on changing the value and type of $wgLanguageCode. It 
would be best I think to keep things predictable and separate the array 
of selectable languages from the master/content language.
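
As a rough sketch of that separation (the name $wgDisplayLanguages is 
made up here for illustration and is not an existing setting), 
LocalSettings.php might look something like this:

```php
<?php
# LocalSettings.php (sketch only)

# Master/content language: remains a plain string, as it is today.
$wgLanguageCode = "en";

# Hypothetical setting: interface languages a user may choose from.
# Kept apart from $wgLanguageCode so its value and type never change.
$wgDisplayLanguages = array( "en", "de", "sr" );
```

Code that expects $wgLanguageCode to be a string would then keep 
working unchanged, and only the interface message lookup would need to 
consult the second list.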

-- brion vibber (brion @ pobox.com)
