Re: [Wikitech-l] Multilingual interface

11 May 2004

On Saturday 08 May 2004 12:42, Brion Vibber wrote:
...
  Nikola Smolenski wrote:
 Since that message, I've noticed some bugs and had no time to fix them,
 but now I think I can present a beta version. Note I worked against
 1.2.0rc3 and not against the CVS; this because if I did something in the
 wrong way, and a rewrite is needed, I didn't want to adapt to CVS changes
 at the same time. I hope this will be no problem. I have to say that I
 have a SF account and know how to work with the CVS, so I can submit the
 code changes directly to it, if needed. Differences follow: 
 In general I would highly recommend *against* doing significant
 development on the stable branch. Forward porting things is sometimes
 more work than expected -- particularly when there are major changes --
 or just doesn't get done and your neat new thing gets lost; if you put
 it in the development version, you just have to wait and it'll become
 the stable version. ;) 
OK, I'll make future patches against the unstable branch.

...
 I think that this is quite self-explanatory. If $wgLanguageCode is a
 string, then there is the old-fashioned monolingual interface. If it is an
 array, then a multilingual interface.

 LocalSettings.php:
 $wgLanguageCode = array("en","de","sr"); 
 Neat! But there are some problems that need to be overcome.

 First, it won't interact at all well with $wgUseDatabaseMessages, since
 we have no way to distinguish which language a defined message is
 intended to reach. Now, it's possible to just make the two modes
 mutually exclusive, so you get the hardcoded messages if in non-default
 language. That's probably ok. 
No, it won't. I have disabled it, and it's easy to make sure that if one is 
set, the other is forcibly disabled:

if(is_array($wgLanguageCode)) {
	$wgLanguageArray=$wgLanguageCode;
	$wgUseDatabaseMessages=FALSE;
}

$wgUseDatabaseMessages is not used prior to this point.

...
 Second, a number of language-specific options will affect how things are
 parsed and stored. Namespace interpretation will be different; I notice
 this is partially taken care of in your patch by run-time patching to
 hardcode the English names, but some languages define additional aliased
 names which would break under another class, and it would be preferable
 to always use the content language's names rather than English. 
A final solution for this would require that, eventually, a MediaWiki 
installation in ''any'' language recognises codes of ''every'' language. 
Eventually, this could be done, but until it is done I think that 
multilingual installations could use English messages only. As I understand, 
this is needed for Wikisource, Wikibooks and Wiktionary and staying with 
English is at least as good as what is already there :)

if(is_array($wgLanguageArray)) eval("\$wgNamespaceNames".ucfirst( $wgLanguageCode )."=\$wgNamespaceNamesEn;");

should be changed to:

if(is_array($wgLanguageArray)) {
	eval("\$wgNamespaceNames".ucfirst( $wgLanguageCode )."=\$wgNamespaceNamesEn;");
	eval("\$wgMagicWords".ucfirst( $wgLanguageCode )."=\$wgMagicWordsEn;");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['linktrail']=\$wgAllMessagesEn['linktrail'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['uploadlog']=\$wgAllMessagesEn['uploadlog'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['uploadlogpage']=\$wgAllMessagesEn['uploadlogpage'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['deletionlog']=\$wgAllMessagesEn['deletionlog'];");
}
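
The same aliasing can probably be done without eval(), by indexing an array with the computed name. The following is only a sketch: aliasEnglishTables() is a hypothetical helper that is not part of the patch, and the table contents are placeholders, not the real MediaWiki data.

```php
<?php
// Hypothetical helper, not part of the patch: copies the English tables to
// the names used for language $code, without building PHP source for eval().
function aliasEnglishTables(array $tables, $code) {
    $suffix = ucfirst($code);
    $tables["wgNamespaceNames" . $suffix] = $tables["wgNamespaceNamesEn"];
    $tables["wgMagicWords" . $suffix] = $tables["wgMagicWordsEn"];
    // Only the messages that affect parsing and storage are overridden;
    // all other messages of the target language are left alone.
    foreach (array("linktrail", "uploadlog", "uploadlogpage", "deletionlog") as $key) {
        $tables["wgAllMessages" . $suffix][$key] = $tables["wgAllMessagesEn"][$key];
    }
    return $tables;
}

// Placeholder data for illustration only.
$tables = array(
    "wgNamespaceNamesEn" => array("placeholder-namespaces-en"),
    "wgMagicWordsEn"     => array("placeholder-magic-en"),
    "wgAllMessagesEn"    => array(
        "linktrail"     => "placeholder-linktrail-en",
        "uploadlog"     => "placeholder-uploadlog-en",
        "uploadlogpage" => "placeholder-uploadlogpage-en",
        "deletionlog"   => "placeholder-deletionlog-en",
    ),
    "wgAllMessagesDe"    => array(
        "linktrail" => "placeholder-linktrail-de",
        "mainpage"  => "placeholder-mainpage-de",
    ),
);
$tables = aliasEnglishTables($tables, "de");
```

With real globals, the same idea would be to assign to elements of $GLOBALS under the computed name instead of to a local array.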

That linktrail bit might be a bit of a problem. But even this way, again, it 
is at least as good as what is there already. Eventually, linktrail should be 
specific to a code page, not a language. Whether it would be reasonable to do 
so for UTF-8 I am not sure.

I think that these are all the options which affect how things are
parsed and stored. If I have missed something, tell me. I was thinking about 
also forcing the following and other similar options to English:

"mainpage"              => "Main Page",
"aboutpage"             => "$wgMetaNamespace:About",
"helppage"              => "$wgMetaNamespace:Help",

But as a multilingual site would need to have them in all its languages, I 
proclaim this a feature, not a bug! :)

...
 Some languages are still using Latin-1 charset, and you really can't mix
 Latin-1 with UTF-8. Besides the encoding of the messages themselves, the
 fulltext search index is treated very differently between latin-1 and
 UTF-8, and some languages such as Chinese and Japanese do some different
 work to insert simulated word spaces. At some point (hopefully for 1.4)
 we'll want to add language-specific sorting for displays of
 titles/usernames etc, which will require generating and storing indexes
 which depend on the content language. 
Yes. One can use all languages in the same encoding (all Latin-1, all Latin-2, 
all UTF-8...) but cannot mix encodings. It is trivial to convert any language 
to UTF-8, except for the linktrail which is not used anyway. Wikisource, 
Wikibooks and Wiktionary are in UTF-8 already, so I don't think it will be a 
problem for them. I don't think that language-specific sorting will be a 
problem when introduced; a user will simply see text sorted in his language. 
But good luck to whoever is going to implement it in UTF-8 for all languages! 
As for Chinese and Japanese, were you referring to stripForSearch? I don't 
think that it is a problem: Chinese and Japanese users will be able to search 
properly, other users will not, but they cannot now anyway. But take a look 
at this:

        # Italic is not appropriate for Japanese script
        # Unfortunately most browsers do not recognise this, and render <em>
        # as italic
        function emphasize( $text )
        {
                return $text;
        }

It could cause some problems. Eventually, things like this should be specific 
to the language of a page or even a section of a page. Currently, if Japanese 
Wikipedia has some emphasized text in English, or if English Wikipedia has 
some emphasized text in Japanese, things don't work properly. Perhaps the 
emphasize function in Language.php should be rewritten to not emphasize 
Japanese and Chinese characters. I don't even want to imagine how that could 
be done.
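
One rough way to approach this is to look at the script of the text itself rather than at the site language. The following is only a sketch under several assumptions: the text is UTF-8, the default behaviour is to wrap text in <em>, and the function name emphasizeUnlessCjk() is hypothetical. It tests the whole string at once and does not try to split mixed-script text.

```php
<?php
// Hypothetical sketch: skip <em> when the text contains Han, Hiragana or
// Katakana characters. Assumes $text is valid UTF-8 (required by the /u flag).
function emphasizeUnlessCjk($text) {
    if (preg_match('/[\p{Han}\p{Hiragana}\p{Katakana}]/u', $text) === 1) {
        // Italic is not appropriate for these scripts, so return text as-is.
        return $text;
    }
    // Assumed default behaviour for other scripts.
    return "<em>" . $text . "</em>";
}
```

A string mixing Latin and Japanese characters is left unemphasized as a whole here; handling each run of characters separately would need more work.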

...
 Most of the other stuff could be dealt with by defining a 'master
 language' which controls the content encoding, namespace definitions,
 logpage names, logpage content, material used for {{transposed}} and
 substituted messages in content, etc, and a 'display language' which can
 be selected by the user which will determine the language used for user
 interface messages. This probably requires more work on the language
 classes and messages to separate out links. 
Yes. Well, currently, English acts as the master language. In future, it 
should be possible to have some other language as a master language.

...
 Also I'm not keen on changing the value and type of $wgLanguageCode. It
 would be best I think to keep things predictable and separate the array
 of selectable languages from the master/content language. 
Not a problem. I agree, in light of being able to use another master language 
in future (that language would then be defined in $wgLanguageCode). While 
$wgLanguageCodes might be the most intuitive name, it could be easily 
overlooked, so how about $wgLanguageArray?
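
With the two settings kept separate, the configuration and the choice of display language might look roughly like this. It is a sketch only: $wgLanguageArray is just the name proposed above, and pickDisplayLanguage() is a hypothetical helper, not existing MediaWiki code.

```php
<?php
// LocalSettings.php (sketch)
$wgLanguageCode  = "en";                     // master/content language, stays a string
$wgLanguageArray = array("en", "de", "sr");  // languages selectable for the interface

// Hypothetical helper: use the requested language if the site offers it,
// otherwise fall back to the master language.
function pickDisplayLanguage($requested, array $selectable, $master) {
    return in_array($requested, $selectable, true) ? $requested : $master;
}

$display = pickDisplayLanguage("de", $wgLanguageArray, $wgLanguageCode);
```

This keeps $wgLanguageCode predictable for existing code, while only the user interface reads $wgLanguageArray.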

If all of this is fine, I'll download the code from CVS and send the 
patches.
