On Saturday 08 May 2004 12:42, Brion Vibber wrote:
Nikola Smolenski wrote:
Since that message, I've noticed some bugs and had no time to fix them, but now I think I can present a beta version. Note that I worked against 1.2.0rc3 and not against CVS; this is because, if I had done something the wrong way and a rewrite were needed, I didn't want to adapt to CVS changes at the same time. I hope this will be no problem. I should say that I have an SF account and know how to work with CVS, so I can commit the code changes directly, if needed. Differences follow:
In general I would highly recommend *against* doing significant development on the stable branch. Forward porting things is sometimes more work than expected -- particularly when there are major changes -- or just doesn't get done and your neat new thing gets lost; if you put it in the development version, you just have to wait and it'll become the stable version. ;)
OK, I'll make future patches against the unstable branch.
I think that this is quite self-explanatory. If $wgLanguageCode is a string, you get the old-fashioned monolingual interface. If it is an array, you get a multilingual interface.
LocalSettings.php: $wgLanguageCode = array("en","de","sr");
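As a minimal sketch of how the two forms of the setting could be told apart: the helper name normalizeLanguageSetting() and the idea that the first array entry is the default are assumptions for illustration, not something taken from the patch.

```php
<?php
// Hypothetical helper (not part of the patch): accepts either the string or
// the array form of $wgLanguageCode and returns a uniform description.
// Assumes the array form is non-empty.
function normalizeLanguageSetting( $setting ) {
	if ( is_array( $setting ) ) {
		$languages = array_values( $setting );
		return array(
			'multilingual' => true,
			'languages' => $languages,
			// Assumption: the first listed language is the default.
			'default' => $languages[0],
		);
	}
	// String form: old-fashioned monolingual interface.
	return array(
		'multilingual' => false,
		'languages' => array( $setting ),
		'default' => $setting,
	);
}

$mono = normalizeLanguageSetting( "en" );
$multi = normalizeLanguageSetting( array( "en", "de", "sr" ) );
```

Here $mono describes a single-language site and $multi a site offering three interface languages.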
Neat! But there are some problems that need to be overcome.
First, it won't interact at all well with $wgUseDatabaseMessages, since we have no way to distinguish which language a defined message is intended to reach. Now, it's possible to just make the two modes mutually exclusive, so you get the hardcoded messages if in non-default language. That's probably ok.
No it won't. I have disabled it, and it's easy to make sure that if one is set, the other one is forcefully disabled:
if(is_array($wgLanguageCode)) {
	$wgLanguageArray=$wgLanguageCode;
	$wgUseDatabaseMessages=FALSE;
$wgUseDatabaseMessages is not used prior to this point.
Second, a number of language-specific options will affect how things are parsed and stored. Namespace interpretation will be different; I notice this is partially taken care of in your patch by run-time patching to hardcode the English names, but some languages define additional aliased names which would break under another class, and it would be preferable to always use the content language's names rather than English.
A final solution for this would require that, eventually, a MediaWiki installation in ''any'' language recognises codes of ''every'' language. Eventually, this could be done, but until it is done I think that multilingual installations could use English messages only. As I understand, this is needed for Wikisource, Wikibooks and Wiktionary and staying with English is at least as good as what is already there :)
if(is_array($wgLanguageArray)) eval("\$wgNamespaceNames".ucfirst( $wgLanguageCode )."=\$wgNamespaceNamesEn;");
should be changed to:
if(is_array($wgLanguageArray)) {
	eval("\$wgNamespaceNames".ucfirst( $wgLanguageCode )."=\$wgNamespaceNamesEn;");
	eval("\$wgMagicWords".ucfirst( $wgLanguageCode )."=\$wgMagicWordsEn;");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['linktrail']=\$wgAllMessagesEn['linktrail'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['uploadlog']=\$wgAllMessagesEn['uploadlog'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['uploadlogpage']=\$wgAllMessagesEn['uploadlogpage'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['deletionlog']=\$wgAllMessagesEn['deletionlog'];");
}
That linktrail bit might be a bit of a problem. But even this way, again, it is at least as good as what is there already. Eventually, linktrail should be specific to a code page, not a language. Whether it would be reasonable to do so for UTF-8 I am not sure.
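The same overrides could probably be written without eval(), using PHP variable variables. This is only a sketch: the table contents below are invented stand-ins for the real ones in the Language*.php files, and only two of the message keys are shown.

```php
<?php
// Stand-in tables; the values are placeholders for illustration only.
$wgNamespaceNamesEn = array( 0 => '', 1 => 'Talk' );
$wgNamespaceNamesDe = array( 0 => '', 1 => 'Diskussion' );
$wgAllMessagesEn = array( 'linktrail' => 'placeholder-en', 'uploadlog' => 'upload log' );
$wgAllMessagesDe = array( 'linktrail' => 'placeholder-de', 'uploadlog' => 'Datei-Logbuch' );
$wgLanguageCode = 'de';

// "de" -> "De", matching the naming scheme used by the eval() calls.
$suffix = ucfirst( $wgLanguageCode );

// Variable variable: assigns to $wgNamespaceNamesDe without building code strings.
${'wgNamespaceNames' . $suffix} = $wgNamespaceNamesEn;

// Force selected messages to their English values, key by key.
foreach ( array( 'linktrail', 'uploadlog' ) as $key ) {
	${'wgAllMessages' . $suffix}[$key] = $wgAllMessagesEn[$key];
}
```

This avoids the quoting and escaping that the eval() strings need, at the cost of having to run in the scope where the tables are defined.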
I think that these are all the options which affect how things are parsed and stored. If I have missed something, tell me. I was also thinking about forcing the following and other similar options to English:
"mainpage" => "Main Page", "aboutpage" => "$wgMetaNamespace:About", "helppage" => "$wgMetaNamespace:Help",
But as a multilingual site would need to have them in all its languages, I proclaim this a feature, not a bug! :)
Some languages are still using Latin-1 charset, and you really can't mix Latin-1 with UTF-8. Besides the encoding of the messages themselves, the fulltext search index is treated very differently between latin-1 and UTF-8, and some languages such as Chinese and Japanese do some different work to insert simulated word spaces. At some point (hopefully for 1.4) we'll want to add language-specific sorting for displays of titles/usernames etc, which will require generating and storing indexes which depend on the content language.
Yes. One can use all languages in the same encoding (all Latin-1, all Latin-2, all UTF-8...) but cannot mix encodings. It is trivial to convert any language to UTF-8, except for the linktrail, which is not used anyway. Wikisource, Wikibooks and Wiktionary are in UTF-8 already, so I don't think it will be a problem for them. I don't think that language-specific sorting will be a problem when introduced; a user will simply see text sorted in his language. But good luck to whoever is going to implement it in UTF-8 for all languages! As for Chinese and Japanese, were you referring to stripForSearch? I don't think that it is a problem: Chinese and Japanese users will be able to search properly; other users will not, but they cannot now anyway. But take a look at this:
# Italic is not appropriate for Japanese script
# Unfortunately most browsers do not recognise this, and render <em> as italic
function emphasize( $text ) {
	return $text;
}
It could cause some problems. Eventually, things like this should be specific to the language of a page or even a section of a page. Currently, if Japanese Wikipedia has some emphasized text in English, or if English Wikipedia has some emphasized text in Japanese, things don't work properly. Perhaps the emphasize function in Language.php should be remade so that it does not emphasize Japanese and Chinese characters. I don't want to even imagine how that could be done.
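One rough way such a function might look, assuming UTF-8 input and that the default behaviour is to wrap the text in <em>: split the text into runs by Unicode script and only emphasize the runs that are not Han, Hiragana or Katakana. The name emphasizeNonCjk() is hypothetical, and this ignores script-neutral characters (the Katakana long-vowel mark, CJK punctuation), which would still end up emphasized.

```php
<?php
// Hypothetical per-run variant of emphasize(), assuming valid UTF-8 input.
function emphasizeNonCjk( $text ) {
	// Split into alternating runs; the capture group keeps the CJK runs
	// in the result instead of discarding them as delimiters.
	$runs = preg_split(
		'/([\p{Han}\p{Hiragana}\p{Katakana}]+)/u',
		$text,
		-1,
		PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
	);
	$out = '';
	foreach ( $runs as $run ) {
		if ( preg_match( '/^[\p{Han}\p{Hiragana}\p{Katakana}]/u', $run ) ) {
			// CJK run: leave it without emphasis.
			$out .= $run;
		} else {
			// Any other run: emphasize as usual.
			$out .= '<em>' . $run . '</em>';
		}
	}
	return $out;
}
```

Whether splitting one emphasized phrase into several <em> elements is acceptable for the rendered page is a separate question.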
Most of the other stuff could be dealt with by defining a 'master language' which controls the content encoding, namespace definitions, logpage names, logpage content, material used for {{transposed}} and substituted messages in content, etc, and a 'display language' which can be selected by the user which will determine the language used for user interface messages. This probably requires more work on the language classes and messages to separate out links.
Yes. Well, currently, English acts as the master language. In future, it should be possible to have some other language as a master language.
Also I'm not keen on changing the value and type of $wgLanguageCode. It would be best I think to keep things predictable and separate the array of selectable languages from the master/content language.
Not a problem. I agree, in light of being able to use another master language in future (that language would then be defined in $wgLanguageCode). While $wgLanguageCodes might be the most intuitive name, it could be easily overlooked, so how about $wgLanguageArray?
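A sketch of how the two settings could then sit side by side in LocalSettings.php. The helper pickDisplayLanguage() is hypothetical; it only illustrates falling back to the master language when a requested interface language is not in the selectable list.

```php
<?php
$wgLanguageCode = 'en';                        // master/content language
$wgLanguageArray = array( 'en', 'de', 'sr' );  // selectable interface languages

// Hypothetical helper: returns the requested interface language if it is
// selectable, otherwise falls back to the master language.
function pickDisplayLanguage( $requested, $selectable, $master ) {
	return in_array( $requested, $selectable, true ) ? $requested : $master;
}

$forGermanUser = pickDisplayLanguage( 'de', $wgLanguageArray, $wgLanguageCode );
$forFrenchUser = pickDisplayLanguage( 'fr', $wgLanguageArray, $wgLanguageCode );
```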
If all of this is fine, I'll download the code from CVS and send the patches.