On Saturday 08 May 2004 12:42, Brion Vibber wrote:
Nikola Smolenski wrote:
Since that message, I've noticed some bugs and had no time to fix them,
but now I think I can present a beta version. Note that I worked against
1.2.0rc3 and not against the CVS; this is because if I did something the
wrong way and a rewrite is needed, I didn't want to adapt to CVS changes
at the same time. I hope this will be no problem. I should add that I
have an SF account and know how to work with the CVS, so I can submit the
code changes directly to it, if needed. Differences follow:
In general I would highly recommend *against* doing significant
development on the stable branch. Forward porting things is sometimes
more work than expected -- particularly when there are major changes --
or just doesn't get done and your neat new thing gets lost; if you put
it in the development version, you just have to wait and it'll become
the stable version. ;)
OK, I'll make future patches against the unstable branch.
I think that this is quite self-explanatory. If $wgLanguageCode is a
string, then there is the old-fashioned monolingual interface. If it is an
array, then there is a multilingual interface.
LocalSettings.php:
$wgLanguageCode = array("en","de","sr");
Neat! But there are some problems that need to be overcome.
First, it won't interact at all well with $wgUseDatabaseMessages, since
we have no way to distinguish which language a defined message is
intended to reach. Now, it's possible to just make the two modes
mutually exclusive, so you get the hardcoded messages if in non-default
language. That's probably ok.
No, it won't. I have disabled it, and it's easy to make sure that if one is
set, the other one is forcibly disabled:
if(is_array($wgLanguageCode)) {
	$wgLanguageArray=$wgLanguageCode;
	# Database messages cannot be told apart by language, so turn them off.
	$wgUseDatabaseMessages=FALSE;
}
$wgUseDatabaseMessages is not used prior to this point.
Second, a number of language-specific options will affect how things are
parsed and stored. Namespace interpretation will be different; I notice
this is partially taken care of in your patch by run-time patching to
hardcode the English names, but some languages define additional aliased
names which would break under another class, and it would be preferable
to always use the content language's names rather than English.
A final solution for this would require that, eventually, a MediaWiki
installation in ''any'' language recognises the codes of ''every'' language.
Eventually, this could be done, but until it is done I think that
multilingual installations could use English messages only. As I understand it,
this is needed for Wikisource, Wikibooks and Wiktionary, and staying with
English is at least as good as what is already there :)
if(is_array($wgLanguageArray)) eval("\$wgNamespaceNames".ucfirst(
$wgLanguageCode )."=\$wgNamespaceNamesEn;");
should be changed to:
if(is_array($wgLanguageArray)) {
	eval("\$wgNamespaceNames".ucfirst( $wgLanguageCode )."=\$wgNamespaceNamesEn;");
	eval("\$wgMagicWords".ucfirst( $wgLanguageCode )."=\$wgMagicWordsEn;");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['linktrail']=\$wgAllMessagesEn['linktrail'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['uploadlog']=\$wgAllMessagesEn['uploadlog'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['uploadlogpage']=\$wgAllMessagesEn['uploadlogpage'];");
	eval("\$wgAllMessages".ucfirst( $wgLanguageCode )."['deletionlog']=\$wgAllMessagesEn['deletionlog'];");
}
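The same aliasing could probably be written without eval. What follows is only a sketch, not part of the patch: the helper name aliasEnglishTables is hypothetical, and it assumes that the per-language tables are plain globals named as in the eval version above.

```php
<?php
# Hypothetical helper, not part of the patch: point the tables of the
# given language code at the English ones, without using eval().
function aliasEnglishTables( $code ) {
	$suffix = ucfirst( $code );

	# Whole tables that are parsed or stored and must stay English.
	$GLOBALS['wgNamespaceNames' . $suffix] = $GLOBALS['wgNamespaceNamesEn'];
	$GLOBALS['wgMagicWords' . $suffix] = $GLOBALS['wgMagicWordsEn'];

	# Individual messages that affect parsing or log page names.
	$keys = array( 'linktrail', 'uploadlog', 'uploadlogpage', 'deletionlog' );
	foreach ( $keys as $key ) {
		$GLOBALS['wgAllMessages' . $suffix][$key] = $GLOBALS['wgAllMessagesEn'][$key];
	}
}
```

It would be called as aliasEnglishTables( $wgLanguageCode ) in place of the six eval lines, once $wgLanguageCode holds a single code such as "de".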
That linktrail bit might be a bit of a problem. But even this way, again, it
is at least as good as what is there already. Eventually, linktrail should be
specific to a code page, not a language. Whether it would be reasonable to do
so for UTF-8 I am not sure.
I think that these are all the options which affect how things are
parsed and stored. If I have missed something, tell me. I was also thinking about
forcing the following and other similar options to English:
"mainpage" => "Main Page",
"aboutpage" => "$wgMetaNamespace:About",
"helppage" => "$wgMetaNamespace:Help",
But as a multilingual site would need to have them in all its languages, I
proclaim this a feature, not a bug! :)
Some languages are still using Latin-1 charset, and you really can't mix
Latin-1 with UTF-8. Besides the encoding of the messages themselves, the
fulltext search index is treated very differently between latin-1 and
UTF-8, and some languages such as Chinese and Japanese do some different
work to insert simulated word spaces. At some point (hopefully for 1.4)
we'll want to add language-specific sorting for displays of
titles/usernames etc, which will require generating and storing indexes
which depend on the content language.
Yes. One can use all languages in the same encoding (all Latin-1, all Latin-2,
all UTF-8...) but cannot mix encodings. It is trivial to convert any language
to UTF-8, except for the linktrail, which is not used anyway. Wikisource,
Wikibooks and Wiktionary are in UTF-8 already, so I don't think it will be a
problem for them. I don't think that language-specific sorting will be a
problem when introduced; a user will simply see text sorted in his language.
But good luck to whoever is going to implement it in UTF-8 for all languages!
As for Chinese and Japanese, were you referring to stripForSearch? I don't
think that it is a problem: Chinese and Japanese users will be able to search
properly; other users will not, but they cannot now anyway. But take a look
at this:
# Italic is not appropriate for Japanese script
# Unfortunately most browsers do not recognise this, and render <em>
# as italic
function emphasize( $text )
{
	return $text;
}
It could cause some problems. Eventually, things like this should be specific
to the language of a page or even a section of a page. Currently, if Japanese
Wikipedia has some emphasized text in English, or if English Wikipedia has
some emphasized text in Japanese, things don't work properly. Perhaps the
emphasize function in Language.php should be rewritten so that it does not
emphasize Japanese and Chinese characters. I don't even want to imagine how
that could be done.
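For illustration only, here is a crude sketch of one possible approach: split the text into runs of Han, Hiragana and Katakana characters and everything else, and wrap only the latter in <em>. It assumes UTF-8 input and a PHP build whose PCRE supports Unicode script properties. The name emphasizeNonCjk is hypothetical and is not in Language.php, and the sketch ignores wiki markup and punctuation that Unicode assigns to the Common script.

```php
<?php
# Hypothetical sketch, not the existing Language.php code: emphasize
# everything except runs of Han, Hiragana and Katakana characters.
function emphasizeNonCjk( $text ) {
	$cjk = '[\p{Han}\p{Hiragana}\p{Katakana}]';

	# Split into alternating CJK and non-CJK runs, keeping the CJK runs.
	$parts = preg_split( '/(' . $cjk . '+)/u', $text, -1,
		PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY );
	if ( $parts === false ) {
		# Not valid UTF-8: leave the text alone.
		return $text;
	}

	$out = '';
	foreach ( $parts as $part ) {
		if ( preg_match( '/^' . $cjk . '/u', $part ) ) {
			$out .= $part;                      # CJK run: no emphasis
		} else {
			$out .= '<em>' . $part . '</em>';   # everything else
		}
	}
	return $out;
}
```

Whether this is acceptable would depend on how the browser renders the neighbouring runs, so take it as a starting point for discussion and not as a proposal.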
Most of the other stuff could be dealt with by defining a 'master
language' which controls the content encoding, namespace definitions,
logpage names, logpage content, material used for {{transposed}} and
substituted messages in content, etc, and a 'display language' which can
be selected by the user which will determine the language used for user
interface messages. This probably requires more work on the language
classes and messages to separate out links.
Yes. Well, currently, English acts as the master language. In future, it
should be possible to have some other language as a master language.
Also I'm not keen on changing the value and type of $wgLanguageCode. It
would be best I think to keep things predictable and separate the array
of selectable languages from the master/content language.
Not a problem. I agree, in light of being able to use another master language
in future (that language would then be defined in $wgLanguageCode). While
$wgLanguageCodes might be the most intuitive name, it differs from
$wgLanguageCode by a single letter and could easily be overlooked, so how
about $wgLanguageArray?
If all of this is fine, I'll download the code from the CVS and send the
patches.
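A minimal sketch of how the two settings might sit side by side in LocalSettings.php, assuming $wgLanguageArray is the name we settle on. The helper pickDisplayLanguage and its parameters are hypothetical and are not existing MediaWiki code.

```php
<?php
# Master (content) language: stays a plain string, as before.
$wgLanguageCode = "en";
# Languages a user may select for the interface (proposed name).
$wgLanguageArray = array( "en", "de", "sr" );

# Hypothetical helper: use the requested interface language if it is
# one of the selectable ones, otherwise fall back to the master language.
function pickDisplayLanguage( $requested, $selectable, $master ) {
	if ( in_array( $requested, $selectable, true ) ) {
		return $requested;
	}
	return $master;
}
```

With these settings, pickDisplayLanguage( "de", $wgLanguageArray, $wgLanguageCode ) gives "de", and a request for a language not in the array gives "en".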