Brion Vibber <brion@...> writes:
Shinjiman wrote:
The issue has been mentioned on wikitech-l at
http://mail.wikipedia.org/pipermail/wikitech-l/2006-March/034397.html
and also in Bugzilla as bug 5790: http://bugzilla.wikimedia.org/show_bug.cgi?id=5790.
Hi all,
As I've mentioned on the pages above, some wiki sites have incorrect lang tags that correspond to neither ISO 639 codes nor IANA language tags. I've examined this issue and submitted a patch to Bugzilla. However, the patch cannot be accepted, as Brion said that it would break the caching system.
There was not any such patch for this, and it would be totally unnecessary to make one. Just let us know what the incorrect ones are and we'll fix the configuration.
Are you maybe thinking of the patch for something totally different, which tries to guess the visitor's language variant and change the lang attribute based on it?
For example, the Simple English Wikipedia shows the lang tag as "simple", which exists neither in ISO 639 nor among the IANA language tags. For 'Simple English', the lang code should be "en" (English).
When Traditional Chinese readers read a Traditional Chinese web page, it is supposed to be displayed in a Traditional Chinese font. However, this is not the case on Wikipedia (or on other wiki sites running MediaWiki). Since the Chinese Wikipedia (and other Chinese-language wiki sites running MediaWiki) introduced the LanguageConverter class, the lang tag has been "zh" (Chinese). This is not a problem for Simplified Chinese readers, because the major browsers will use Simplified Chinese fonts for it. It is a problem for Traditional Chinese readers, who end up reading Chinese text in a Simplified Chinese font.
As you mentioned, one could use the language variant to determine the language code the user is reading in, but in practice it is close to impossible to determine the lang tag from the language variant. Many users on the Chinese Wikipedia currently disable the language variant by default, because the Chinese word conversion still causes many problems. (This is the main point of the issue.) That includes me: I use a Traditional Chinese interface language (UI language = zh-hk) _without enabling_ any language variant (Variant = zh). So the patch I submitted does not determine the lang tag by the language variant alone; it checks both the interface language and $wgContLanguageCode (the site-wide language code).
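To make the idea concrete, here is a rough sketch of that kind of check. It is written in Python purely for illustration and is not the patch itself or MediaWiki's PHP; the function name `effective_lang_code` and the exact rule (prefer the interface language only when it belongs to the same family as the site language) are my own assumptions.

```python
def effective_lang_code(content_code: str, ui_code: str) -> str:
    """Pick the language code to base the lang tag on.

    Assumed rule (hypothetical, for illustration): use the reader's
    interface language when it equals the site language or is a regional
    form of it (e.g. site "zh", interface "zh-hk"); otherwise fall back
    to the site language, since the page content is not in the reader's
    interface language.
    """
    content = content_code.lower()
    ui = ui_code.lower()
    if ui == content or ui.startswith(content + "-"):
        return ui
    return content


# A zh-hk reader on a "zh" wiki, with no language variant enabled.
print(effective_lang_code("zh", "zh-hk"))
# An English-interface reader on the same wiki.
print(effective_lang_code("zh", "en"))
```

The language variant does not appear in this check at all, which is the point: readers who have conversion disabled are still handled.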
To summarise the above, I suggest adding a new attribute that assigns the lang tag correctly, using an array or something that provides similar functionality. For example:
Language code            | Language tag
-------------------------+----------------------------------------------
en                       | en
de                       | de
simple (Simple English)  | en
zh-cn                    | zh-cn    <= originally supposed to be zh-hans (R1)
zh-sg                    | zh-sg    <= originally supposed to be zh-hans (R1)
zh-tw                    | zh-hant
zh-hk                    | zh-hant
zh-mo                    | zh-hant
zh-min-nan               | zh-hant  <= (R2)
zh-yue                   | zh-hant  <= (R2)

*Remarks:
1. The tags zh-cn/zh-sg are used instead of zh-hans for browser compatibility (IE6 will likely misunderstand the zh-hans lang tag).
2. The tag zh-hant is used instead of zh-min-nan/zh-yue for browser compatibility (both IE and Firefox will likely misunderstand the zh-min-nan and zh-yue lang tags).
The table above is meant to construct a lang tag mapping for the various languages.
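As a sketch of what such an array could look like, the table can be written as a simple lookup. This is Python for illustration only, not MediaWiki's PHP; the names `LANG_TAG_MAP` and `lang_tag_for` are hypothetical, and the fallback to the unmapped code is my own assumption.

```python
# Hypothetical mapping from a wiki's language code to the lang tag that
# should be emitted in the HTML output. Entries are taken from the table.
LANG_TAG_MAP = {
    "en": "en",
    "de": "de",
    "simple": "en",           # Simple English
    "zh-cn": "zh-cn",         # would be zh-hans, kept for IE6 (remark 1)
    "zh-sg": "zh-sg",         # would be zh-hans, kept for IE6 (remark 1)
    "zh-tw": "zh-hant",
    "zh-hk": "zh-hant",
    "zh-mo": "zh-hant",
    "zh-min-nan": "zh-hant",  # browser compatibility (remark 2)
    "zh-yue": "zh-hant",      # browser compatibility (remark 2)
}


def lang_tag_for(code: str) -> str:
    """Return the mapped lang tag, or the code itself if it is unmapped."""
    return LANG_TAG_MAP.get(code.lower(), code)


print(lang_tag_for("simple"))
print(lang_tag_for("zh-hk"))
```

Codes that are already valid tags need no entry, so only the exceptions have to be maintained.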
Once this language mapping is in place, OutputPage.php (for older skins) and Monobook.php (for newer skins) need to be modified to output <html xml:lang="XXX" lang="XXX"> correctly (where "XXX" is the mapped lang code rather than $wgContLanguageCode used directly) to address this issue.
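The output step could then look roughly like the following. Again this is an illustrative Python sketch, not the actual skin code in OutputPage.php or Monobook.php; `html_open_tag` and its parameters are hypothetical names, and the small map passed in is only a sample.

```python
from html import escape


def html_open_tag(content_code: str, lang_tag_map: dict) -> str:
    """Build the opening <html> tag using the mapped lang tag.

    Codes missing from the map are emitted unchanged (an assumption).
    """
    tag = lang_tag_map.get(content_code, content_code)
    quoted = escape(tag, quote=True)  # safe to place inside an attribute
    return f'<html xml:lang="{quoted}" lang="{quoted}">'


# Sample map with two of the exceptions discussed in this thread.
sample_map = {"simple": "en", "zh-hk": "zh-hant"}
print(html_open_tag("simple", sample_map))
print(html_open_tag("de", sample_map))
```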
I hope the information I've provided helps you address this kind of issue more smoothly. :)
Regards,
Shinjiman