Re: [Wikitech-l] The lang tag in <HTML> ain't identical to $wgContLanguageCode

16 May 2006


Steve Summit wrote:
> ...
> Shinjiman and Happy Rabbit, I don't know about everybody else
> here, but I'd find this issue easier to think about if I
> understood a bit more about what you're trying to accomplish.
> Apologies for not knowing as much as I should about these issues.
>
> My main question is this.  Why do we need the extra mapping of
> the HTML lang tag?  Why is it not adequate to set the value of
> $wgContLanguageCode appropriately for each wiki, so that it
> matches the language that the wiki's pages are written in?

The base problem is that various characters may appear differently when rendered
in a Simplified Chinese font vs a Traditional Chinese font.
See http://en.wikipedia.org/wiki/Han_unification for some background on this issue.
Characters that differ semantically between the variants generally are assigned
separate Unicode code points, but for characters that simply have different
customary appearances the difference is left to the fonts.
Browsers will (or at least, may) pick appropriate default fonts based on the
language specified in the web page.
Thus for example, a page which is marked as "zh" (defaulting usually to
Simplified Chinese) but which contains text written for Traditional form may
display oddly.
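
To make the distinction above concrete, here is a minimal Python sketch. The three characters are illustrative choices, not taken from the original message: U+9AA8 is a commonly cited unified character whose customary shape differs between regional fonts, while U+53D1 and U+767C are a Simplified/Traditional pair that Unicode encodes separately.

```python
import unicodedata

# One code point, region-dependent glyph: only the font (and hence the
# declared language) decides which customary shape the reader sees.
unified = "\u9aa8"      # 骨

# Separately encoded forms: the text itself records which one was written.
simplified = "\u53d1"   # 发
traditional = "\u767c"  # 發

for ch in (unified, simplified, traditional):
    # Print the code point and its formal Unicode character name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

For the unified character, nothing in the text says which shape is wanted, which is why the page's declared language matters for font selection.
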
We have both Traditional and Simplified forms mixed on the Chinese-language
Wikipedia. The wiki software is extended with two key abilities relevant to this:
1) When logged in, a user can select which language the user-interface text
appears in. As well as completely different languages, they can choose a
specific variant of Chinese (e.g., Traditional instead of plain Chinese).
2) A visitor can select to have the text of a page automatically converted to
their preferred variant (Simplified or Traditional) for more comfortable reading.
For a quick sample, here's the main page in its default form:
http://zh.wikipedia.org/wiki/%E9%A6%96%E9%A1%B5
and here with all text converted to Simplified script forms:
http://zh.wikipedia.org/w/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh-...
and here with all text converted to Traditional script forms:
http://zh.wikipedia.org/w/index.php?title=%E9%A6%96%E9%A1%B5&variant=zh-...
I don't know Chinese well enough to tell what's displaying right and what's
displaying wrong, but depending on your browser and fonts situation, you might
be able to see that some of the characters appear in different fonts.
At the moment no matter which variant you pick, the HTML still declares its
language to be "zh", just plain Chinese. Shinjiman indicates that this defaults
to Simplified Chinese in browsers that pick fonts depending on the declared
language code; thus this might be displaying incorrectly when text is actually
Traditional Chinese.
Now, it would appear to make reasonable sense to pick a more detailed language
code when variant conversion is in use, so that browsers' font selections pick
the matching variant. On this I concur completely.
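
As a rough sketch of that idea (not MediaWiki's actual code), the choice of lang attribute could look like the following. The function name `html_lang` and the variant codes "zh-tw" and "zh-cn" are assumptions made for illustration; they are not taken from the truncated URLs above.

```python
from typing import Optional


def html_lang(content_lang: str, variant: Optional[str] = None) -> str:
    """Pick the value for <html lang="..."> (hypothetical helper).

    If a variant is selected and it is a sub-tag of the wiki's content
    language, declare the more detailed code; otherwise fall back to the
    content language itself.
    """
    if variant:
        base = content_lang.lower()
        chosen = variant.lower()
        if chosen == base or chosen.startswith(base + "-"):
            return variant
    return content_lang


print(html_lang("zh"))           # no variant selected
print(html_lang("zh", "zh-tw"))  # variant conversion in use
print(html_lang("zh", "fr"))     # unrelated code is ignored
```

The prefix check keeps the declared language tied to the content language, so a code that does not belong to the wiki's language never reaches the page.
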
Where we appear to have got bogged down is in how this relates to two things:
a) The user interface language selected in the user's wiki preferences.
b) The Accept-Language header sent from the browser.
As for a), it may make sense for a lot of user-interface text to appear with its
own overriding lang attributes. This isn't necessarily easy in all cases, but if
it's appropriate it's something we can talk about.
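
For point a), one possible shape of an "overriding lang attribute" is sketched below. This is only an illustration under assumptions: `ui_message` is a hypothetical helper, and wrapping in a `span` is one option among several.

```python
from html import escape


def ui_message(text: str, ui_lang: str, page_lang: str) -> str:
    """Return HTML for a piece of interface text (hypothetical helper).

    When the interface language differs from the language declared for
    the page, wrap the text in a span carrying its own lang attribute so
    the browser can choose fonts for it independently.
    """
    if ui_lang == page_lang:
        return escape(text)
    return f'<span lang="{escape(ui_lang)}">{escape(text)}</span>'


print(ui_message("Edit this page", "en", "zh-tw"))
```

As the paragraph above notes, doing this for every piece of interface text is not necessarily easy, since messages are emitted from many places.
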
As for b), by itself it doesn't seem to make a lot of sense to me; first,
picking a language code from the client would simply cause it to fail to match
with the content language. Second, the more general issue of picking
user-interface language based on the header is something we've provisionally
rejected for a long time; non-default UI languages are often not fully
customized and would often produce a sub-par or confusing user experience.
Additionally, supporting it would require changes to the caching infrastructure
and would increase server load by an unknown and potentially large amount.
Point b) appears to be entirely separate from the question at hand, so I'd
prefer to leave it for another discussion.
-- brion vibber (brion @ pobox.com)
