Hi,
I usually don't post to mailing lists, but Brion suggested I should do this for the page content language.
I suppose most people know that I improved the RTL support. Documentation of that is now at http://www.mediawiki.org/wiki/Directionality_support. If it is incomplete or unclear about something, please ask so I can improve the docs.
While doing that, I introduced a "page content language" that defines the language in which a specific page is written. I added docs for that as well; see http://www.mediawiki.org/wiki/Language_in_MediaWiki. For special pages it is $wgLang, for MediaWiki namespace pages it depends on the subpage code, and for other pages it is $wgContLang. Extensions (like Translate) can change the language a page is supposed to be written in. This affects the direction of the content, the TOC, and (in theory) the grammar. Again, if the docs are missing something important, let me know.
But, now that I am writing this anyway, I have a question: should magic words like CURRENTMONTH and NUMBEROFARTICLES use the page content language rather than $wgContLang? It would be more logical (and on Incubator it is even wanted: http://incubator.wikimedia.org/wiki/Template:Wp/lkt/CURRENTMONTHNAMEI ), but I am not sure if it would break things, e.g. when used within a template.
(And btw, another i18n thing that needs attention is LanguageConverter (even if just for the missing docs). I am looking into whether I can help out there.)
Regards, Robin aka SPQRobin
On Mon, Aug 15, 2011 at 1:23 PM, Robin Pepermans robinp.1273@gmail.com wrote:
Hi,
I usually don't post to mailing lists, but Brion suggested I should do this for the page content language.
Thanks! :D
I suppose most people know that I improved the RTL support. Documentation of that is now at http://www.mediawiki.org/wiki/Directionality_support. If it is incomplete or unclear about something, please ask so I can improve the docs.
While doing that, I introduced a "page content language" that defines the language in which a specific page is written. I added docs for that as well; see http://www.mediawiki.org/wiki/Language_in_MediaWiki. For special pages it is $wgLang, for MediaWiki namespace pages it depends on the subpage code, and for other pages it is $wgContLang. Extensions (like Translate) can change the language a page is supposed to be written in. This affects the direction of the content, the TOC, and (in theory) the grammar. Again, if the docs are missing something important, let me know.
I am super happy about this going in as a general concept -- we'll want to make sure there's a way for 'generic' multilingual sites (like meta and commons) to tag pages with their languages as well.
Note for those not reading through the links ;) -- you can get a language from a Title object with $title->getPageLanguage(). There's no actual storage of the value now; it just handles some standard logic (special pages use the user language; MediaWiki namespace pages use the language from the subpage code, etc.) and provides a hook that extensions can grab to override info for any particular page.
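(For readers who want the logic above spelled out: the following is only a rough sketch in Python, not MediaWiki's PHP code. The function name, the namespace constants and the hook shape are all hypothetical, and the assumption that the hook is consulted only for pages outside the Special and MediaWiki namespaces is mine.)

```python
from typing import Callable, List, Optional

# Hypothetical namespace constants for this sketch (not MediaWiki's real values).
NS_MAIN, NS_MEDIAWIKI, NS_SPECIAL = "main", "mediawiki", "special"

# A hook handler receives (namespace, title) and returns a language code or None.
Hook = Callable[[str, str], Optional[str]]


def get_page_language(namespace: str, title: str, user_lang: str,
                      content_lang: str, hooks: List[Hook]) -> str:
    """Resolve a page's language; nothing is stored, it is computed on each call."""
    if namespace == NS_SPECIAL:
        # Special pages are interface, so they follow the user's language.
        return user_lang
    if namespace == NS_MEDIAWIKI:
        # "Mainpage/nl" -> "nl"; a message page without a subpage code
        # falls back to the site content language.
        _, sep, code = title.rpartition("/")
        return code if sep else content_lang
    # Other pages: the first hook that returns a code wins,
    # otherwise the site content language is used.
    for hook in hooks:
        override = hook(namespace, title)
        if override is not None:
            return override
    return content_lang
```

With no hooks registered, an ordinary page resolves to the site content language; an extension would register a handler that returns a code for the pages it cares about and None for everything else.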
For Translate and Incubator, language can be easily pulled from the title as language codes get used as a page title component (suffix or prefix or some such). However for more general pages this may end up being better defined as metadata tied to the page, which'll need storage and being kept across export/import, delete/undelete, etc.
It's conceivable that we might want to move that interface over from Title to WikiPage or something, though Title seems to fit best with how we tend to index these things for now (editing protections are also accessed via Title).
But, now that I am writing this anyway, I have a question: should magic words like CURRENTMONTH and NUMBEROFARTICLES use the page content language rather than $wgContLang? It would be more logical (and on Incubator it is even wanted: http://incubator.wikimedia.org/wiki/Template:Wp/lkt/CURRENTMONTHNAMEI ), but I am not sure if it would break things, e.g. when used within a template.
I would tend to expect it to use the page content language; for templates of course you may well have the issue that the template is trying to work in a particular language, say to generate a link, so that may require some pondering. :)
If we pull a template into a parent page, does the template have its own inherent languageness? This is all relevant also to tagging output for languages to aid with screen readers, translation tools, search engines etc -- bug 14649 https://bugzilla.wikimedia.org/show_bug.cgi?id=14649 gives an example of this with messages but templates can have the same issues.
-- brion
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
2011/8/15 Brion Vibber brion@pobox.com:
I am super happy about this going in as a general concept -- we'll want to make sure there's a way for 'generic' multilingual sites (like meta and commons) to tag pages with their languages as well.
Yeah, I forgot to mention that: when the Translate extension is enabled on Meta (I heard there are plans for that), the page translation feature makes use of the page content language hook to set the right language, which will improve its multilingualism. For other wikis (Commons, MediaWiki.org), we might make a simple extension that just checks whether the subpage name is a language code and sets the language accordingly. (That is easy to do.)
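(A minimal sketch of the subpage check such an extension might do, written in Python only to show the idea. The function name and the short list of codes are hypothetical; a real extension would be PHP and would consult the wiki's full list of language codes.)

```python
from typing import Optional

# Hypothetical, deliberately tiny list of known codes for this sketch.
KNOWN_LANGUAGE_CODES = {"de", "en", "fr", "he", "nl"}


def language_from_subpage(title: str) -> Optional[str]:
    """Return the language code if the last subpage part is one, else None."""
    base, sep, last = title.rpartition("/")
    # Require a real base page ("Help:Contents/nl"), not a bare "/nl".
    if sep and base and last in KNOWN_LANGUAGE_CODES:
        return last
    # None means "no opinion": the caller keeps the site content language.
    return None
```

For example, "Help:Contents/nl" would be treated as Dutch, while "Help:Contents" and "Help:Contents/Archive" would be left at the default.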
For Translate and Incubator, language can be easily pulled from the title as language codes get used as a page title component (suffix or prefix or some such). However for more general pages this may end up being better defined as metadata tied to the page, which'll need storage and being kept across export/import, delete/undelete, etc.
Yeah, the value would probably have to be stored somewhere if we want to be able to implement a magic word or "language selector" to mark the language of a page. Btw, see bugs 9360 and 28970 for that.
But, now that I am writing this anyway, I have a question: should magic words like CURRENTMONTH and NUMBEROFARTICLES use the page content language rather than $wgContLang? It would be more logical (and on Incubator it is even wanted: http://incubator.wikimedia.org/wiki/Template:Wp/lkt/CURRENTMONTHNAMEI ), but I am not sure if it would break things, e.g. when used within a template.
I would tend to expect it to use the page content language; for templates of course you may well have the issue that the template is trying to work in a particular language, say to generate a link, so that may require some pondering. :)
If we pull a template into a parent page, does the template have its own inherent languageness? This is all relevant also to tagging output for languages to aid with screen readers, translation tools, search engines etc -- bug 14649 https://bugzilla.wikimedia.org/show_bug.cgi?id=14649 gives an example of this with messages but templates can have the same issues.
I changed magic words to follow the page content language on my localhost, and then I tried to include a template which is parsed as Dutch (Template:Wn/nl/Page with the Incubator extension) into a page following the site content language. Apparently magic words in the template are parsed according to the site language (i.e. the language of the page where the template is included).
So I will change these time and number-formatting magic words, but I won't change the NAMESPACE(E) magic word as this really depends on the *site* language.
This will also in fact change their output in system messages (and on the respective MediaWiki namespace pages), but afaik they are never used in system messages.
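(To make the intended split concrete, here is a small sketch in Python rather than MediaWiki's PHP. The function names and the two-language tables are hypothetical and only illustrate the idea: time and number formatting follow the language of the page being parsed, while NAMESPACE follows the site language.)

```python
import datetime

# Hypothetical two-language tables, only for this sketch.
MONTH_NAMES = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "nl": ["januari", "februari", "maart", "april", "mei", "juni", "juli",
           "augustus", "september", "oktober", "november", "december"],
}
THOUSANDS_SEPARATOR = {"en": ",", "nl": "."}
# Name of the Template namespace, used here as the current page's namespace.
TEMPLATE_NAMESPACE_NAME = {"en": "Template", "nl": "Sjabloon"}


def format_number(n: int, lang: str) -> str:
    """Group digits in threes using the separator of the given language."""
    return f"{n:,}".replace(",", THOUSANDS_SEPARATOR[lang])


def expand_magic_word(word: str, page_lang: str, site_lang: str,
                      today: datetime.date, article_count: int) -> str:
    # page_lang is the language of the page being parsed; per the test
    # described above, a transcluded template does not bring its own language.
    if word == "CURRENTMONTHNAME":
        return MONTH_NAMES[page_lang][today.month - 1]
    if word == "NUMBEROFARTICLES":
        return format_number(article_count, page_lang)
    if word == "NAMESPACE":
        # Namespace names belong to the site, not to the page language.
        return TEMPLATE_NAMESPACE_NAME[site_lang]
    raise ValueError(f"unknown magic word: {word}")
```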
* Brion Vibber wrote:
For Translate and Incubator, language can be easily pulled from the title as language codes get used as a page title component (suffix or prefix or some such). However for more general pages this may end up being better defined as metadata tied to the page, which'll need storage and being kept across export/import, delete/undelete, etc.
Note that the North Frisian Wikipedia has a need to identify the dialect for each article. North Frisian is split into roughly 10 dialects which aren't generally mutually intelligible, but it may also be hard to tell which particular dialect something is written in from a short stub. They currently use {{dialect}} templates at the top of the pages to make this identification. There currently aren't proper language tags for the dialects, but I am working on fixing that: http://tools.ietf.org/html/draft-hoehrmann-nordfriisk-00
(I am not sure what that implies for this debate, but it is something to be aware of in the context of page language metadata features.)
On Mon, Aug 15, 2011 at 1:23 PM, Robin Pepermans robinp.1273@gmail.com wrote:
I suppose most people know that I improved the RTL support. Documentation of that is now at http://www.mediawiki.org/wiki/Directionality_support. If it is incomplete or unclear about something, please ask so I can improve the docs.
This reminds me -- just how is all the RTL magic for styles handled?
Page currently says "Thanks to ResourceLoader including CSSJanus, CSS is automatically flipped to right-to-left when the user language is RTL. This is default since 1.18 (in previous versions it was dependent on the wiki content language https://bugzilla.wikimedia.org/show_bug.cgi?id=6100)."
However, styles for content should follow the content language, while styles for UI should follow the UI language; are there multiple sets of rules, or are all the rules set up appropriately based on whether they are in an .rtl or .ltr section, so that it all just works out?
-- brion
2011/8/16 Brion Vibber brion@pobox.com:
On Mon, Aug 15, 2011 at 1:23 PM, Robin Pepermans robinp.1273@gmail.com wrote:
I suppose most people know that I improved the RTL support. Documentation of that is now at http://www.mediawiki.org/wiki/Directionality_support. If it is incomplete or unclear about something, please ask so I can improve the docs.
This reminds me -- just how is all the RTL magic for styles handled?
Page currently says "Thanks to ResourceLoader including CSSJanus, CSS is automatically flipped to right-to-left when the user language is RTL. This is default since 1.18 (in previous versions it was dependent on the wiki content language https://bugzilla.wikimedia.org/show_bug.cgi?id=6100)."
However, styles for content should follow the content language, while styles for UI should follow the UI language; are there multiple sets of rules, or are all the rules set up appropriately based on whether they are in an .rtl or .ltr section, so that it all just works out?
There was no decent solution for this. Most of the styles are for the UI, and only some are for the content (like editsection, ul/ol). For those, I introduced the mw-content-ltr / mw-content-rtl classes. The external link icons were too difficult to fix that way, so these still depend on the UI language. For diffs (and similar things), I made PHP add classes (like diff-contentalign-right, diff-contentalign-left) and then made the CSS style both classes. Relevant bug: bug 28693
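(A rough sketch of how such classes could be picked, in Python rather than the actual PHP. The class names are the ones mentioned above; the function names, the short list of RTL codes, and the assumption that RTL content maps to diff-contentalign-right are mine.)

```python
# Hypothetical, incomplete set of right-to-left language codes for this sketch.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


def direction(lang: str) -> str:
    """Return "rtl" or "ltr" for a language code."""
    return "rtl" if lang in RTL_LANGUAGES else "ltr"


def content_wrapper_class(page_lang: str) -> str:
    # Content styles key off the page content language, not the UI language.
    return f"mw-content-{direction(page_lang)}"


def diff_align_class(page_lang: str) -> str:
    # Assumption: RTL content aligns right, LTR content aligns left.
    side = "right" if direction(page_lang) == "rtl" else "left"
    return f"diff-contentalign-{side}"
```

The stylesheet then carries rules for both variants of each class, so the same CSS works whichever class the server emits.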