Hi all!
Tomorrow's RFC discussion[1] on IRC (22:00 UTC at #wikimedia-office) will be about my proposal to use Parser::getTargetLanguage to allow wiki pages to be generated in different languages depending on the user's interface language [2].
I would like to take this opportunity to gather some input beforehand about how we can improve MediaWiki's support for multilingual wikis on the parser level. In particular, I'm interested to learn about the implications my proposal has for the Translate extension, the templates currently used on commons, sites that use automatic transliteration, etc.
Some context: Currently, MediaWiki doesn't really have a concept of multilingual content. But some wikis, like Commons and Wikidata, show page content in the user's language, using a variety of hacks implemented by extensions such as Translate and Wikibase. It would be nice to make MediaWiki aware of multilingual content, and to add some limited support for this to core. Some bits and pieces already exist, but they don't quite work for what we need.
One issue is that parser functions (and Lua code) have no good way to know what the target language for the current page rendering is. Both ParserOptions and Parser have a getTargetLanguage method, but this is used *only* when displaying system messages in a different language on pages like MediaWiki:Foo/fr.
I propose to change core so it will set the target language in the parser options to the user language on wikis/namespaces/pages marked as multilingual. This would allow parser functions and Lua libraries to generate content in the desired target language.
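To illustrate, here's a rough sketch of the kind of parser function this would enable, localizing its output via the parser's target language. The function name {{#greeting:}} and the message key 'greeting-hello' are made up for the example, and the sketch assumes the proposed behaviour (target language = user language on multilingual pages):

```php
<?php
// Sketch only: a hypothetical {{#greeting:}} parser function that renders
// its output in the parser's target language. The 'greeting' function id
// would also need an entry in the extension's magic words file.

$wgHooks['ParserFirstCallInit'][] = function ( Parser $parser ) {
	$parser->setFunctionHook( 'greeting', function ( Parser $parser ) {
		// With the proposed change, this returns the user language on
		// multilingual pages; otherwise it falls back to the page/content
		// language as it does today.
		$lang = $parser->getTargetLanguage();

		// 'greeting-hello' is a hypothetical message key.
		return wfMessage( 'greeting-hello' )->inLanguage( $lang )->text();
	} );
	return true;
};
```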
There is another related method, which I propose to drop, or at least move: Title::getDisplayLanguage (resp. ContentHandler::getDisplayLanguage). This seems to be used by wikis that apply transliteration to page content, but the semantics are a bit unclear. I propose to drop it in favor of ParserOptions::getTargetLanguage, since the display language is not a property of the page, but an option defined for the rendering of the page.
Another related issue is anonymous browsing of multilingual content. This currently either goes past the web cache layer (as is done on Commons), or is simply not possible (as on Wikidata). I have put up an RFC for that as well[3], to be discussed at a different time.
[1] https://phabricator.wikimedia.org/E89
[2] https://phabricator.wikimedia.org/T114640
[3] https://phabricator.wikimedia.org/T114662
-- Daniel Kinzler
This in general reminds me of https://phabricator.wikimedia.org/T4085.
Also, if page content can vary based on user language, what to do about bug reports that Special:WhatLinksHere, category listings, file usage data at the bottom of file description pages, and so on don't report a link/template/category/file that only exists on the page when it's viewed in a non-default language? Yeah, we already have that with {{int:}} hacks, but you're talking about making it more of a feature.
On 11/10/15, Brad Jorsch (Anomie) bjorsch@wikimedia.org wrote:
This in general reminds me of https://phabricator.wikimedia.org/T4085.
Also, if page content can vary based on user language, what to do about bug reports that Special:WhatLinksHere, category listings, file usage data at the bottom of file description pages, and so on don't report a link/template/category/file that only exists on the page when it's viewed in a non-default language? Yeah, we already have that with {{int:}} hacks, but you're talking about making it more of a feature.
If I remember correctly, we already parse the page once in the user language, and once in the content language (canonical parser options) in order to prevent this issue.
I think the biggest thing we could do for multilingual support is to introduce a {{USERLANGUAGE}} magic word (and an equivalent for Lua), so people stop using {{int:}} hacks, which are a poor user experience even by wikitext standards. Most arguments against it are about parser cache splitting, which is silly, as people already split the parser cache on a massive level using {{int:}} hacks on Commons, and via the table of contents on pretty much every other wiki. (As an aside, the TOC really shouldn't split the parser cache imo, and that's something I'd like to fix at some point, but as it stands, any page with a TOC is split by user language.)
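To make that concrete, here's a rough sketch of how such a magic word could be wired up; the 'userlanguage' variable id is made up, and a magic word synonym would still need to be registered:

```php
<?php
// Sketch of the proposed {{USERLANGUAGE}} magic word -- not an existing
// feature. Registers a new parser variable and resolves it to the user
// language code of the current rendering.

$wgHooks['MagicWordwgVariableIDs'][] = function ( &$variableIDs ) {
	$variableIDs[] = 'userlanguage';
	return true;
};

$wgHooks['ParserGetVariableValueSwitch'][] = function (
	&$parser, &$varCache, &$index, &$ret, &$frame
) {
	if ( $index === 'userlanguage' ) {
		// getUserLangObj() marks the rendering as dependent on the user
		// language, i.e. exactly the parser cache split discussed above.
		$ret = $parser->getOptions()->getUserLangObj()->getCode();
		$varCache[$index] = $ret;
	}
	return true;
};
```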
The biggest gotcha to look out for, imo, is things like number formatting in parser functions. Sometimes users write templates that make assumptions about the number formatting, and it can vary by page language (however, it's entirely possible to write templates that don't do that). [Sometimes number formatting seems to use the content language, sometimes it seems to use functionLang.]
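A quick illustration of that gotcha (e.g. via maintenance/eval.php): the same value formats differently per language, so templates that assume a particular separator break once the language changes. The expected outputs in the comments are approximate:

```php
<?php
// Illustration of the number-formatting gotcha: the same value formats
// differently depending on which Language object is used.

$n = 1234567.89;

echo Language::factory( 'en' )->formatNum( $n ), "\n"; // 1,234,567.89
echo Language::factory( 'de' )->formatNum( $n ), "\n"; // 1.234.567,89 (separators swapped)
echo Language::factory( 'fr' )->formatNum( $n ), "\n"; // 1 234 567,89 (non-breaking spaces)

// A template that feeds such output back into {{#expr:}} or string-matches
// on "," vs "." will silently break when the language changes.
```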
As for the actual proposal, I'm a fan of being able to associate a language with a specific revision, to override the default wiki language on a per-revision basis. I think it might be interesting to be able to set 'mul' as the content language, in order to make pages always render in the user language, but that's the sort of thing that needs some testing to uncover forgotten assumptions about language that MediaWiki might make.
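For reference, if I recall correctly core already has a per-page (though not per-revision) mechanism along these lines: the page language override behind $wgPageLanguageUseDB. A minimal config sketch:

```php
// Per-page (not per-revision) language overrides that already exist in core:
// enable storing the page language in the database and let sysops change it
// via Special:PageLanguage.
$wgPageLanguageUseDB = true;
$wgGroupPermissions['sysop']['pagelang'] = true;
```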
-- bawolff
I believe the title language support is for the LanguageConverter extension. They used to (ab)use the `{{DISPLAYTITLE:title}}` magic word in order to use the proper language variant, something like:
`{{DISPLAYTITLE:-{en-us:Color; en-gb:Colour}-}}`
Then support was added to avoid the need for this hack, and just Do The Right Thing. I don't know the details, but presumably `Title::getDisplayLanguage` is part of it.
On Tue, Nov 10, 2015 at 4:00 PM, Brian Wolff bawolff@gmail.com wrote:
... and via the table of contents on pretty much every other wiki. (As an aside, the TOC really shouldn't split the parser cache imo, and that's something I'd like to fix at some point, but as it stands, any page with a TOC is split by user language.)
Then you'll be interested in taking a look at https://phabricator.wikimedia.org/T114057
--scott
On Tue, Nov 10, 2015 at 4:00 PM, Brian Wolff bawolff@gmail.com wrote:
On 11/10/15, Brad Jorsch (Anomie) bjorsch@wikimedia.org wrote:
Also, if page content can vary based on user language, what to do about bug reports that Special:WhatLinksHere, category listings, file usage data at the bottom of file description pages, and so on don't report a link/template/category/file that only exists on the page when it's viewed in a non-default language? Yeah, we already have that with {{int:}} hacks, but you're talking about making it more of a feature.
If I remember correctly, we already parse the page once in the user language, and once in the content language (canonical parser options) in order to prevent this issue.
We parse in the content language to avoid T16404 (https://phabricator.wikimedia.org/T16404), which is somewhat the opposite.
My concern here is that if varying page content on user language becomes a supported thing, people will probably complain that {{#ifeq:{{USERLANG}}|en|[[Category:Foo]]|[[Category:Bar]]}} (or the equivalent in Lua) on a site with 'en' as the default won't show the page when you look at Category:Bar, even though it probably will show Category:Bar at the bottom of the page in non-English languages.
T16404 (https://phabricator.wikimedia.org/T16404) was about the fact that doing the equivalent with {{int:}} hacks used to sometimes put the page in Category:Foo and sometimes in Category:Bar, depending on the language of whoever last edited (or null-edited) the page.
On 11/10/15, Brad Jorsch (Anomie) bjorsch@wikimedia.org wrote:
Ah. I read your previous email too fast.
Maybe we should have something like:
{{#langswitch: en=foo fr=le foo .. }}
which works like normal #switch, except without dead-branch elimination. (And for bonus points, implements language fallback sanely).
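A rough sketch of what such a #langswitch parser function could look like, assuming pipe-separated lang=text arguments and reusing the existing language fallback chains (none of this exists; the 'langswitch' id would still need a magic word entry):

```php
<?php
// Sketch of the proposed {{#langswitch:}} parser function -- hypothetical.
// Evaluates all branches (no dead-branch elimination) and walks the target
// language's fallback chain before giving up.

$wgHooks['ParserFirstCallInit'][] = function ( Parser $parser ) {
	$parser->setFunctionHook( 'langswitch', function ( Parser $parser /* , ...$args */ ) {
		global $wgContLang;

		// Collect "lang=text" pairs from the (already expanded) arguments.
		$byLang = [];
		foreach ( array_slice( func_get_args(), 1 ) as $arg ) {
			$parts = explode( '=', $arg, 2 );
			if ( count( $parts ) === 2 ) {
				$byLang[ trim( $parts[0] ) ] = trim( $parts[1] );
			}
		}

		// Try the target language first, then its fallbacks, then the
		// wiki's content language.
		$code = $parser->getTargetLanguage()->getCode();
		$chain = array_merge(
			[ $code ],
			Language::getFallbacksFor( $code ),
			[ $wgContLang->getCode() ]
		);
		foreach ( $chain as $candidate ) {
			if ( isset( $byLang[$candidate] ) ) {
				return $byLang[$candidate];
			}
		}
		return '';
	} );
	return true;
};
```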
Or maybe an in-core feature {{#langtransclude:foo}}, which works like a normal {{foo}}, except that it transcludes the appropriate language subpage instead (and does smart fallback, and records a transclusion link record for all the 2-3 letter subpages of the template).
Whatever else we do, I'm really not a fan of the syntax that the Translate extension uses. If we implement something in core to make multilingualism easier, I really hope we go with saner syntax. [And I say that as a person who loves MW's general crazy syntax.]
-- -bawolff
On Tue, Nov 10, 2015 at 4:39 PM, Brian Wolff bawolff@gmail.com wrote:
Maybe we should have something like:
{{#langswitch: en=foo fr=le foo .. }}
which works like normal #switch, except without dead-branch elimination. (And for bonus points, implements language fallback sanely).
That might work in itself. But then {{foo|var={{#langswitch:...}}}} would probably still have potential issues, as would the same sort of thing in Scribunto, however it's implemented there.
Or maybe an in-core feature {{#langtransclude:foo}}, which works like a normal {{foo}}, except that it transcludes the appropriate language subpage instead (and does smart fallback, and records a transclusion link record for all the 2-3 letter subpages of the template).
You'd also have to parse all those 2-3 letter subpages to get their links, categories, subtemplates, and so on.
On 10.11.2015 22:00, Brian Wolff wrote:
... Most arguments against it are about parser cache splitting, which is silly, as people already split the parser cache on a massive level using {{int:}} hacks on Commons, and via the table of contents on pretty much every other wiki. (As an aside, the TOC really shouldn't split the parser cache imo, and that's something I'd like to fix at some point, but as it stands, any page with a TOC is split by user language.)
See https://phabricator.wikimedia.org/T114057#1798538 on that issue.
I think it might be interesting to be able to set 'mul' as the content language, in order to make pages always render in the user language, but that's the sort of thing that needs some testing to uncover forgotten assumptions about language that MediaWiki might make.
'mul' is to be used if the page content is in mixed languages.
We need to use another, different marker code internally, which is replaced by the user language code when the page is rendered.
Purodha
Quick poke: the IRC discussion is coming up on #wikimedia-office in less than two hours, at 22:00 UTC.
-- daniel