Hi folks,
I'd like to start a broader conversation about language support in MW core, and the potential need to re-think some pretty fundamental design decisions in MediaWiki if we want to move past the point of diminishing returns in some language-related improvements.
In a nutshell, is it time to make MW aware of multiple content languages in a single wiki? If so, how would we go about it?
Hypothesis: Because support for multiple languages existing in a single wiki is mostly handled through JS hacks, templates, and manual markup added to the content (such as <div>s indicating language direction), we are providing an opaque, confusing and often inconsistent user experience in our multilingual wikis, which is a major impediment for growth of non-English content in those wikis, and participation by contributors who are not English speakers.
Categories have long been called out as one of the biggest factors, and they certainly are (since Commons categories are largely in English, they are by definition excluding folks who don't speak the language), but I'd like to focus on the non-category parts of the problem for the purposes of this conversation.
Support for the hypothesis (please correct misconceptions or errors):
1) There's no consistent method by which multiple language editions of the same page are surfaced for selection by the user. Different wikis use different templates (often multiple variants and layouts in a single wiki), different positioning, different rules, etc., leading to inconsistent user experience. Consistency is offered by language headers generated by the Translate extension, but these are used for managing translations, while multilingual content existing in the same wiki may often not take the form of 1:1 translations.
Moreover, language headers have to be manually updated and maintained. Consider the user-friendliness of something like the +/- link in the language header on a page like https://commons.wikimedia.org/wiki/Commons:Kooperationen which leads to: https://commons.wikimedia.org/w/index.php?title=Template:Lang-Partnerships&a...
Chances are that a lot of people who'd have the ability to provide a version (not necessarily a translation) of the page in a given language will give up even on the process of doing so correctly.
2) There's no consistent method by which page name conflicts (which may often occur in similar languages) are resolved, and users have to manually disambiguate.
3) There are basic UX issues in the language selection tools offered today. For example, after changing the language on Commons to German, I will see the page I'm on (say English) with a German user interface, even if there's an actual German content version of the page available. This is because these language selection tools have no awareness of the existence of content in relevant languages.
4) In order to ensure that content is rendered correctly irrespective of the UI language set, we require content authors to manually add <div>s around RTL content, even if that's all the page contains.
5) It's impossible to restrict searches to a specific language. It's impossible to restrict recent changes and similar tools to a specific language.
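To make points 1 and 3 concrete: if the software could enumerate a page's language versions itself, the language header could be generated instead of hand-maintained, and a language selector could jump to an existing content version. A minimal Python sketch, assuming a hypothetical convention where other language versions live at "<base>/<code>" subpages and the base page is in the wiki's default language; `language_versions`, `language_header` and `page_for_user` are illustrative names, not existing MediaWiki code:

```python
# Hypothetical convention (assumption, not how any wiki is configured):
# the base page is in the wiki's default language, and other language
# versions live at "<base>/<code>" subpages.
SITE_DEFAULT = "en"


def language_versions(base, titles, known_codes):
    """Map language code -> title for every version of `base` that exists."""
    versions = {}
    if base in titles:
        versions[SITE_DEFAULT] = base
    for title in titles:
        prefix, sep, code = title.rpartition("/")
        if sep and prefix == base and code in known_codes:
            versions[code] = title
    return versions


def language_header(base, titles, known_codes):
    """Render a plain-text language header, sorted by language code."""
    versions = language_versions(base, titles, known_codes)
    return " | ".join(f"{code}: {versions[code]}" for code in sorted(versions))


def page_for_user(base, titles, known_codes, user_lang):
    """Pick the content version matching the user's language, else the base page."""
    return language_versions(base, titles, known_codes).get(user_lang, base)
```

The suffix convention is itself a workaround; a real page language property would make it unnecessary.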
I'll stop there - I'm sure you can think of other issues with the current approach. For third party users, the effort of replicating something like the semi-acceptable Commons or Meta user experience is pretty significant, as well, due to the large number of templates and local hacks employed.
This is a very tricky set of architectural issues to solve well, and it would be easy to make the user experience worse by solving it poorly. Still, as we grow our bench strength to take on hard problems, I want to raise the temperature of this problem a bit again, especially from the standpoint of future platform engineering improvements.
Would it make sense to add a language property to pages, so it can be used to solve a lot of the above issues, and provide appropriate and consistent user experience built on them? (Keeping in mind that some pages would be multilingual and would need to be identified as such.) If so, this seems like a major architectural undertaking that should only be taken on as a partnership between domain experts (site and platform architecture, language engineering, Visual Editor/Parsoid, etc.).
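As a rough illustration of why a per-page language property helps with point 5: once each page carries a language code, filtering a recent-changes feed becomes a simple comparison. This Python sketch assumes a hypothetical `page_lang` field and uses the ISO 639-2 code "mul" to flag deliberately multilingual pages; none of it is existing MediaWiki code.

```python
from dataclasses import dataclass

# ISO 639-2 code for "multiple languages"; used here (as an assumption)
# to mark pages that are deliberately multilingual.
MULTILINGUAL = "mul"


@dataclass(frozen=True)
class Change:
    title: str
    page_lang: str  # hypothetical per-page language property


def filter_by_language(changes, lang, include_multilingual=True):
    """Keep changes to pages in `lang`, plus multilingual pages if requested."""
    wanted = {lang, MULTILINGUAL} if include_multilingual else {lang}
    return [c for c in changes if c.page_lang in wanted]
```

Whether multilingual pages should show up in every language's view is a design choice; the flag makes it explicit.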
I'm not suggesting this should be done in the very near term, but I'd like to at least start talking about it, hear if I'm completely off base (and if there are simpler ways to improve on current state), and explore where it could fit in our longer term agenda.
Relevant existing code:
* https://www.mediawiki.org/wiki/Extension:Translate - awesome for page and message translation, but I'm not clear that it can help for the other multilingual content scenarios and problems
* Others: https://www.mediawiki.org/wiki/Category:Internationalization_extensions
Thanks, Erik
-- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation
Wikipedia and our other projects reach more than 500 million people every month. The world population is estimated to be >7 billion. Still a long way to go. Support us. Join us. Share: https://wikimediafoundation.org/
On 2013-04-24 12:30 AM, "Erik Moeller" erik@wikimedia.org wrote:
[..]
In a nutshell, is it time to make MW aware of multiple content languages in a single wiki?
That would be nice.
- There's no consistent method by which multiple language editions of the same page are surfaced for selection by the user. Different wikis use different templates (often multiple variants and layouts in a single wiki), different positioning, different rules, etc., leading to inconsistent user experience. Consistency is offered by language headers generated by the Translate extension, but these are used for managing translations, while multilingual content existing in the same wiki may often not take the form of 1:1 translations.
Moreover, language headers have to be manually updated and maintained. Consider the user-friendliness of something like the +/- link in the language header on a page like https://commons.wikimedia.org/wiki/Commons:Kooperationen which leads to:
https://commons.wikimedia.org/w/index.php?title=Template:Lang-Partnerships&a...
Chances are that a lot of people who'd have the ability to provide a version (not necessarily a translation) of the page in a given language will give up even on the process of doing so correctly.
- There's no consistent method by which page name conflicts (which may often occur in similar languages) are resolved, and users have to manually disambiguate.
- There are basic UX issues in the language selection tools offered today. For example, after changing the language on Commons to German, I will see the page I'm on (say English) with a German user interface, even if there's an actual German content version of the page available. This is because these language selection tools have no awareness of the existence of content in relevant languages.
This is not entirely true. View an image description page with a license. Change your user language. The license template changes. This is accomplished by one of the worst hacks imaginable but does "work".
Although the way that is accomplished is ugly beyond belief, from the backend perspective, varying a page by user language is not a big deal AFAIK.
The general selection of different versions of entire pages doesn't work this way, as you noted.
Points 4 and 5 could perhaps be solved without too much difficulty if we had information on what language a page (or revision?) is in.
[..]
current approach. For third party users, the effort of replicating something like the semi-acceptable Commons or Meta user experience is pretty significant, as well, due to the large number of templates and local hacks employed.
Indeed.
[..]
Would it make sense to add a language property to pages, so it can be used to solve a lot of the above issues, and provide appropriate and consistent user experience built on them? (Keeping in mind that some pages would be multilingual and would need to be identified as such.) If so, this seems like a major architectural undertaking that should only be taken on as a partnership between domain experts (site and platform architecture, language engineering, Visual Editor/Parsoid, etc.).
I don't think that is that big a problem (other than perhaps dealing with the multilingual case). We already have the page lang support. Seems like mostly putting an interface on top and somewhere to store the info.
-bawolff
On Tue, Apr 23, 2013 at 9:08 PM, Brian Wolff bawolff@gmail.com wrote:
Hi Brian,
We already have the page lang support.
What do you mean by that? AFAICT there's no existing designated place in the schema for associating a content language with a specific page.
Thanks, Erik
On 2013-04-25 7:04 AM, "Erik Moeller" erik@wikimedia.org wrote:
On Tue, Apr 23, 2013 at 9:08 PM, Brian Wolff bawolff@gmail.com wrote:
Hi Brian,
We already have the page lang support.
What do you mean by that? AFAICT there's no existing designated place in the schema for associating a content language with a specific page.
Thanks, Erik
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
There's nothing in the schema. But we do (since 1.19) have a notion of "page language" separate from content language. See the PageContentLanguage hook and related code. So all that needs to be done is add something to the schema to actually store a value.
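A sketch of the resolution order described here, written as a hypothetical Python model rather than MediaWiki's actual PHP: a stored per-page value (the part that would need a schema change) takes priority, then override handlers in the spirit of the PageContentLanguage hook, then the site default. Giving the stored value precedence over handlers is an assumption of this sketch, as are all the function names.

```python
from typing import Callable, Dict, List, Optional

# A handler looks at a title and returns a language code, or None to pass.
# This mimics the *idea* of an override hook; it is not MediaWiki's PHP API.
Handler = Callable[[str], Optional[str]]

KNOWN_CODES = {"de", "fr", "ru", "zh"}  # assumption for the example handler


def subpage_suffix_handler(title: str) -> Optional[str]:
    """Example handler: treat a trailing '/<code>' as the page language."""
    _, sep, code = title.rpartition("/")
    return code if sep and code in KNOWN_CODES else None


def page_language(title: str, stored: Dict[str, str],
                  handlers: List[Handler], site_default: str) -> str:
    """Stored per-page value first, then handlers, then the site default."""
    if title in stored:
        return stored[title]
    for handler in handlers:
        lang = handler(title)
        if lang is not None:
            return lang
    return site_default
```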
-bawolff
2013/4/25 Brian Wolff bawolff@gmail.com:
On 2013-04-25 7:04 AM, "Erik Moeller" erik@wikimedia.org wrote:
On Tue, Apr 23, 2013 at 9:08 PM, Brian Wolff bawolff@gmail.com wrote:
Hi Brian,
We already have the page lang support.
What do you mean by that? AFAICT there's no existing designated place in the schema for associating a content language with a specific page.
There's nothing in the schema. But we do (since 1.19) have a notion of "page language" separate from content language. See the PageContentLanguage hook and related code. So all that needs to be done is add something to the schema to actually store a value.
That, and a way for the user to specify that language. Either through a magic word or through a language selector on the editing page (either ULS or a dropdown). Of course, the wiki language should be the default. Is there anything else to it?
That's as far as page language goes. It is also useful to specify the language of chunks of a page. Currently it's done with the HTML lang attribute, either raw or through templates. It should probably be done using VisualEditor, and I already wrote a general spec for it a while ago: https://www.mediawiki.org/wiki/VisualEditor/Internationalization_requirement...
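For chunk-level language, the markup that authors currently add by hand (point 4 in Erik's first message) could be generated from a language code alone. A minimal Python sketch; the set of RTL codes is a small, incomplete assumption, and a real implementation would need complete language data:

```python
import html

# Assumption: a small, incomplete set of right-to-left language codes.
RTL_CODES = {"ar", "arc", "dv", "fa", "he", "ps", "ur", "yi"}


def direction(lang):
    """Return 'rtl' or 'ltr' for a language code, using its primary subtag."""
    primary = lang.split("-")[0].lower()
    return "rtl" if primary in RTL_CODES else "ltr"


def lang_span(text, lang, tag="span"):
    """Wrap escaped text in an element carrying lang and dir attributes."""
    return '<{0} lang="{1}" dir="{2}">{3}</{0}>'.format(
        tag, html.escape(lang), direction(lang), html.escape(text))
```

With a page-level language known, the same `direction` lookup could set the attributes on the page container, so authors would no longer need a wrapping <div> for pages that are entirely RTL.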
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On 2013-04-25 9:12 AM, "Amir E. Aharoni" amir.aharoni@mail.huji.ac.il wrote:
That, and a way for the user to specify that language. Either through a magic word or through a language selector on the editing page (either ULS or a dropdown). Of course, the wiki language should be the default. Is there anything else to it?
I'm personally opposed to using a magic word for this. A magic word that changes how magic words encountered before it are interpreted, not to mention how it itself is interpreted, seems to be just asking for trouble.
-bawolff
I also prefer not to add magic words in the first place. But if it is added, it shouldn't change the way the other magic words are interpreted; only magic words from the wiki's language should be used.
Or no magic words at all: just use a language picker on the editing page.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
Erik Moeller wrote:
I'd like to start a broader conversation about language support in MW core [...]
Mailing lists are good for conversation, but a lot of your e-mail was insightful notes that I want to make sure don't get lost. I hope you'll eventually put together an RFC (https://www.mediawiki.org/wiki/RFC) or equivalent.
[...]
I'll stop there - I'm sure you can think of other issues with the current approach. For third party users, the effort of replicating something like the semi-acceptable Commons or Meta user experience is pretty significant, as well, due to the large number of templates and local hacks employed.
Well, for Commons, clearly the answer is for everyone to write in glyphs. Wingdings, Webdings, that fancy new color Unicode that Apple has. Meta-Wiki, on the other hand, now that's a real problem. ;-)
Would it make sense to add a language property to pages, so it can be used to solve a lot of the above issues, and provide appropriate and consistent user experience built on them? (Keeping in mind that some pages would be multilingual and would need to be identified as such.) If so, this seems like a major architectural undertaking that should only be taken on as a partnership between domain experts (site and platform architecture, language engineering, Visual Editor/Parsoid, etc.).
I'm not sure I'd call what you're proposing a major architectural undertaking, though perhaps I'm defining a much narrower problem scope. Below is my take on where we are currently and where we should head with regard to page properties.
We need better page properties (metadata) support. A few years ago, a page_props table was added to MediaWiki:
* https://www.mediawiki.org/wiki/Manual:Page_props_table
Within the past year, MediaWiki core has seen the info action resuscitated and Special:PagesWithProp implemented:
* https://www.mediawiki.org/w/index.php?title=MediaWiki&action=info
* https://www.mediawiki.org/wiki/Special:PagesWithProp
That is, a lot of the infrastructure needed to support a basic language property field already exists, in my mind.
However, where we currently fall short is providing a reasonable interface for adding or modifying page properties. Currently, we use the page text to set nearly any property, via magic words (e.g., __NEWSECTIONLINK__ or {{DISPLAYTITLE:}}). The obvious advantage to doing this is the accountability, transparency, and reversibility of using the same system that edits rely on (text table, revision table). The obvious disadvantage is that the input system is a giant textarea.
If we could design a sane interface for modifying page properties (such as display title and a default category sort key) that included logging and accountability and reversibility, adding page content language as an additional page property would be pretty trivial. (MediaWiki could even do neat tricks like take a hint from either the user interface language of the page creator or examine the page contents themselves to make an educated guess about the page content language.) And as a fallback, I believe every site already defines a site-wide content language (even Meta-Wiki and Commons). The info action can then report this information on a per-page basis and Special:PagesWithProp can allow lookups by page property (i.e., by page content language).
MZMcBride
I've already tried both storing the page content language in page properties and modifying ContentHandler::getPageLanguage()[1]. In both cases the parser worked in the page's language scope and didn't process magic words written in the wiki's default language (e.g. Russian [[Категория:Test]] wouldn't work on a German page; English had to be used on both pages). That's OK for a wiki whose default language is English, but if a multilingual wiki has run for years with German as its default language and you then implement the above, pages in other languages would no longer be parsed properly.
I also couldn't get page content manipulation to work at parse time (by means of magic words). That may be my own short-sightedness or a limitation of the current parser design.
P.S. With page properties, I had to set the values through the command line. You'd need to make an Action^WSpecial page for that. It would also need some sort of restriction policy to prevent vandalism.
-- [1] By determining the postfix (/en, /ru, /zh, etc.)
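The failure described above can be reduced to which languages' aliases the parser consults. In this hypothetical Python model (not MediaWiki's actual alias tables or parser), canonical English names always work, and the question is whether the other aliases come from the page language or the wiki language:

```python
# Hypothetical alias table: canonical name -> localized aliases per language.
ALIASES = {
    "Category": {"en": ["Category"], "ru": ["Категория"], "de": ["Kategorie"]},
}


def recognized_aliases(canonical, languages):
    """Aliases accepted for `canonical`: the English name plus the given languages."""
    table = ALIASES[canonical]
    found = set(table["en"])
    for lang in languages:
        found.update(table.get(lang, []))
    return found


def is_category_link(target, languages):
    """True if the link target starts with a recognized Category prefix."""
    prefix, sep, _ = target.partition(":")
    return bool(sep) and prefix in recognized_aliases("Category", languages)
```

Consulting the page language alone reproduces the problem (`is_category_link("Категория:Test", ["de"])` is False on a Russian wiki's German page), while consulting the wiki's own language, as Amir suggests in this thread, keeps existing pages parsing the same way.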
2013/4/24 Paul Selitskas p.selitskas@gmail.com:
I've already tried both using page properties to store page content language and modifying ContentHandler::getPageLanguage()[1]. In both cases parser worked in a different language scope and didn't process magic words written in a default wiki language (e.g. Russian [[Категория:Test]] wouldn't work on a German page; English had to be used in both pages). It's OK for a wiki with the English language as default, but if such multi-lingual wiki worked for years with German on board, and then you implement the above said, all pages in other languages wouldn't be parsed properly.
If I understand correctly, the Visual Editor should gradually eliminate the need for users to use magic words directly, as well as for stuff like [[Category:]] and #REDIRECT. It should all be done using a GUI eventually. So the need for localized magic words should disappear, too.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On 24 April 2013 06:28, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
2013/4/24 Paul Selitskas p.selitskas@gmail.com:
I've already tried both using page properties to store page content language and modifying ContentHandler::getPageLanguage()[1]. In both cases parser worked in a different language scope and didn't process magic words written in a default wiki language (e.g. Russian [[Категория:Test]] wouldn't work on a German page; English had to be used in both pages). It's OK for a wiki with the English language as default, but if such multi-lingual wiki worked for years with German on board, and then you implement the above said, all pages in other languages wouldn't be parsed properly.
If I understand correctly, the Visual Editor should gradually eliminate the need for users to use magic words directly, as well as for stuff like [[Category:]] and #REDIRECT. It should all be done using a GUI eventually. So the need for localized magic words should disappear, too.
This is correct; a "Page Settings" (meta-data) dialog is coming soon to a VisualEditor near you, initially with just Categories, but longer-term all behavioural magic words, language links and any other meta-data people can think of will be there. This will mean that users will not be surprised to find a mysterious "__NOGALLERY__" and wonder what it does; there will be a place to describe what it does in their user-display language. The need for multi-lingual magic words in the same context will thus fade (though as we're planning for side-by-side wikitext and VisualEditor editing, there may still be some demand).
Of course, this only solves the problem for Wikimedia and other people happy to run a Parsoid service alongside MediaWiki. We have a general plan to build out a "no wikitext ever, just store HTML+RDFa" MediaWiki option, so only "legacy" sites would need Parsoid (and if you were willing to convert your storage from wikitext to HTML, not even that), but this is a lower priority than getting everything working. :-)
J. -- James D. Forrester Product Manager, VisualEditor Wikimedia Foundation, Inc.
jforrester@wikimedia.org | @jdforrester
On Tue, Apr 23, 2013 at 10:00 PM, MZMcBride z@mzmcbride.com wrote:
I'm not sure I'd call what you're proposing a major architectural undertaking, though perhaps I'm defining a much narrower problem scope.
Yeah. A lot depends on whether or not we want language to be a first class citizen at the same level as a namespace throughout MediaWiki, for an installation that contains multiple languages. So for example, should various special pages that currently offer namespace filters also offer language filters? Should page uniqueness be constrained by title, namespace and language, as opposed to title and namespace as it is today?
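The difference between the two uniqueness rules fits in a few lines. A hypothetical Python sketch; the class and its key layout are illustrative only:

```python
class TitleRegistry:
    """Tracks existing pages under one of two hypothetical uniqueness rules."""

    def __init__(self, language_aware):
        self.language_aware = language_aware
        self._keys = set()

    def _key(self, namespace, title, lang):
        # Today: (namespace, title). Language-aware: (namespace, title, lang).
        if self.language_aware:
            return (namespace, title, lang)
        return (namespace, title)

    def create(self, namespace, title, lang):
        """Register a page; return False if the key is already taken."""
        key = self._key(namespace, title, lang)
        if key in self._keys:
            return False
        self._keys.add(key)
        return True
```

The trade-off is that, under the language-aware key, a plain link to [[Portada]] no longer identifies a single page, so links would also need a rule for choosing between language versions.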
One could make the case that not offering a lot of filtering by language is OK for multilingual wikis, since one of the conscious choices when setting up a wiki that way is that languages are precisely not going to be segregated, and the boundaries between language content are going to be fairly fluid compared with, say, the setup used for Wikipedia. I do think it's worth talking about the user experience benefits of either approach, but clearly a fair bit could be achieved by just improving the user experience around the most basic interactions in navigation and page creation.
Still, at a most basic level, it'd be nice to have at least a standard approach for title disambiguation, so folks don't have to manually figure out how to distinguish the Spanish "Portada" from the Catalan "Portada" every time that type of issue arises. The common approach to just pick "English word/language suffix" has its own issues, so perhaps the software could intelligently follow a standard disambiguation convention, e.g. adding a suffix but only if required.
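One way to read "adding a suffix but only if required", sketched in Python. The "/<code>" suffix format is an assumption chosen for illustration (it shares the subpage ambiguity of today's manual convention), and `title_for_new_page` is a hypothetical helper:

```python
def title_for_new_page(base, lang, existing):
    """Use the bare title if it is free; otherwise append a language suffix.

    `existing` is the set of titles already in use. The "/<code>" suffix is
    a hypothetical convention, not an existing MediaWiki rule.
    """
    if base not in existing:
        return base
    candidate = f"{base}/{lang}"
    if candidate in existing:
        # Same title in the same language: a genuine conflict, not a language clash.
        raise ValueError(f"{candidate!r} already exists")
    return candidate
```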
Erik
I think ContentHandler already theoretically has the ability to store per-page language info; it's just not being used. (And of course it'd have to actually be deployed somewhere other than Wikidata.) Unless I'm missing something, this mostly needs an interface (which is not a small undertaking by any means, either).
Just to add, ContentHandler is deployed on all Wikimedia projects.
On Wed, 24 Apr 2013 12:53:50 +0200, Denny Vrandečić denny.vrandecic@wikimedia.de wrote:
Just to add, ContentHandler is deployed on all Wikimedia projects.
But with $wgContentHandlerUseDB = false, so not really.
On 04/23/2013 11:29 PM, Erik Moeller wrote:
(Keeping in mind that some pages would be multilingual and would need to be identified as such.) If so, this seems like a major architectural undertaking that should only be taken on as a partnership between domain experts (site and platform architecture, language engineering, Visual Editor/Parsoid, etc.).
My two currency subunits:
A Wikidata-like approach seems like the only sensible approach to the problem IMO; that is, the concept of a 'page' (read: data item) should be language-neutral and branch off into a set of "real" pages with their own title and language information.
A "metapage" X would have an enumeration of representations in different languages, each with its own localized title(s) and contents. This way, given any such page, the information needed to switch between languages and handle language-specific presentation is immediately available. Categories would need no magical handling: that category Y is named "Images of dogs" in English and "Imágenes de perros" in Spanish is just part of the normal structure.
Add to this a simple user preference of language ordering for when "their" language is unavailable, and you have a good framework.
All that'd be left is... UI. :-)
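A minimal Python model of the metapage idea with a per-user language ordering. `Metapage`, `pick_representation` and the English fallback are assumptions made for illustration; the category titles are the ones from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Metapage:
    """Language-neutral item with one localized representation per language."""
    item_id: str
    titles: dict = field(default_factory=dict)  # language code -> localized title


def pick_representation(page, preferences, fallback="en"):
    """Return (lang, title) for the first preferred language the page has.

    Tries the user's ordered preferences, then `fallback`, then any
    representation (chosen deterministically). Raises ValueError if the
    page has no representations at all.
    """
    for lang in list(preferences) + [fallback]:
        if lang in page.titles:
            return lang, page.titles[lang]
    lang = min(page.titles)
    return lang, page.titles[lang]
```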
-- Marc
Hi, one reason to identify a language is to exclude it from being considered part of another language. It follows that a single string can and should be identifiable as not being in the article's base language. Functionally, there are great reasons why you want to do this, including providing webfonts for languages like Batak, Burmese, etc.
Consequently, a Wikidata approach makes no sense at all.
Thanks, GerardM
Many thanks for this proposal, Erik! What I would love to see considered in this context as well is "native language". To give an example:
I speak English and German. Therefore I like to read content in its original version, as long as it is written in a language I understand. E.g. WMCH's bylaws are in 5 languages; the authoritative version is German, as WMCH is registered in Zürich. Most other texts on our wiki are written in English. So I'd love to get those pages in English, and the bylaws in German.
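This request differs from a plain preference list. The original version should win whenever the reader understands it, even if it is not their first choice. Below is a minimal sketch of that rule in Python. `pick_version` and its parameters are hypothetical names, and the language lists in the usage lines are invented for illustration.

```python
from typing import List, Optional


def pick_version(original: str, available: List[str], spoken: List[str]) -> Optional[str]:
    """Choose which language version of a page to show.

    original  -- language the page was first written in (e.g. the authoritative text)
    available -- languages for which a version of the page exists
    spoken    -- languages the reader understands, most preferred first
    """
    # 1. Prefer the original version whenever the reader can read it.
    if original in spoken and original in available:
        return original
    # 2. Otherwise fall back to the reader's own preference order.
    for lang in spoken:
        if lang in available:
            return lang
    # 3. Nothing the reader understands exists.
    return None


reader = ["en", "de"]  # speaks English and German, prefers English
# Hypothetical language lists, for illustration only.
print(pick_version("de", ["de", "en", "fr"], reader))  # prints de: the German original wins
print(pick_version("en", ["en", "de"], reader))        # prints en
print(pick_version("fr", ["fr", "de"], reader))        # prints de: original unreadable, fallback
```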
rupert
On 24.04.2013 05:30, "Erik Moeller" erik@wikimedia.org wrote: