Hi folks,
I'd like to start a broader conversation about language support in MW core, and the potential need to re-think some pretty fundamental design decisions in MediaWiki if we want to move past the point of diminishing returns in some language-related improvements.
In a nutshell, is it time to make MW aware of multiple content languages in a single wiki? If so, how would we go about it?
Hypothesis: Because support for multiple languages in a single wiki is mostly handled through JS hacks, templates, and manual markup added to the content (such as <div>s indicating language direction), we are providing an opaque, confusing, and often inconsistent user experience in our multilingual wikis. This is a major impediment to the growth of non-English content in those wikis, and to participation by contributors who are not English speakers.
Categories have long been called out as one of the biggest factors, and they certainly are (since Commons categories are largely in English, they are by definition excluding folks who don't speak the language), but I'd like to focus on the non-category parts of the problem for the purposes of this conversation.
Support for the hypothesis (please correct misconceptions or errors):
1) There's no consistent method by which multiple language editions of the same page are surfaced for selection by the user. Different wikis use different templates (often multiple variants and layouts in a single wiki), different positioning, different rules, etc., leading to an inconsistent user experience. Consistency is offered by language headers generated by the Translate extension, but these are used for managing translations, while multilingual content existing in the same wiki may often not take the form of 1:1 translations.
Moreover, language headers have to be manually updated and maintained. Consider the user-friendliness of something like the +/- link in the language header on a page like https://commons.wikimedia.org/wiki/Commons:Kooperationen which leads to: https://commons.wikimedia.org/w/index.php?title=Template:Lang-Partnerships&a...
Chances are that many people who would be able to provide a version (not necessarily a translation) of the page in a given language will give up before they work out how to do so correctly.
2) There's no consistent method by which page name conflicts (which may often occur in similar languages) are resolved, and users have to manually disambiguate.
3) There are basic UX issues in the language selection tools offered today. For example, after changing the language on Commons to German, I will still see the page I'm on (say, the English one) with a German user interface, even if an actual German content version of the page is available. This is because these language selection tools have no awareness of the existence of content in relevant languages.
4) In order to ensure that content is rendered correctly irrespective of the UI language set, we require content authors to manually add <div>s around RTL content, even if that's all the page contains.
5) It's impossible to restrict searches to a specific language. It's impossible to restrict recent changes and similar tools to a specific language.
I'll stop there - I'm sure you can think of other issues with the current approach. For third party users, the effort of replicating something like the semi-acceptable Commons or Meta user experience is pretty significant, as well, due to the large number of templates and local hacks employed.
This is a very tricky set of architectural issues to solve well, and it would be easy to make the user experience worse by solving it poorly. Still, as we grow our bench strength to take on hard problems, I want to raise the temperature of this problem a bit again, especially from the standpoint of future platform engineering improvements.
Would it make sense to add a language property to pages, so it can be used to solve a lot of the above issues, and to provide an appropriate and consistent user experience built on it? (Keeping in mind that some pages would be multilingual and would need to be identified as such.) If so, this seems like a major architectural undertaking that should only be taken on as a partnership between domain experts (site and platform architecture, language engineering, VisualEditor/Parsoid, etc.).
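To make the idea a bit more concrete, here is a rough sketch of what a page-level language property could enable. It is written in Python purely for illustration and is not MediaWiki code. Everything in it is an assumption on my part: the Page class, the "group" key linking versions of the same page, the small RTL set, the use of "mul" (the ISO 639-2 code for multiple languages) as the multilingual marker, and all function names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative subset of right-to-left language codes (not exhaustive).
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}

# Assumption: multilingual pages are marked with the ISO 639-2 code "mul".
MULTILINGUAL = "mul"


@dataclass(frozen=True)
class Page:
    title: str
    language: str                # hypothetical page-level language property
    group: Optional[str] = None  # hypothetical key linking versions of one page


def direction(page: Page) -> str:
    """Derive text direction from the page language (issue 4),
    instead of relying on manually added <div> markup."""
    if page.language == MULTILINGUAL:
        return "auto"  # assumption: let the renderer decide for mixed content
    return "rtl" if page.language in RTL_LANGUAGES else "ltr"


def pick_version(current: Page, pages: List[Page], ui_language: str) -> Page:
    """Return the version of the current page in the UI language if one
    exists (issue 3); otherwise stay on the current page."""
    if current.group is None:
        return current
    for page in pages:
        if page.group == current.group and page.language == ui_language:
            return page
    return current


def filter_by_language(pages: List[Page], language: str) -> List[Page]:
    """Restrict a listing such as search results or recent changes
    to a single content language (issue 5)."""
    return [page for page in pages if page.language == language]
```

With something like this, switching the UI language to German while viewing an English page could call pick_version() and land on the German content version when one exists, falling back to the current page when it does not. How versions get linked (the "group" key here) and how page name conflicts are resolved are exactly the hard parts this sketch glosses over.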
I'm not suggesting this should be done in the very near term, but I'd like to at least start talking about it, hear if I'm completely off base (and if there are simpler ways to improve on the current state), and explore where it could fit in our longer-term agenda.
Relevant existing code:
* https://www.mediawiki.org/wiki/Extension:Translate - awesome for page and message translation, but it's not clear to me that it can help with the other multilingual content scenarios and problems
* Others: https://www.mediawiki.org/wiki/Category:Internationalization_extensions
Thanks, Erik
-- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation