I agree with Erik that multi-language support on multilingual projects like Commons is messy, complicated and inconsistent. The system has morphed into a web of JavaScript and templates designed and maintained by many users, each of whom knows only a small section of the whole. For example, I do a lot of template maintenance and internationalization (i18n), yet I have very little understanding of the poorly documented JavaScript [3] that interacts with some templates, or of the interactions between translatewiki (where many translations are made) and Commons (see [2]).
One of the challenges is that many issues experienced by users in one language are not experienced by users in others. For example, since many templates on Commons are very close to the template expansion limit, the limit is often crossed in one language but not in another. (Hopefully that will be solved by rewriting some of the templates in Lua.) Functionality also differs considerably between logged-in and logged-out users. For example, the language links at the bottom of some templates, like [[Template:Delete]] [4], work for logged-out users but do nothing if you are logged in.
Another huge challenge for current and future systems is the one Erik already pointed out: many translations are not 1:1. People often add corrections to the text in the languages they know, so the language versions slowly drift apart. For example, I recently noticed that some significant changes to Template:PD-Polish [5] did not make it into any of the other versions, so different people see different license templates. The only solution I can think of is some way of marking the text to highlight out-of-date translations and to point to an up-to-date version in another language.
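A minimal sketch of that marking idea, assuming each language version records the number of the last substantive change it reflects. The `LanguageVersion` class, the `synced_revision` field and the sample data are hypothetical and do not describe anything that exists on Commons today.

```python
from dataclasses import dataclass


@dataclass
class LanguageVersion:
    language: str          # language code of this version of the template
    synced_revision: int   # hypothetical: last substantive change this version reflects


def split_by_freshness(versions):
    """Return (up_to_date, outdated) lists of language codes.

    The newest substantive change recorded by any version is treated as
    the current state; every version behind it is flagged as outdated.
    """
    latest = max(v.synced_revision for v in versions)
    up_to_date = sorted(v.language for v in versions if v.synced_revision == latest)
    outdated = sorted(v.language for v in versions if v.synced_revision < latest)
    return up_to_date, outdated


# Illustrative data only: the Polish text was changed, the others were not.
versions = [
    LanguageVersion("en", 3),
    LanguageVersion("de", 3),
    LanguageVersion("pl", 5),
]
up_to_date, outdated = split_by_freshness(versions)
print(up_to_date)  # ['pl']
print(outdated)    # ['de', 'en']
```

With this information a template could show a notice on the outdated versions and link to one of the up-to-date ones.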
Whatever system we use should allow two forms of i18n: macro (where whole pages or large sections are translated as a whole) and micro (where individual words, phrases or sentences are translated). Also, since Commons mostly deals with images, a lot of the translated content is image metadata, such as the technique used to create an artwork or the century in which its creator lived. This type of metadata can be handled by language-independent properties like the ones used at Wikidata (see [6]).
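To illustrate the micro case with language-independent properties, here is a rough sketch in which metadata is stored as identifiers and labels are looked up per language, with a fallback. The identifiers, the `LABELS` table and the function names are made up for illustration; this is not the Wikidata data model or API.

```python
# Hypothetical label table: identifier -> {language code: label}.
LABELS = {
    "technique:oil-on-canvas": {
        "en": "oil on canvas",
        "de": "Öl auf Leinwand",
        "pl": "olej na płótnie",
    },
    "century:17": {
        "en": "17th century",
        "de": "17. Jahrhundert",
    },
}


def label(item_id, language, fallback="en"):
    """Return the label in the requested language, else the fallback
    language, else the raw identifier."""
    labels = LABELS.get(item_id, {})
    return labels.get(language) or labels.get(fallback) or item_id


def render_metadata(item_ids, language):
    """Translate a list of language-independent identifiers for display."""
    return [label(item_id, language) for item_id in item_ids]


metadata = ["technique:oil-on-canvas", "century:17"]
print(render_metadata(metadata, "de"))  # ['Öl auf Leinwand', '17. Jahrhundert']
print(render_metadata(metadata, "pl"))  # ['olej na płótnie', '17th century']
```

The file description stores only the identifiers, so a new translation of a label shows up everywhere it is used without touching any file page.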
I see that there will be a scheduled talk about Extension:Translate and Commons at Wikimania 2013 [1].
[1] http://wikimania2013.wikimedia.org/wiki/Submissions/Multilingual_Wikimedia_C...
[2] https://commons.wikimedia.org/wiki/User:Multichill/Template_i18n_at_Translat...
[3] https://commons.wikimedia.org/wiki/MediaWiki_talk:Multilingual_description.j...
[4] https://commons.wikimedia.org/wiki/Template:Delete
[5] https://commons.wikimedia.org/w/index.php?title=Template%3APD-Polish%2Fen&am...
[6] http://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2013/03#Wikidata_...
Jarek T. User:jarekt
Date: Tue, 23 Apr 2013 20:29:49 -0700
From: Erik Moeller erik@wikimedia.org
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Subject: [Wikitech-l] Support for multiple content languages in MW core
Message-ID: CAEg6ZHmcSU=M8w2314EbsLZ2ZWTYgyzSLpwBqWbyw9nauo=WLA@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1
Hi folks,
I'd like to start a broader conversation about language support in MW core, and the potential need to re-think some pretty fundamental design decisions in MediaWiki if we want to move past the point of diminishing returns in some language-related improvements.
In a nutshell, is it time to make MW aware of multiple content languages in a single wiki? If so, how would we go about it?
Hypothesis: Because support for multiple languages existing in a single wiki is mostly handled through JS hacks, templates, and manual markup added to the content (such as <div>s indicating language direction), we are providing an opaque, confusing and often inconsistent user experience in our multilingual wikis, which is a major impediment for growth of non-English content in those wikis, and participation by contributors who are not English speakers.
Categories have long been called out as one of the biggest factors, and they certainly are (since Commons categories are largely in English, they are by definition excluding folks who don't speak the language), but I'd like to focus on the non-category parts of the problem for the purposes of this conversation.
Support for the hypothesis (please correct misconceptions or errors):
1) There's no consistent method by which multiple language editions of the same page are surfaced for selection by the user. Different wikis use different templates (often multiple variants and layouts in a single wiki), different positioning, different rules, etc., leading to inconsistent user experience. Consistency is offered by language headers generated by the Translate extension, but these are used for managing translations, while multilingual content existing in the same wiki may often not take the form of 1:1 translations.
Moreover, language headers have to be manually updated/maintained. Consider the user-friendliness of something like the +/- link in the language header on a page like https://commons.wikimedia.org/wiki/Commons:Kooperationen which leads to: https://commons.wikimedia.org/w/index.php?title=Template:Lang-Partnerships&a...
Chances are that a lot of people who'd have the ability to provide a version (not necessarily a translation) of the page in a given language will give up even on the process of doing so correctly.
2) There's no consistent method by which page name conflicts (which may often occur in similar languages) are resolved, and users have to manually disambiguate.
3) There are basic UX issues in the language selection tools offered today. For example, after changing the language on Commons to German, I will see the page I'm on (say English) with a German user interface, even if there's an actual German content version of the page available. This is because these language selection tools have no awareness of the existence of content in relevant languages.
4) In order to ensure that content is rendered correctly irrespective of the UI language set, we require content authors to manually add <div>s around RTL content, even if that's all the page contains.
5) It's impossible to restrict searches to a specific language. It's impossible to restrict recent changes and similar tools to a specific language.
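Item 3 can be made concrete with a small sketch: if the wiki knew which language versions of a page exist, a language switch could also select the matching content. The `available` mapping, the fallback chain and the page titles below are assumptions for illustration only; MediaWiki keeps no such mapping today.

```python
def pick_content_version(available, ui_language, fallback_chain=("en",)):
    """Return the title of the content version to show.

    `available` maps language codes to the title of that language version.
    The user's UI language wins if a version exists; otherwise the first
    match in the fallback chain is used. Returns None if nothing matches.
    """
    for code in (ui_language, *fallback_chain):
        if code in available:
            return available[code]
    return None


# Hypothetical page that exists in English and German.
available = {"en": "Project:Help", "de": "Project:Hilfe"}
print(pick_content_version(available, "de"))  # Project:Hilfe
print(pick_content_version(available, "fr"))  # Project:Help
```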
I'll stop there - I'm sure you can think of other issues with the current approach. For third party users, the effort of replicating something like the semi-acceptable Commons or Meta user experience is pretty significant, as well, due to the large number of templates and local hacks employed.
This is a very tricky set of architectural issues to solve well, and it would be easy to make the user experience worse by solving it poorly. Still, as we grow our bench strength to take on hard problems, I want to raise the temperature of this problem a bit again, especially from the standpoint of future platform engineering improvements.
Would it make sense to add a language property to pages, so it can be used to solve a lot of the above issues, and provide appropriate and consistent user experience built on them? (Keeping in mind that some pages would be multilingual and would need to be identified as such.) If so, this seems like a major architectural undertaking that should only be taken on as a partnership between domain experts (site and platform architecture, language engineering, Visual Editor/Parsoid, etc.).
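As a rough sketch of what a per-page language property could enable, the following assumes a `Page` record with a `language` field, where `None` marks a multilingual page. The class, the function names and the short right-to-left list are illustrative assumptions, not an existing MediaWiki interface.

```python
from dataclasses import dataclass
from typing import Optional

# Small illustrative subset of right-to-left language codes, not a complete list.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


@dataclass
class Page:
    title: str
    language: Optional[str]  # None marks a page identified as multilingual


def direction(page):
    """Derive the text direction from the page language instead of manual <div>s."""
    if page.language is None:
        return "auto"  # multilingual page: leave direction to per-section markup
    return "rtl" if page.language in RTL_LANGUAGES else "ltr"


def filter_by_language(pages, language, include_multilingual=True):
    """Restrict a page list (search results, recent changes) to one language."""
    return [
        p for p in pages
        if p.language == language or (include_multilingual and p.language is None)
    ]


# Hypothetical pages for illustration.
pages = [
    Page("Help:Contents", "en"),
    Page("Help:Contents/he", "he"),
    Page("Project:Village pump", None),
]
print([direction(p) for p in pages])                       # ['ltr', 'rtl', 'auto']
print([p.title for p in filter_by_language(pages, "he")])  # ['Help:Contents/he', 'Project:Village pump']
```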
I'm not suggesting this should be done in the very near term, but I'd like to at least start talking about it, hear if I'm completely off base (and if there are simpler ways to improve on current state), and explore where it could fit in our longer term agenda.
Relevant existing code:
* https://www.mediawiki.org/wiki/Extension:Translate - awesome for page and message translation, but I'm not clear that it can help for the other multilingual content scenarios and problems
* Others: https://www.mediawiki.org/wiki/Category:Internationalization_extensions
Thanks, Erik
-- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation
Wikipedia and our other projects reach more than 500 million people every month. The world population is estimated to be >7 billion. Still a long way to go. Support us. Join us. Share: https://wikimediafoundation.org/