I agree with Erik that multi-language support on multi-language projects like Commons is
very messy, complicated and inconsistent. The system has morphed into a web of JavaScript
and templates designed and maintained by many users, each of whom knows only a small
section of the whole system. For example, I do a lot of template maintenance and
internationalization (i18n), but I have very little understanding of the poorly documented
scripts used (which interact with some templates) or of the interactions between
translatewiki (where many translations are made) and Commons (see ).
One of the challenges is that many issues experienced by users in one language are
not experienced by others. For example, since many templates on Commons are very close to
the template expansion limit, the limit is often crossed in one language but not in
another. (Hopefully that will be solved by rewriting some of the templates in Lua.) Also,
the functionality differs considerably between logged-in and logged-out users. For example,
language links at the bottom of some templates, like [[Template:Delete]], work for users
who are not logged in but do nothing if you are logged in.
Another huge challenge for current and future systems is one that Erik already pointed
out: many translations are not 1:1. People often add corrections to text in
languages they know, so the different language versions slowly drift apart. For
example, I recently noticed that some significant changes to template:PD-Polish did not
make it to any of the other versions, so different people see different license templates.
The only solution for this I can think of is some sort of marking of the text to
highlight out-of-date translations and provide also up-to-date version in other
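As a rough illustration of the marking idea, here is a minimal Python sketch. It assumes
each language version records which revision of the source text it was last synced with.
The `Version` class, the `based_on` field and both functions are hypothetical names made
up for this example; they do not describe how Commons templates or Extension:Translate
actually work.

```python
from dataclasses import dataclass


@dataclass
class Version:
    lang: str
    text: str
    based_on: int  # hypothetical: source revision this text was last synced with


def outdated_languages(versions, current_revision):
    """Return the language codes whose text lags behind the current source revision."""
    return sorted(v.lang for v in versions if v.based_on < current_revision)


def display_text(versions, lang, current_revision, source_lang="en"):
    """Return the text to show for `lang`, flagging stale translations.

    A stale translation is shown with a marker, followed by the up-to-date
    source-language text so the reader can compare the two. If no version
    exists for `lang`, the source-language text is shown instead.
    """
    by_lang = {v.lang: v for v in versions}
    source = by_lang[source_lang]
    wanted = by_lang.get(lang, source)
    if wanted.based_on >= current_revision:
        return wanted.text
    return f"[out of date] {wanted.text}\n[{source_lang}] {source.text}"


# Made-up sample data: the source text is at revision 3, one translation lags behind.
versions = [
    Version("en", "New license text", 3),
    Version("pl", "Stary tekst", 2),
    Version("de", "Neuer Text", 3),
]
```

With this data, `outdated_languages(versions, 3)` reports only `pl`, and a reader asking
for `pl` would get the stale text with a marker plus the current `en` text.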
Whatever system we use should allow for the two forms of i18n in use: macro (where whole
pages or large sections are translated as a whole) and micro (where individual words,
phrases or sentences are translated). Also, since Commons mostly deals with images, a lot
of translated content is image metadata, like the technique used to create an artwork or
the century the creator of the artwork lived in. This type of metadata can be handled by
language-independent properties like the ones used at Wikidata (see ).
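A small Python sketch of what micro i18n over language-independent properties could look
like: the metadata itself stores only identifiers, and labels are looked up per language
with a fallback. The identifiers, the `LABELS` table and the `label` function are
invented for illustration and are not Wikidata's actual data model or API.

```python
# Hypothetical label store: keys are language-independent identifiers,
# values map language codes to human-readable labels. All entries are made up.
LABELS = {
    "technique:oil-on-canvas": {
        "en": "oil on canvas",
        "de": "Öl auf Leinwand",
        "pl": "olej na płótnie",
    },
    "century:17": {
        "en": "17th century",
        "de": "17. Jahrhundert",
    },
}


def label(key, lang, fallbacks=("en",)):
    """Return the label for `key` in `lang`, trying the fallback languages in order."""
    labels = LABELS.get(key, {})
    for code in (lang, *fallbacks):
        if code in labels:
            return labels[code]
    return key  # last resort: show the raw identifier


def describe(metadata, lang):
    """Translate a list of language-independent identifiers into labels for `lang`."""
    return [label(key, lang) for key in metadata]


# The file's metadata is stored once, independent of any language.
artwork_metadata = ["technique:oil-on-canvas", "century:17"]
```

Here `describe(artwork_metadata, "pl")` would use the Polish label for the technique and
fall back to the English label for the century, since no Polish label exists for it.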
I see that there will be a scheduled talk about Extension:Translate and Commons at
Wikimania 2013.
Date: Tue, 23 Apr 2013 20:29:49 -0700
From: Erik Moeller <erik(a)wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Subject: [Wikitech-l] Support for multiple content languages in MW
Content-Type: text/plain; charset=ISO-8859-1
I'd like to start a broader conversation about language support in MW
core, and the potential need to re-think some pretty fundamental
design decisions in MediaWiki if we want to move past the point of
diminishing returns in some language-related improvements.
In a nutshell, is it time to make MW aware of multiple content
languages in a single wiki? If so, how would we go about it?
Hypothesis: Because support for multiple languages existing in a
single wiki is mostly handled through JS hacks, templates, and manual
markup added to the content (such as <div>s indicating language
direction), we are providing an opaque, confusing and often
inconsistent user experience in our multilingual wikis, which is a
major impediment for growth of non-English content in those wikis, and
participation by contributors who are not English speakers.
Categories have long been called out as one of the biggest factors,
and they certainly are (since Commons categories are largely in
English, they are by definition excluding folks who don't speak the
language), but I'd like to focus on the non-category parts of the
problem for the purposes of this conversation.
Support for the hypothesis (please correct misconceptions or errors):
1) There's no consistent method by which multiple language editions of
the same page are surfaced for selection by the user. Different wikis
use different templates (often multiple variants and layouts in a
single wiki), different positioning, different rules, etc., leading to
inconsistent user experience. Consistency is offered by language
headers generated by the Translate extension, but these are used for
managing translations, while multilingual content existing in the same
wiki may often not take the form of 1:1 translations.
Moreover, language headers have to be manually updated/maintained.
Consider the user-friendliness of something like the +/- link in the
language header on a page like
which leads to:
Chances are that a lot of people who'd have the ability to provide a
version (not necessarily a translation) of the page in a given
language will give up even on the process of doing so correctly.
2) There's no consistent method by which page name conflicts (which
may often occur in similar languages) are resolved, and users have to
3) There are basic UX issues in the language selection tools offered
today. For example, after changing the language on Commons to German,
I will see the page I'm on (say English) with a German user interface,
even if there's an actual German content version of the page
available. This is because these language selection tools have no
awareness of the existence of content in relevant languages.
4) In order to ensure that content is rendered correctly irrespective
of the UI language set, we require content authors to manually add
<div>s around RTL content, even if that's all the page contains.
5) It's impossible to restrict searches to a specific language. It's
impossible to restrict recent changes and similar tools to a specific
I'll stop there - I'm sure you can think of other issues with the
current approach. For third party users, the effort of replicating
something like the semi-acceptable Commons or Meta user experience is
pretty significant, as well, due to the large number of templates and
local hacks employed.
This is a very tricky set of architectural issues to solve well, and
it would be easy to make the user experience worse by solving it
poorly. Still, as we grow our bench strength to take on hard problems,
I want to raise the temperature of this problem a bit again,
especially from the standpoint of future platform engineering
Would it make sense to add a language property to pages, so it can be
used to solve a lot of the above issues, and provide appropriate and
consistent user experience built on them? (Keeping in mind that some
pages would be multilingual and would need to be identified as such.)
If so, this seems like a major architectural undertaking that should
only be taken on as a partnership between domain experts (site and
platform architecture, language engineering, Visual Editor/Parsoid,
I'm not suggesting this should be done in the very near term, but I'd
like to at least start talking about it, hear if I'm completely off
base (and if there are simpler ways to improve on current state), and
explore where it could fit in our longer term agenda.
Relevant existing code:
- awesome for
page and message translation, but I'm not clear that it can help for
the other multilingual content scenarios and problems
* Others: https://www.mediawiki.org/wiki/Category:Internationalization_extensions
VP of Engineering and Product Development, Wikimedia Foundation
Wikipedia and our other projects reach more than 500 million people every
month. The world population is estimated to be >7 billion. Still a long
way to go. Support us. Join us. Share: https://wikimediafoundation.org/