many thanks for this proposal, erik! what i would love to be considered in
this context as well would be "native language". to give an example:
i speak english and german. therefore i like to read content in
its original version, as long as the original is in a language i read. e.g.
wmch's bylaws exist in 5 languages; the authoritative version is german, as
wmch is registered in zürich. most other texts on our wiki are written in
english. so i'd love to get these pages in english, and the bylaws in
On 24.04.2013 05:30, "Erik Moeller" <erik(a)wikimedia.org> wrote:
I'd like to start a broader conversation about language support in MW
core, and the potential need to re-think some pretty fundamental
design decisions in MediaWiki if we want to move past the point of
diminishing returns in some language-related improvements.
In a nutshell, is it time to make MW aware of multiple content
languages in a single wiki? If so, how would we go about it?
Hypothesis: Because support for multiple languages existing in a
single wiki is mostly handled through JS hacks, templates, and manual
markup added to the content (such as <div>s indicating language
direction), we are providing an opaque, confusing and often
inconsistent user experience in our multilingual wikis, which is a
major impediment for growth of non-English content in those wikis, and
participation by contributors who are not English speakers.
Categories have long been called out as one of the biggest factors,
and they certainly are (since Commons categories are largely in
English, they are by definition excluding folks who don't speak the
language), but I'd like to focus on the non-category parts of the
problem for the purposes of this conversation.
Support for the hypothesis (please correct misconceptions or errors):
1) There's no consistent method by which multiple language editions of
the same page are surfaced for selection by the user. Different wikis
use different templates (often multiple variants and layouts in a
single wiki), different positioning, different rules, etc., leading to
inconsistent user experience. Consistency is offered by language
headers generated by the Translate extension, but these are used for
managing translations, while multilingual content existing in the same
wiki may often not take the form of 1:1 translations.
Moreover, language headers have to be manually updated/maintained.
Consider the user-friendliness of something like the +/- link in the
language header on a page like
which leads to:
Chances are that a lot of people who'd have the ability to provide a
version (not necessarily a translation) of the page in a given
language will give up even on the process of doing so correctly.
2) There's no consistent method by which page name conflicts (which
may often occur in similar languages) are resolved, and users have to
3) There are basic UX issues in the language selection tools offered
today. For example, after changing the language on Commons to German,
I will see the page I'm on (say English) with a German user interface,
even if there's an actual German content version of the page
available. This is because these language selection tools have no
awareness of the existence of content in relevant languages.
4) In order to ensure that content is rendered correctly irrespective
of the UI language set, we require content authors to manually add
<div>s around RTL content, even if that's all the page contains.
5) It's impossible to restrict searches to a specific language. It's
impossible to restrict recent changes and similar tools to a specific
language.
I'll stop there - I'm sure you can think of other issues with the
current approach. For third party users, the effort of replicating
something like the semi-acceptable Commons or Meta user experience is
pretty significant, as well, due to the large number of templates and
local hacks employed.
This is a very tricky set of architectural issues to solve well, and
it would be easy to make the user experience worse by solving it
poorly. Still, as we grow our bench strength to take on hard problems,
I want to raise the temperature of this problem a bit again,
especially from the standpoint of future platform engineering
Would it make sense to add a language property to pages, so it can be
used to solve a lot of the above issues, and provide appropriate and
consistent user experience built on them? (Keeping in mind that some
pages would be multilingual and would need to be identified as such.)
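
To make the question concrete, here is a minimal sketch of what a per-page language property could enable. It is purely illustrative and not MediaWiki code: the `Page` class, the `is_original` flag, the abbreviated RTL list and all function names are assumptions made up for this example. The `mul` marker for multilingual pages borrows the ISO 639 code for "multiple languages".

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, deliberately incomplete list of right-to-left language codes.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}

# ISO 639 code for "multiple languages", used here to mark multilingual pages.
MULTILINGUAL = "mul"


@dataclass(frozen=True)
class Page:
    title: str
    language: str              # e.g. "en", "de", or MULTILINGUAL
    is_original: bool = False  # hypothetical flag for the authoritative version


def text_direction(page: Page) -> str:
    """Derive the HTML dir value from the page language instead of manual <div>s."""
    if page.language == MULTILINGUAL:
        return "auto"
    return "rtl" if page.language in RTL_LANGUAGES else "ltr"


def pick_version(versions: List[Page], preferred: List[str]) -> Optional[Page]:
    """Choose the content version for a reader with an ordered language list."""
    # Prefer the original version whenever the reader can read it.
    for page in versions:
        if page.is_original and page.language in preferred:
            return page
    # Otherwise fall back to the first preferred language that has a version.
    for lang in preferred:
        for page in versions:
            if page.language == lang:
                return page
    return None


def filter_by_language(pages: List[Page], language: str) -> List[Page]:
    """Restrict a listing (search results, recent changes) to one language."""
    return [page for page in pages if page.language == language]
```

With such a property, a language selector could call something like `pick_version` to switch content as well as interface, and the same field could drive direction markup and language-restricted listings. How multilingual pages and title conflicts would be handled is left open here.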
If so, this seems like a major architectural undertaking that should
only be taken on as a partnership between domain experts (site and
platform architecture, language engineering, Visual Editor/Parsoid,
I'm not suggesting this should be done in the very near term, but I'd
like to at least start talking about it, hear if I'm completely off
base (and if there are simpler ways to improve on current state), and
explore where it could fit in our longer term agenda.
Relevant existing code:
- awesome for
page and message translation, but I'm not clear that it can help for
the other multilingual content scenarios and problems
VP of Engineering and Product Development, Wikimedia Foundation
Wikipedia and our other projects reach more than 500 million people every
month. The world population is estimated to be >7 billion. Still a long
way to go. Support us. Join us. Share: https://wikimediafoundation.org/