I'm not sure about the exact number either, but in general it is true. A bug about this has already been reported for Wikidata: https://bugzilla.wikimedia.org/show_bug.cgi?id=36635
I can't think of any good way to fix it except fixing bug 28970 in core MediaWiki, which asks to allow setting the language of a page and of a page's title separately. Currently the language of a page and of its title is always assumed to be the same as the content language of the wiki. The only workarounds are {{displaytitle}} or <div lang="en" dir="ltr">, and both are very incomplete. (The Translate extension has a better solution, but it only works for translatable pages.)
This is also needed for many other things that are not related to Wikidata, such as correct display of titles in category pages. It is also discussed in the Visual Editor i18n requirements (disclaimer: I wrote them): https://www.mediawiki.org/wiki/VisualEditor/Internationalization_requirement...
I have wanted to start an RFC about this issue on wikitech-l for a while now, so maybe now is a good time to finally do it.
-- Amir
2012/8/13 Denny Vrandečić denny.vrandecic@wikimedia.de:
Hi Amir,
thanks for the bug report! We will implement it along the lines you suggest.
I have one question still, maybe you can help me:
About 10% of the titles in the Hebrew Wikipedia are in the Latin alphabet (a very rough estimate based on a glance at Special:Allpages; it may be completely off). So an article like http://he.wikipedia.org/wiki/Yesterday, whose title should be LTR, would be declared RTL. Is there a way to avoid that?
I guess the answer is no, but I wanted to ask.
Cheers, Denny
2012/8/11 Amir E. Aharoni amir.aharoni@mail.huji.ac.il:
Hello,
It's my first email on this list, so in case you don't know me: I am Amir, I'm from Israel, I have been a Wikipedian since 2004, I write mostly in Hebrew and English, I care strongly about language issues in software in general and about right-to-left support in particular, and I work on the WMF's localization team.
Now, about the subject: you probably know that "i18n" is "internationalization" and "l10n" is "localization". "m17n" is a less common term, which means "multilingualization": making software able to work in many languages at once. This email is about one of the easiest and most important ways to make Wikidata support many languages on a single page, everywhere.
I've been testing the Wikidata demo for a few days now, with the aim of getting it deployed in the Hebrew Wikipedia very soon. The first thing that I noticed is that even though everybody understands that Wikidata is supposed to be massively multilingual, little or no use is made of the lang and dir attributes in the HTML that Wikidata generates. The most immediate example is http://wikidata-test-repo.wikimedia.de/wiki/Data:Q2?uselang=en
It basically lists the word "Helium" in many languages, but as far as the browser is concerned, almost all of it is written in English, because the root <html> element says lang="en". The only exceptions are the interlanguage links in the sidebar, where the lang attributes are used properly, but that's a regular MediaWiki feature.
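To make the inheritance rule concrete, here is a minimal sketch using Python's standard html.parser: for each text run it reports the language inherited from the nearest ancestor carrying a lang attribute, which is how the browser decides what language a piece of text is in. The markup is a simplified, hypothetical stand-in for the Data:Q2 page, not its real HTML, and the helper names are made up for illustration.

```python
from html.parser import HTMLParser


class LangTracker(HTMLParser):
    """Record, for each non-blank text run, the lang value it inherits
    from the nearest enclosing element that carries a lang attribute."""

    # Void elements have no end tag, so they are never pushed on the stack.
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, effective lang) for each open element
        self.runs = []   # (text, effective lang) in document order

    def current_lang(self):
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        attrs = dict(attrs)
        # An explicit lang wins; otherwise the parent's language is inherited.
        lang = attrs["lang"] if "lang" in attrs else self.current_lang()
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates sloppy markup).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.runs.append((data.strip(), self.current_lang()))


def inherited_langs(markup):
    parser = LangTracker()
    parser.feed(markup)
    parser.close()
    return parser.runs


unmarked = ('<html lang="en"><body><ul><li>Helium</li>'
            '<li>Hélium</li><li>הליום</li></ul></body></html>')
marked = ('<html lang="en"><body><ul><li>Helium</li>'
          '<li lang="fr">Hélium</li>'
          '<li lang="he" dir="rtl">הליום</li></ul></body></html>')

print(inherited_langs(unmarked))
# [('Helium', 'en'), ('Hélium', 'en'), ('הליום', 'en')]
print(inherited_langs(marked))
# [('Helium', 'en'), ('Hélium', 'fr'), ('הליום', 'he')]
```

Without per-element attributes, the French and Hebrew labels are all treated as English; adding lang on each item is enough to correct that.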
It is very much needed to explicitly specify the lang attribute, and also the dir attribute (direction: "ltr" or "rtl"), on every element whose content language is known to differ from the content language of the enclosing element. Many developers may think that this attribute doesn't do anything, but actually it does a lot:
- correct text-to-speech and speech-to-text handling
- correct font rendering (relevant for Serbian [1], for some languages of India, etc.)
- selecting the correct spell checking dictionary
- selecting the right language for machine translation
- adjusting the line-height
- selecting the web font (in MediaWiki's WebFonts extension)
- etc.
So please, use it whenever you can.
Always use the dir attribute in these circumstances, too. It must be specified explicitly even though "ltr" is the default, because if the user interface is right-to-left, the direction will propagate to elements in other languages too, and you would get right-to-left English. (I consider this a bug in the HTML standard... but that's a topic for a different email.)
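The rule in the two paragraphs above (add lang wherever the language differs from the enclosing element, and always pair it with an explicit dir, even for "ltr") can be sketched as a small helper. This is a Python illustration under stated assumptions, not MediaWiki code: in MediaWiki the Language class would supply the language code and direction, whereas here the helper names `direction` and `wrap` and the RTL_LANGS set are hypothetical, and the set is deliberately incomplete.

```python
import html

# Hypothetical, deliberately incomplete set of right-to-left language codes.
RTL_LANGS = {"ar", "arc", "dv", "fa", "he", "ps", "ur", "yi"}


def direction(lang):
    """Return 'rtl' or 'ltr' based on the primary subtag of a language code."""
    return "rtl" if lang.split("-")[0].lower() in RTL_LANGS else "ltr"


def wrap(text, lang, context_lang, tag="span"):
    """Wrap escaped text in an element. When its language differs from the
    enclosing element's language, add lang together with an explicit dir,
    so a right-to-left interface cannot leak into left-to-right text."""
    body = html.escape(text)
    if lang == context_lang:
        return f"<{tag}>{body}</{tag}>"
    code = html.escape(lang, quote=True)
    return f'<{tag} lang="{code}" dir="{direction(lang)}">{body}</{tag}>'


# English label on a Hebrew-interface page: dir="ltr" is stated explicitly.
print(wrap("Helium", "en", "he"))
# Hebrew label on an English-interface page.
print(wrap("הליום", "he", "en"))
# Same language as the context: no attributes needed.
print(wrap("Helium", "en", "en"))
```

The explicit dir="ltr" on the first call is the point of the previous paragraph: without it, the English label would inherit rtl from a Hebrew interface.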
In the case of the page that I mentioned above, it should be quite trivial to fix, because MediaWiki's Language class provides very easy functions for this. I also opened bug 39257 [2] about it. I am repeating it here on the mailing list just to ask developers to do this everywhere. If you are a developer and you run into any problems using these attributes, please contact me in any way that is convenient for you.
Thank you!
[1] See https://sr.wikipedia.org/wiki/User:Amire80
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=39257
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. (Society for the Promotion of Free Knowledge). Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.