How can we best anticipate and plan for ever more accurate translation between Wikipedia / Wikidata languages with full STEM precision? What's the road map? How might this "Unit Localization" Phabricator RFC https://phabricator.wikimedia.org/T86528 fit into a series of Phabricator RFCs in a longer-term plan for great Wikidata translation? Can we begin to lay out this "road map" further at this stage for all of Wikipedia's 358 languages (and even anticipate all 7,943 language entries in Glottolog)?
Would it be possible to dovetail this with Phabricator RFCs for developing Wiktionary with Content Translation?
(WUaS, which donated CC WUaS to CC Wikidata last autumn, would like to help develop such translation, for example for CC MIT OCW in 7 languages and CC Yale OYC, in addition to MediaWiki Content Translation.)
Scott
On Jul 28, 2016 3:27 AM, "Lydia Pintscher" lydia.pintscher@wikimedia.de wrote:
On Wed, Jul 27, 2016 at 9:18 PM, Stas Malyshev smalyshev@wikimedia.org wrote:
Hi!
Right now, quantities with units are displayed by attaching the unit name to the number. While this conveys what is going on, it is somewhat ungrammatical in English (83 kilogram, 185 centimetre, etc.) [1] and in other languages: e.g. in Russian it's "83 килограмм", "185 сантиметр" instead of the correct "83 килограмма", "185 сантиметров". For some units the norms are tricky and fluid (e.g. see [2]), and they are not even identical across all units in the same language, but the common theme is that there are grammatical rules for how to do it, and we're ignoring them right now.
I think we do have some means to display numbers grammatically: for example, the number of references is displayed correctly in English and Russian. As I understand it, this is done by using certain formats in message strings, and these formats are supported in the code of the Language classes. So I wonder whether we should have an (optional) property that defines the same kind of format for units? We could then reuse the same code to display units in a grammatically correct way.
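MediaWiki messages handle this with the `{{PLURAL:...}}` syntax, which picks a word form according to per-language plural rules (based on CLDR). The sketch below applies the same idea to units. It is a minimal illustration under stated assumptions: `UNIT_FORMS` is a hypothetical table standing in for the optional per-unit property suggested above (no such Wikidata property is implied to exist), and the Russian rule covers integers only, sending fractions to the "other" form.

```python
# Russian cardinal plural category, following the CLDR integer rules.
# Non-integers fall into "other" (e.g. "1.5 килограмма").
def ru_category(n):
    if not isinstance(n, int):
        return "other"
    last, last_two = abs(n) % 10, abs(n) % 100
    if last == 1 and last_two != 11:
        return "one"
    if 2 <= last <= 4 and not 12 <= last_two <= 14:
        return "few"
    return "many"

# English: only the integer 1 takes the singular; everything else is "other".
def en_category(n):
    return "one" if isinstance(n, int) and abs(n) == 1 else "other"

PLURAL_RULES = {"ru": ru_category, "en": en_category}

# HYPOTHETICAL per-unit grammatical forms, standing in for the optional
# property proposed in the thread; keyed by an English label for readability.
UNIT_FORMS = {
    "ru": {
        "kilogram": {"one": "килограмм", "few": "килограмма",
                     "many": "килограммов", "other": "килограмма"},
        "centimetre": {"one": "сантиметр", "few": "сантиметра",
                       "many": "сантиметров", "other": "сантиметра"},
    },
    "en": {
        "kilogram": {"one": "kilogram", "other": "kilograms"},
        "centimetre": {"one": "centimetre", "other": "centimetres"},
    },
}

def format_quantity(amount, unit, lang):
    """Render "<amount> <unit word>" with the grammatically matching form."""
    category = PLURAL_RULES[lang](amount)
    forms = UNIT_FORMS[lang][unit]
    # Fall back to "other" if a language's table omits a category.
    return f"{amount} {forms.get(category, forms['other'])}"
```

For example, `format_quantity(83, "kilogram", "ru")` yields "83 килограмма" and `format_quantity(185, "centimetre", "ru")` yields "185 сантиметров", the correct forms cited above. The plural rules stay per language, so each unit only needs its list of forms, not its own logic.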
Alternatively, we could use the short unit display [3], e.g. cm instead of centimetre, and then plurals are not required. However, this relies on units having short names; for some units the short names can be rather obscure, and in some languages short names may need grammatical forms too. Given that we do not link unit names, this would be rather confusing (btw, why don't we?). Some units may not have short forms at all.
And the short names do not exactly match languages; rather, they usually match the script (e.g. Cyrillic, Latin, or Hebrew), and we may not even have data, in a useful form, on which language uses which script. So using short forms is very tricky.
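To make the dependency concrete, a short-form lookup has to go language → script → symbol, and can fail at either step. The sketch below assumes two HYPOTHETICAL tables, `LANGUAGE_SCRIPT` and `UNIT_SYMBOLS`; neither is claimed to exist in Wikidata in this form. It returns `None` whenever no symbol can be chosen, so a caller would have to fall back to the long, grammatically inflected form.

```python
# HYPOTHETICAL language -> script mapping (ISO 15924 codes). The thread notes
# such data may not be available in a useful form; unknown languages yield None.
LANGUAGE_SCRIPT = {"en": "Latn", "de": "Latn", "ru": "Cyrl", "uk": "Cyrl"}

# HYPOTHETICAL short symbols keyed by (unit, script): the symbol follows the
# script, so Russian and Ukrainian share "см", English and German share "cm".
UNIT_SYMBOLS = {
    ("centimetre", "Latn"): "cm",
    ("centimetre", "Cyrl"): "см",
    ("kilogram", "Latn"): "kg",
    ("kilogram", "Cyrl"): "кг",
}

def short_form(amount, unit, lang):
    """Return "<amount> <symbol>", or None if no short form can be chosen."""
    script = LANGUAGE_SCRIPT.get(lang)
    if script is None:
        return None  # no script data for this language
    symbol = UNIT_SYMBOLS.get((unit, script))
    if symbol is None:
        return None  # unit has no known short form in this script
    return f"{amount} {symbol}"
```

Here `short_form(185, "centimetre", "uk")` gives "185 см", while a language missing from the script table (say `"he"`) or a unit without a symbol returns `None`. Both gaps mirror the concerns raised above.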
Any other ideas on this topic? Do we have a ticket tracking this somewhere? I looked but couldn't find it.
[1] http://english.stackexchange.com/questions/22082/are-units-in-english-singul...
[2] https://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0%...
The discussion about how to do this is happening in https://phabricator.wikimedia.org/T86528. The basic problem is that we do use items for the units. I think this is the right thing to do, but it does make this particular part a bit tricky.
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata