How can we best anticipate and plan here for ever more accurate translation
between Wikipedia / Wikidata languages, with full STEM precision? What is the
road map? How might this "Unit Localization" Phabricator RFC fit into a
series of Phabricator RFCs in a longer-term plan for great Wikidata
translation? Can we begin to lay out this road map at this stage for all of
Wikipedia's 358 languages (and even anticipate all 7,943 language entries in
Glottolog)? Would it be possible to dovetail this with developing Wiktionary
with Content Translation as Phabricator RFCs?
(WUaS, which donated CC WUaS to CC Wikidata last autumn, would like to help
develop such translation, for example for CC MIT OCW in 7 languages and
CC Yale OYC, in addition to MediaWiki Content Translation.)
Scott
On Jul 28, 2016 3:27 AM, "Lydia Pintscher" <lydia.pintscher(a)wikimedia.de>
wrote:
On Wed, Jul 27, 2016 at 9:18 PM, Stas Malyshev <smalyshev(a)wikimedia.org> wrote:
Hi!
Right now, quantities with units are displayed by attaching the unit name to
the number. While this gives an idea of what is going on, it is somewhat
ungrammatical in English (83 kilogram, 185 centimetre, etc.) [1] and in
other languages: in Russian, for example, it's 83 килограмм, 185 сантиметр
instead of the correct "83 килограмма", "185 сантиметров". For some
units the norms are rather tricky and fluid (e.g. see [2]), and they
are not even identical across all units in the same language, but the
common theme is that there are grammatical rules for how to do it and
we're ignoring them right now.
I think we do have some means to display numbers grammatically: for
example, the number of references is displayed correctly in English and
Russian. As I understand it, this is done by using certain formats in message
strings, and these formats are supported in the code in the Language
classes. So I wonder if we should have an (optional) property
that defines the same format for units? We could then reuse the same
code to display units in a proper grammatical way.
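To make the idea concrete, here is a rough sketch in Python (not the actual MediaWiki Language code) of choosing a unit form by plural category. The names UNIT_FORMS, plural_category and format_quantity are made up for illustration, the rules are the integer-only CLDR-style rules for English and Russian, and the per-unit forms stand in for whatever an optional property on the unit item might hold.

```python
# Hypothetical per-language unit forms, keyed by plural category.
# In the proposal above, these would come from a property on the unit item.
UNIT_FORMS = {
    "en": {
        "kilogram": {"one": "kilogram", "other": "kilograms"},
        "centimetre": {"one": "centimetre", "other": "centimetres"},
    },
    "ru": {
        "kilogram": {"one": "килограмм", "few": "килограмма", "many": "килограммов"},
        "centimetre": {"one": "сантиметр", "few": "сантиметра", "many": "сантиметров"},
    },
}


def plural_category(lang: str, n: int) -> str:
    """Return the plural category for a non-negative integer n.

    Only English and Russian are covered, and only whole numbers.
    """
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "ru":
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
            return "few"
        return "many"
    raise ValueError(f"no plural rule for language {lang!r}")


def format_quantity(lang: str, n: int, unit: str) -> str:
    """Attach the grammatically matching unit form to the number."""
    forms = UNIT_FORMS[lang][unit]
    return f"{n} {forms[plural_category(lang, n)]}"


print(format_quantity("ru", 83, "kilogram"))     # 83 килограмма
print(format_quantity("ru", 185, "centimetre"))  # 185 сантиметров
print(format_quantity("en", 83, "kilogram"))     # 83 kilograms
```

Fractional quantities and units whose norms are fluid (as in [2]) would need more than this table lookup.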
Alternatively, we could use the short unit display [3] (i.e. cm instead of
centimetre), and then plurals are not required. However, this relies on
units having short names. For some units short names can be rather
obscure, and maybe in some languages short names need grammatical forms
too. Given that we do not link unit names (btw, why don't we?), obscure
short names would be rather confusing. Some units may not have short forms
at all.
And the short names do not exactly match the languages - rather, they
usually match the script (i.e. Cyrillic, or Latin, or Hebrew) - and we
may not even have data on which language uses which script, in a useful
form. So using short forms is very tricky.
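As an illustration of why this is tricky, here is a small Python sketch of a short-name lookup that goes through the script rather than the language. LANGUAGE_SCRIPT, SHORT_NAMES and short_unit_name are hypothetical names, and the language-to-script table is exactly the data we may not have in a useful form.

```python
from typing import Optional

# Hypothetical language -> script table (ISO 15924 script codes).
LANGUAGE_SCRIPT = {
    "en": "Latn",
    "de": "Latn",
    "ru": "Cyrl",
    "uk": "Cyrl",
    "he": "Hebr",
}

# Hypothetical short names stored per script, not per language.
SHORT_NAMES = {
    "centimetre": {"Latn": "cm", "Cyrl": "см"},
}


def short_unit_name(unit: str, lang: str) -> Optional[str]:
    """Return the short name for the unit in the script used by lang.

    Returns None when the script of the language is unknown or the unit
    has no short name in that script, so the caller can fall back to the
    full unit name.
    """
    script = LANGUAGE_SCRIPT.get(lang)
    return SHORT_NAMES.get(unit, {}).get(script)


print(short_unit_name("centimetre", "uk"))  # см
print(short_unit_name("centimetre", "he"))  # None
```

Every None here is a case where the display would have to fall back to the full name, and then the plural problem is back.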
Any other ideas on this topic? Do we have a ticket tracking this
somewhere? I looked but couldn't find it.
[1]
http://english.stackexchange.com/questions/22082/are-units-in-english-singu…
[2]
https://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0…
The discussion about how to do this is happening in
https://phabricator.wikimedia.org/T86528
The basic problem is that we do use items for the units. I think this is
the right thing to do but it does make this particular part a bit tricky.
Cheers
Lydia
--
Lydia Pintscher -
http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit by
the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.