Hi!
Right now, quantities with units are displayed by attaching the unit name to the number. While this gives an idea of what is going on, it is somewhat ungrammatical in English (83 kilogram, 185 centimetre, etc.) [1] and in other languages - e.g. in Russian it's 83 килограмм, 185 сантиметр - instead of the correct "83 килограмма", "185 сантиметров". For some units, the norms are kind of tricky and fluid (e.g. see [2]), and they are not even identical across all units in the same language, but the common theme is that there are grammatical rules on how to do it and we're ignoring them right now.
I think we do have some means to display numbers grammatically - for example, the number of references is displayed correctly in English and Russian. As I understand it, this is done by using certain formats in message strings, and these formats are supported in the code in the Language classes. So, I wonder if we should maybe have an (optional) property that defines the same format for units? We could then reuse the same code to display units in a proper grammatical way.
Alternatively, we could use short unit display [3] - i.e. cm instead of centimetre - and then plurals are not required. However, this relies on units having short names, and for some units short names can be rather obscure, and maybe in some languages short names need grammatical forms too. Given that we do not link unit names, it would be rather confusing (btw, why don't we?). Some units may not have short forms at all.
And the short names do not exactly match the languages - rather, they usually match the script (i.e. Cyrillic, or Latin, or Hebrew) - and we may not even have data on which language uses which script, in a useful form. So using short forms is very tricky.
Any other ideas on this topic? Do we have a ticket tracking this somewhere? I looked but couldn't find it.
[1] http://english.stackexchange.com/questions/22082/are-units-in-english-singul...
[2] https://ru.wikipedia.org/wiki/%D0%9E%D0%B1%D1%81%D1%83%D0%B6%D0%B4%D0%B5%D0%...
[3] https://phabricator.wikimedia.org/T86528
Where are the names of those units translated at the moment?
If these are MediaWiki messages, grammar rules for them can be added fairly easily. If I can see where they are now, I could probably make a quick demo patch to show how it can be done.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
2016-07-27 22:18 GMT+03:00 Stas Malyshev smalyshev@wikimedia.org:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi!
Where are the names of those units translated at the moment?
I assume on the wikidata items for them, those are just labels for wikidata items (as units are items).
If these are MediaWiki messages, grammar rules for them can be added fairly easily. If I can see where they are now, I could probably make a quick demo patch to show how it can be done.
I don't think we can put grammar rules in labels, that's why I proposed a special property as an option.
Hoi, The problem is that Wikidata does not support lexical attributes. Once it does, this will be resolved. Until then, this and similar issues will not go away. Thanks, Gerard
On 27 July 2016 at 22:07, Stas Malyshev smalyshev@wikimedia.org wrote:
Hi Stas,
Good point. Could we not just have a monolingual text string property that gives the preferred writing of the unit when used after a number? I don't think the plural/singular issue is very problematic, since you would have plural almost everywhere, even for "1.0 metres". So maybe we just need one alternative label for most languages? Or are there languages with more complex grammar rules for units?
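A minimal sketch of that idea, assuming a hypothetical per-language "after_number" text stored next to the normal label (neither the field name nor the record layout exists in Wikidata; both are made up for illustration). The formatter uses the alternative form when present and falls back to the plain label otherwise:

```python
# Hypothetical unit records. "after_number" stands in for the proposed
# monolingual text property; it is not an existing Wikidata property.
UNITS = {
    "metre": {"labels": {"en": "metre"}, "after_number": {"en": "metres"}},
    "kilogram": {"labels": {"en": "kilogram"}, "after_number": {}},
}


def format_quantity(amount, unit_id, lang):
    """Render amount + unit, preferring the after-number form if one is set."""
    unit = UNITS[unit_id]
    text = unit["after_number"].get(lang) or unit["labels"][lang]
    return f"{amount} {text}"


print(format_quantity(1.0, "metre", "en"))
print(format_quantity(83, "kilogram", "en"))
```

A single alternative form like this covers languages where the form used after a number does not depend on the number itself; it cannot express rules that pick different forms for different numbers.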
Best regards,
Markus
On 27.07.2016 21:18, Stas Malyshev wrote:
2016-07-28 9:21 GMT+02:00 Markus Kroetzsch markus.kroetzsch@tu-dresden.de:
Or are there languages with more complex grammar rules for units?
Russian is pretty complicated: it uses the singular nominative, singular genitive, or plural genitive depending on the number, as the examples in the initial message show.
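To make the three-way split concrete, here is a small Python sketch of the usual rule for Russian integers. The function names and the per-unit form tuples are hypothetical, and the genitive plural shown for килограмм is one accepted variant (norms for some units are fluid, as noted at the start of the thread). It is an illustration, not the code MediaWiki or Wikibase uses:

```python
def russian_plural_index(n: int) -> int:
    """Return 0, 1 or 2: which of three noun forms to use with integer n."""
    n = abs(n)
    if n % 10 == 1 and n % 100 != 11:
        return 0  # singular nominative: 1, 21, 31, ...
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # singular genitive: 2-4, 22-24, ...
    return 2  # plural genitive: 0, 5-20, 25-30, ...


# Hypothetical per-unit form tables, ordered to match the indexes above.
KILOGRAM_RU = ("килограмм", "килограмма", "килограммов")
CENTIMETRE_RU = ("сантиметр", "сантиметра", "сантиметров")


def format_ru(n: int, forms) -> str:
    """Attach the form selected by the integer n."""
    return f"{n} {forms[russian_plural_index(n)]}"


print(format_ru(83, KILOGRAM_RU))
print(format_ru(185, CENTIMETRE_RU))
```

The sketch only covers whole numbers; fractional values would need their own rule.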
Hi!
Good point. Could we not just have a monolingual text string property that gives the preferred writing of the unit when used after a number? I don't think the plural/singular issue is very problematic, since you would have plural almost everywhere, even for "1.0 metres". So maybe we
We have code to deal with that - note that "1 reference" and "2 references" are displayed properly. It's a matter of applying that code and having it provided with proper configs.
just need one alternative label for most languages? Or are there languages with more complex grammar rules for units?
Oh yes :) Russian is one, but I'm sure there are others.
2016-07-28 20:41 GMT+02:00 Stas Malyshev smalyshev@wikimedia.org:
just need one alternative label for most languages? Or are there languages with more complex grammar rules for units?
Oh yes :) Russian is one, but I'm sure there are others.
Actually, most of the Slavic languages have more complex grammar rules than a simple singular/plural division.
On 28.07.2016 20:41, Stas Malyshev wrote:
Hi!
Good point. Could we not just have a monolingual text string property that gives the preferred writing of the unit when used after a number? I don't think the plural/singular issue is very problematic, since you would have plural almost everywhere, even for "1.0 metres". So maybe we
We have code to deal with that - note that "1 reference" and "2 references" are displayed properly. It's a matter of applying that code and having it provided with proper configs.
You mean the MediaWiki message processing code? This would probably be powerful enough for units as well, but it works based on message strings that look a bit like MW template calls. Someone has to enter such strings for all units (and languages). This would be doable but the added power comes at the price of more difficult editing of such message strings instead of plain labels.
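As a rough illustration of what such a message string looks like and how it is expanded, the sketch below imitates a very small subset of MediaWiki's {{PLURAL:$1|...}} syntax in Python. The regex, the function names, and the unit string are assumptions made for this example; the real parser handles many more cases and has per-language plural rules:

```python
import re

# Matches only the simplest shape: {{PLURAL:$1|form|form|...}}
PLURAL_RE = re.compile(r"\{\{PLURAL:\$1\|([^{}]*)\}\}")


def english_plural_index(n) -> int:
    """English has two forms: index 0 for exactly 1, index 1 otherwise."""
    return 0 if n == 1 else 1


def expand(message: str, n, plural_index=english_plural_index) -> str:
    """Expand PLURAL clauses for n, then substitute $1 with the number."""

    def pick(match):
        forms = match.group(1).split("|")
        # Fall back to the last form if the message supplies too few.
        return forms[min(plural_index(n), len(forms) - 1)]

    return PLURAL_RE.sub(pick, message).replace("$1", str(n))


# Hypothetical message string that an editor would enter for a unit.
METRE_EN = "$1 {{PLURAL:$1|metre|metres}}"

print(expand(METRE_EN, 1))
print(expand(METRE_EN, 2))
```

The editing cost mentioned above is visible here: the editor has to type the whole METRE_EN string, in the right form order, instead of a plain label.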
As far as I know, the message parsing is available through the MW API, so external consumers could take advantage of the same system if the message strings were part of the data (we would like to have grammatical units in SQID as well).
just need one alternative label for most languages? Or are there languages with more complex grammar rules for units?
Oh yes :) Russian is one, but I'm sure there are others.
Forgive my ignorance; I was not able to read the example you gave there.
Markus
Hi!
You mean the MediaWiki message processing code? This would probably be
Yes, exactly.
powerful enough for units as well, but it works based on message strings that look a bit like MW template calls. Someone has to enter such strings for all units (and languages). This would be doable but the added power comes at the price of more difficult editing of such message strings instead of plain labels.
True. OTOH, we already have non-plain strings in the database - e.g. math formulae - so that would be another example of such strings. It's not ideal but would be a start, and maybe we can have some gadgets later to deal with it :)
Oh yes :) Russian is one, but I'm sure there are others.
Forgive my ignorance; I was not able to read the example you gave there.
Sorry, it's hard to give examples in foreign languages that would be comprehensible :) The gist of it is that Russian, like many other inflected languages, changes nouns by grammatical case, and uses different cases for different numbers of items (e.g. 1, 2, and 5 will use three different forms). Labels are of course in the singular nominative case, which is wrong for many numbers.
My two cents: this is a job to do in conjunction with structured Wiktionary, which will be able to deal with lexical entities.
We do, however, have some properties here and there to deal with such language issues inside Wikidata (female form of an occupation name, for example), but the logic to deal with that data is coded in the clients, like the infoboxes.
Coding this inside Wikidata would still require a step that is far out of reach imho: coding a per-language grammatical model that would select lexical forms depending on the context ... Not that easy to do. What would be really cool eventually is for us to code those rules in a structured way. One open question would be "how would the software know those rules, and which one to use in which context".
I'd suggest doing this as a JavaScript gadget as a first step, with the main units hardcoded and the logic kept in the gadget, to better understand what those rules may look like.
2016-07-29 9:19 GMT+02:00 Stas Malyshev smalyshev@wikimedia.org:
2016-07-29 14:18 GMT+03:00 Thomas Douillard thomas.douillard@gmail.com:
My two cents: this is a job to do in conjunction with structured Wiktionary, which will be able to deal with lexical entities.
Ideally, yes, but it will take us some time to get there.
We do, however, have some properties here and there to deal with such language issues inside Wikidata (female form of an occupation name, for example), but the logic to deal with that data is coded in the clients, like the infoboxes.
Coding this inside Wikidata would still require a step that is far out of reach imho: coding a per-language grammatical model that would select lexical forms depending on the context ... Not that easy to do.
It's not so different from MediaWiki's usual {{GRAMMAR}} clauses, and at https://phabricator.wikimedia.org/T86528#2501684 I suggest a way to make it more easily extensible. As I note there, it's not the most robust way to do it, but it's a practical step towards something better, probably along the lines of structured Wiktionary.
In general this has more implications than simple singular/plural forms of units. Agreement/concord/congruence is the proper term. [1] In some languages you will even change the form depending on the distance to the thing you are measuring or counting, on the type of thing you are measuring or counting, or on the gender of the thing, and then sometimes only for some numbers.
Assume you have "1 meter"; then you could write it out as "én meter" in Norwegian, as "meter" is masculine. Now assume you have "1 kilogram"; then you would write it out as "ett kilogram", as "gram" is neuter. Now assume "kilogram" is changed to the short form "kilo"; then it is "én kilo", which is masculine. The prefix "kilo" is only used for "kilogram", so it isn't valid Norwegian to say "én kilo" when referring to "1 km", or "én milli" when referring to "1 milligram".
[1] https://en.wikipedia.org/wiki/Agreement_(linguistics)
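The Norwegian examples above can be sketched as a lookup where the form of the numeral depends on the gender of the unit word. The tiny lexicon and the function name are hypothetical and only cover the three words mentioned in this message:

```python
# Hypothetical mini-lexicon: grammatical gender of each unit word,
# taken from the examples in this message ("m" = masculine, "n" = neuter).
GENDER_NB = {"meter": "m", "kilogram": "n", "kilo": "m"}

# Written-out form of the numeral 1 for each gender used above.
ONE_NB = {"m": "én", "n": "ett"}


def spell_one(unit: str) -> str:
    """Write out "1 <unit>" with the numeral agreeing with the unit's gender."""
    return f"{ONE_NB[GENDER_NB[unit]]} {unit}"


for word in ("meter", "kilogram", "kilo"):
    print(spell_one(word))
```

Note that the gender belongs to the word, not to the unit: "kilogram" and "kilo" name the same unit but take different numerals, so a label alone does not carry enough information.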
On Fri, Jul 29, 2016 at 7:26 AM, Markus Kroetzsch < markus.kroetzsch@tu-dresden.de> wrote:
Stas, Thomas, John, Markus, Lydia and Wikidatans,
What happens when one develops structured Wiktionary ( https://www.wiktionary.org/), as linked open data for every part of every word and their sounds, perhaps as Qitems in Wikidata, in Wikipedia's 358 languages, and planning for all 7,943 language entries in Glottolog, and combines this with MediaWiki Content Translation ( https://www.mediawiki.org/wiki/Content_translation) - all re Wikidata/Wikibase? Re this grammatical display of units thread, how to plan for the genders of words in some languages only? And is this, or could this be, partly a structured Wiktionary question, when one spells out the numerals/units in letters and words?
Scott
On Jul 29, 2016 6:55 AM, "John Erling Blad" jeblad@gmail.com wrote:
Hi John, all
2016-07-29 15:54 GMT+02:00 John Erling Blad jeblad@gmail.com:
In general this has more implications than simple singular/plural forms of units. Agreement/concord/congruence is the proper term. In some language you will even change the form given the distance to the thing you are measuring or counting, even depending on the type of thing you are measuring or counting, or change on the gender of the thing, and then even only for some numbers.
Linguistic agreement is common in a lot of inflected languages [1].
Now assume "kilogram" is changed to the short form "kilo"; then it is "én kilo", which is masculine. The prefix "kilo" is only used for "kilogram", so it isn't valid Norwegian to say "én kilo" when referring to "1 km", or "én milli" when referring to "1 milligram".
On the other hand, we don't have to deal with colloquialisms like "kilo" in your example. Modelling the formal language would still be hard enough.
Best, Jan
Norwegian has a lot of colloquialisms that must be handled if you want the language to sound natural. The example with "kilo" exists in a lot of languages in one form or another. Then you have congruence on external factors (direction, length, emptiness), missing plurals for some units (Norwegian mil is one example), …
On Sat, Jul 30, 2016 at 5:58 AM, Jan Macura macurajan@gmail.com wrote:
[1] https://en.wikipedia.org/wiki/Fusional_language
On Wed, Jul 27, 2016 at 9:18 PM, Stas Malyshev smalyshev@wikimedia.org wrote:
The discussion about how to do this is happening in https://phabricator.wikimedia.org/T86528 The basic problem is that we do use items for the units. I think this is the right thing to do but it does make this particular part a bit tricky.
Cheers Lydia
Hi!
The discussion about how to do this is happening in https://phabricator.wikimedia.org/T86528 The basic problem is that we
I'm not sure if this is a localization issue as such... even if we used only one language, we would still need to use a proper grammatical form (unless we chose a language that does not require any modification of the unit label - but that's not the situation now :) ). But I think I'll add my comments as a subtask for this one.
do use items for the units. I think this is the right thing to do but it does make this particular part a bit tricky.
I also think it is the right thing to do, and it enables us to do many things much more easily, but I wanted to explore how best to address this particular issue, and solicit ideas.
How best to anticipate and plan here for ever more accurate translation between Wikipedia / Wikidata languages with full STEM precision? What's the road map? How might this "Unit Localization" Phabricator RFC https://phabricator.wikimedia.org/T86528 fit into a series of Phabricator RFCs in a longer term plan for great Wikidata translation? Can we further begin to lay out this "road map" at this stage for all of Wikipedia's 358 languages (and anticipate even all 7,943 language entries in Glottolog)?
Would it be possible to dovetail this with developing Wiktionary with Content Translation as Phabricator RFCs?
(WUaS which donated CC WUaS to CC Wikidata last autumn would like to help develop such translation and for CC MIT OCW in 7 languages and CC Yale OYC, for example, in addition to MediaWiki Content Translation).
Scott
On Jul 28, 2016 3:27 AM, "Lydia Pintscher" lydia.pintscher@wikimedia.de wrote:
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
Lydia and Wikidatans,
If "the basic problem is that we do use items for the units," after developing a workaround for this, could Wikidata make an item out of the parts and sounds of every word (in Wiktionary) in every (Wikipedia) language, to begin, as part of a further translator plan?
Scott
On Thu, Jul 28, 2016 at 1:55 PM, Info WorldUniversity < info@worlduniversityandschool.org> wrote:
How best to anticipate and plan here for ever more accurate translation between Wikipedia / Wikidata languages with full STEM precision? What's the road map? How might this "Unit Localization" Phabricator RFC https://phabricator.wikimedia.org/T86528 fit into a series of Phabricator RFCs in a longer term plan for great Wikidata translation? Can we further begin to lay out this "road map" at this stage for all of Wikipedia's 358 languages (and anticipate even all 7,943 language entries in Glottolog)?
Would it be possible to dovetail this with developing Wiktionary with Content Translation as Phabricator RFCs?
(WUaS which donated CC WUaS to CC Wikidata last autumn would like to help develop such translation and for CC MIT OCW in 7 languages and CC Yale OYC, for example, in addition to MediaWiki Content Translation).
Scott
On Jul 28, 2016 3:27 AM, "Lydia Pintscher" lydia.pintscher@wikimedia.de wrote:
On Wed, Jul 27, 2016 at 9:18 PM, Stas Malyshev smalyshev@wikimedia.org wrote:
On 28.07.2016 at 12:26, Lydia Pintscher wrote:
The discussion about how to do this is happening in https://phabricator.wikimedia.org/T86528. The basic problem is that we do use items for the units. I think this is the right thing to do, but it does make this particular part a bit tricky.
Well, I think we could sidestep the grammar issue by using unit symbols. We would have to get them from statements, and they would have to be multilingual values (or multiple monolingual values), but that is still much less complicated than trying to apply plural rules.
An alternative is to use MediaWiki i18n messages instead of entity labels. E.g. if the unit is Q11573, we could check if MediaWiki:wikibase-unit-Q11573 exists, and if it does, use it. We'd get internationalization including support for plurals for free.
We could actually combine all of these approaches: first check for a system message, then check for a symbol statement, then use the label, and if all fails, use the ID.
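The combined lookup order could be sketched roughly as follows. This is only an illustration of the fallback chain, not Wikibase code: resolve_unit_text, the three lookup helpers and the in-memory tables are hypothetical, and the sample data is made up, including the IDs Q900001 and Q900002. Only the message name pattern wikibase-unit-Q11573 is taken from the proposal above.

```python
from typing import Callable, Optional

# A lookup takes (unit_id, language_code) and returns text or None.
Lookup = Callable[[str, str], Optional[str]]

# Made-up sample data standing in for the three real sources.
MESSAGES = {("wikibase-unit-Q11573", "en"): "metres"}
SYMBOLS = {("Q11573", "en"): "m", ("Q11573", "ru"): "м"}
LABELS = {("Q11573", "en"): "metre", ("Q900001", "en"): "example unit"}


def message_lookup(unit_id: str, lang: str) -> Optional[str]:
    """System message named after the unit ID, if one is defined."""
    return MESSAGES.get((f"wikibase-unit-{unit_id}", lang))


def symbol_lookup(unit_id: str, lang: str) -> Optional[str]:
    """Unit symbol taken from a (hypothetical) symbol statement."""
    return SYMBOLS.get((unit_id, lang))


def label_lookup(unit_id: str, lang: str) -> Optional[str]:
    """Plain entity label."""
    return LABELS.get((unit_id, lang))


def resolve_unit_text(unit_id: str, lang: str) -> str:
    """Try message, then symbol, then label; fall back to the bare ID."""
    chain: tuple[Lookup, ...] = (message_lookup, symbol_lookup, label_lookup)
    for lookup in chain:
        text = lookup(unit_id, lang)
        if text:
            return text
    return unit_id


print(resolve_unit_text("Q11573", "en"))   # metres (system message)
print(resolve_unit_text("Q11573", "ru"))   # м (symbol)
print(resolve_unit_text("Q900001", "en"))  # example unit (label)
print(resolve_unit_text("Q900002", "en"))  # Q900002 (ID)
```

In a real implementation the message step would also have to run the plural handling on the message text with the actual amount; the sketch skips that and returns a fixed string.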
I'll comment on the ticket.
Hi!
Well, I think we could sidestep the grammar issue by using unit symbols. We
True, but what unit symbol is "apple"? It's actually used as a measure of height (bonus points if you can guess on which item :). Even if we don't go this far, while SI units probably all have short names, for non-SI units, especially older and rarer ones, that may very well not be the case.
Another tricky part is that short names are not connected to languages right now. I.e. if your interface language is Serbian, which short name should we use? What if it's Farsi? We'd need to change how we relate units and unit symbols then.
An alternative is to use MediaWiki i18n messages instead of entity labels. E.g. if the unit is Q11573, we could check if MediaWiki:wikibase-unit-Q11573 exists, and if it does, use it. We'd get internationalization including support for plurals for free.
That may work, but the downside is that it is linked to the unit ID - so if we wanted to use it for, say, Commons data, we'd have to somehow link between "metre" on Wikidata and "metre" on Commons.
Thanks a lot everyone for your input! That helped clarify our thinking. As a next step in that area we will concentrate on making use of the unit symbols where they are available. That should cover a very large percentage of cases. After that we'll tackle the rest as necessary.
Cheers Lydia