Hi,
This is an interesting discussion, and I would like to share some of my own experience.
As part of a personal project to create a multilingual programming language (WIP) [1, 2], I explored ways to avoid assuming that numbers are written only as a sequence of Arabic numerals, like 4657388. As discussed in this thread, there are numerous other representation systems. The Roman numeral system, for example, cannot conveniently represent very large numbers, but it does appear in literature. I used Unicode numerals and Roman numerals to represent numbers, with support for mathematical operations. Ideally, a user of a multilingual programming language can calculate in whichever numeral system they prefer, and the result is displayed in that same system.
For example, using Roman numerals [3]:

    num1 = rn.RomanNumeral("XV")  # create a numeral
    num2 = rn.RomanNumeral("VII")  # create a numeral
    num3 = num1 * num2

or using numerals in the Malayalam language [4]:

    num1 = un.UnicodeNumeral("൧൩")  # create a numeral
    num2 = un.UnicodeNumeral("൨൪")  # create a numeral
    num3 = num1 + num2
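To show the idea without depending on my library, here is a minimal standalone sketch using only the Python standard library. It is not the code from [1]: the function names (roman_to_int, int_to_roman, add_in_same_script, and so on) are made up for this illustration, the Roman parser is deliberately lenient, and only the range 1..3999 is handled.

```python
import unicodedata

# Value/symbol pairs in descending order, including subtractive forms.
ROMAN_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def roman_to_int(text: str) -> int:
    """Greedy, lenient parse of a Roman numeral (also accepts e.g. 'IIII')."""
    total, pos = 0, 0
    for value, symbol in ROMAN_VALUES:
        while text.startswith(symbol, pos):
            total += value
            pos += len(symbol)
    if pos != len(text) or total == 0:
        raise ValueError(f"not a Roman numeral: {text!r}")
    return total

def int_to_roman(number: int) -> str:
    """Render 1..3999 in standard subtractive notation."""
    if not 0 < number < 4000:
        raise ValueError("this sketch only covers 1..3999")
    parts = []
    for value, symbol in ROMAN_VALUES:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)

def script_zero(text: str) -> int:
    """Code point of the digit zero in the (single) script used by text."""
    # unicodedata.decimal raises ValueError for non-decimal characters.
    zeros = {ord(ch) - unicodedata.decimal(ch) for ch in text}
    if len(zeros) != 1:
        raise ValueError(f"mixed or empty digit string: {text!r}")
    return zeros.pop()

def render_in_script(number: int, zero: int) -> str:
    """Write a non-negative integer with the digits that start at zero."""
    return "".join(chr(zero + int(d)) for d in str(number))

def add_in_same_script(a: str, b: str) -> str:
    """Add two numerals and answer in the script they were written in."""
    zero = script_zero(a + b)
    # int() already understands decimal digits from any Unicode script.
    return render_in_script(int(a) + int(b), zero)

print(int_to_roman(roman_to_int("XV") * roman_to_int("VII")))  # CV
print(add_in_same_script("൧൩", "൨൪"))  # 13 + 24 = 37, in Malayalam digits
```

The point of script_zero is that the result is rendered in the numeral system of the inputs rather than falling back to Arabic numerals.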
However, we cannot assume a general way of representing numbers across languages. I did not handle cases where a number contains spaces or commas, as in currency amounts such as 4 657 388 or 4,657,388. That would require building on existing internationalization/localization work, which already offers some support for currencies in many locales.
Thus, for Wikifunctions, we may need to consider such complex but interesting cases.
References:
[1] https://github.com/johnsamuelwrites/multilingual
[2] Multilingual Programming Experience: Envisioning an Inclusive and Diverse Future, https://medium.com/@jsamwrites/multilingual-programming-experience-envisioning-an-inclusive-and-diverse-future-9f04fde3ff39
[3] https://github.com/johnsamuelwrites/multilingual/blob/main/tests/roman_numer...
[4] https://github.com/johnsamuelwrites/multilingual/blob/main/tests/unicode_num...
On Thu, Sep 21, 2023 at 2:00 PM < abstract-wikipedia-request@lists.wikimedia.org> wrote:
Today's Topics:
- Newsletter #127: Renderer and parsers for types (Denny Vrandečić)
- Re: Newsletter #127: Renderer and parsers for types (Thad Guidry)
Message: 1
Date: Wed, 20 Sep 2023 17:13:42 -0700
From: Denny Vrandečić dvrandecic@wikimedia.org
Subject: [Abstract-wikipedia] Newsletter #127: Renderer and parsers for types
To: Abstract Wikipedia list abstract-wikipedia@lists.wikimedia.org
Message-ID: <CA+bik1fVXXHePHKdv8W5FXGm=_hRCFNL+= fUO9YZq5+mtvfoOQ@mail.gmail.com>
The on-wiki version of this newsletter can be found here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-20

Renderers and parsers for types
Wikifunctions currently supports two types: Strings and Booleans. To make Wikifunctions useful, we need to support many more types, such as numbers, dates, geocoordinates, and eventually Wikidata lexemes and items. Types define what kind of inputs and outputs the functions in Wikifunctions can have.
With Wikifunctions, we don’t want to just repeat what different programming languages have done, but, if possible, gently update the lessons that have been learned from programming language research and experience and make sure that we are as inclusive as possible.
Strings and Booleans were very carefully chosen for the first deployment of Wikifunctions: Strings (https://www.wikifunctions.org/wiki/Z6), because they are just a specific sequence of characters and do not depend on the user’s language; Booleans (https://www.wikifunctions.org/wiki/Z40), because they are a key basis of logic flow for programming. Further, Booleans can be fully translated in Wikifunctions: the two values, True (https://www.wikifunctions.org/wiki/Z41) and False (https://www.wikifunctions.org/wiki/Z42), are both represented by a Wikifunctions object that can have names in any of the languages we support. Since the initial deployment, more than a dozen translations have been added! If you can add more, that would be great.
One example of a possible next type that would be interesting to introduce would be whole numbers. This raises a big question: how should we represent an integer?
Most programming languages have two answers to that. Internally, they usually represent an integer as a binary string of a specific length, in order to store and process numbers efficiently. In human-readable source code, they usually represent it as a sequence of Arabic numerals (https://en.wikipedia.org/wiki/Arabic_numerals), e.g. 4657388. Some programming languages are nice enough to allow grouping of the digits: in Ada (https://en.wikipedia.org/wiki/Ada_(programming_language)) you may write 4_657_388 or, if you prefer the Indian system (https://en.wikipedia.org/wiki/Indian_numbering_system), 46_57_388, which makes these numbers a bit more readable.
But programming languages where one can write ৪৬,৫৭,৩৮৮ using Bengali numerals (https://en.wikipedia.org/wiki/Bengali_numerals), referring to the same number, are rare (one example: https://sjishan.github.io/chascript/). For Wikifunctions, we want to rectify this, to make sure that the whole system supports every human language fluently and consistently.
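As a point of comparison, here is a small sketch of how Python (a language the newsletter does not mention) behaves in this respect. Like Ada, it ignores underscores in integer literals, and its int() function reads decimal digits from any Unicode script when given a string, but it does not understand locale-specific grouping such as the Bengali comma pattern. The helper name parses is made up for this illustration.

```python
# Underscores in integer literals are ignored, so both groupings are legal.
western = 4_657_388
indian = 46_57_388

# int() accepts decimal digits from any Unicode script in a string...
bengali = int("৪৬৫৭৩৮৮")

def parses(text: str) -> bool:
    """Report whether Python's built-in int() accepts the given string."""
    try:
        int(text)
        return True
    except ValueError:
        return False

# ...but commas used for digit grouping are rejected.
print(western == indian == bengali)  # True
print(parses("৪৬,৫৭,৩৮৮"))  # False
```

The same digits are therefore usable as data but not as source code, and the natural grouped form is not accepted at all, which is the gap that per-language parsers would close.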
Internally, we will represent numbers - like every other object - as ZObjects. The above number would be represented internally as follows (using the prototype ZID from the Beta https://wikifunctions.beta.wmflabs.org/view/en/Z10015, since we don’t yet have the respective type in the real Wikifunctions):
{ "Z1K1": "Z10015", "Z10015K1": "4657388"}
Or, with labels in English:
{ "type": "positive integer", "value": "4657388"}
Even though this solves the internal representation, we would want to avoid displaying this object in the system if possible. Instead, we plan to allow the Wikifunctions community to attach a 'renderer' and a 'parser' to each type. The renderer would be a function that takes an object of the given type (in this case, an object of the type positive integer) and a language, and returns a string. The parser is the opposite of that: it takes a string and a language, and returns an object of type positive integer.
This would allow the Wikifunctions community to create functions for each type and language that would decide how the values of the type are going to be displayed in the given language. In a Bengali interface, the above number can then be displayed in the most natural representation for Bengali, which might be ৪৬,৫৭,৩৮৮.
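A renderer of this kind could look roughly like the following sketch. It is only an illustration in Python, not how Wikifunctions implements or will implement renderers: the function names, the "bn" language code, and the fallback behaviour are assumptions, and the ZObject keys are the prototype ones from the Beta quoted above.

```python
BENGALI_ZERO = 0x09E6  # code point of BENGALI DIGIT ZERO

def group_indian(digits: str) -> str:
    """Group digits as ...,dd,dd,ddd: the last three, then pairs."""
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    return ",".join(groups + [tail])

def render_positive_integer(zobject: dict, language: str) -> str:
    """Take a positive-integer object and a language, return a string."""
    value = zobject["Z10015K1"]  # assumed to hold ASCII digits only
    if language == "bn":
        grouped = group_indian(value)
        # Swap each ASCII digit for its Bengali counterpart, keep commas.
        return "".join(
            chr(BENGALI_ZERO + int(ch)) if ch.isdigit() else ch
            for ch in grouped
        )
    return value  # assumed fallback: show the internal digits unchanged

number = {"Z1K1": "Z10015", "Z10015K1": "4657388"}
print(render_positive_integer(number, "bn"))  # ৪৬,৫৭,৩৮৮
```

Keeping the grouping step separate from the digit substitution means another language that uses Indian grouping with different digits could reuse group_indian.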
When entering a number, we will use the parsing function to turn the input of the user into the internal representation. It is then up to the community to decide how flexible they want to be: if they would only accept ৪৬,৫৭,৩৮৮ as the input, or whether ৪৬৫৭৩৮৮ would be just as good - or even also or only 4657388. The decision would be for the Wikifunctions community to make.
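The choice between a strict and a flexible parser can be made concrete with a sketch like the one below. Again, this is an illustration in Python under stated assumptions, not the Wikifunctions implementation: the function name, the strict flag, and the "bn" language code are hypothetical, and the returned object uses the prototype keys from the Beta.

```python
import re

BN = "[\u09e6-\u09ef]"        # any Bengali digit
ANY = "[0-9\u09e6-\u09ef]"    # Bengali or ASCII digit

# Strict: Bengali digits only, with Indian grouping whenever there are
# more than three digits (for example dd,dd,ddd).
STRICT_BN = re.compile(f"{BN}{{1,3}}|{BN}{{1,2}}(?:,{BN}{{2}})*,{BN}{{3}}")
# Lenient: Bengali or ASCII digits, with optional single commas between them.
LENIENT_BN = re.compile(f"{ANY}(?:,?{ANY})*")

def parse_positive_integer(text: str, language: str, strict: bool = False) -> dict:
    """Take a string and a language, return a positive-integer object."""
    if language != "bn":
        raise ValueError(f"no parser in this sketch for language {language!r}")
    text = text.strip()
    pattern = STRICT_BN if strict else LENIENT_BN
    if not pattern.fullmatch(text):
        raise ValueError(f"cannot parse {text!r} as a number")
    # int() reads Bengali and ASCII digits alike; str() normalises the result.
    value = int(text.replace(",", ""))
    return {"Z1K1": "Z10015", "Z10015K1": str(value)}

print(parse_positive_integer("৪৬,৫৭,৩৮৮", "bn"))
print(parse_positive_integer("৪৬৫৭৩৮৮", "bn"))
print(parse_positive_integer("4657388", "bn"))
```

All three calls return the same internal object. With strict=True only the first input is accepted, which corresponds to the narrowest of the options described above.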
Note that we made a lot of assumptions in the above text: for example, using the ZID from the Beta, calling the type “positive integer”, and assuming that the internal representation of positive integers is Arabic numerals without formatting (instead of, say, hexadecimal, base 64, or binary, which could also be good solutions). All of these decisions are up to you; we made these assumptions only so that we could talk about the proposal concretely.
We plan to implement this proposal incrementally, over a few weeks and months. It will likely be the case that we will at first only accept the internal representation (just as it currently works on the Beta), and that we will then add renderers and finally parsers.
We are looking forward to hearing your feedback on this plan.