The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-27
--
Serializers and deserializers for types
Last week, we discussed our plans to add renderers and parsers for types
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-20>.
This week, we will continue the theme of how to make types easier to use,
by discussing serializers and deserializers, and their role in
Wikifunctions.
If you have the appropriate type, writing a native code function can be
really easy. For example, we already have types for Booleans
<https://www.wikifunctions.org/wiki/Z40> and Strings
<https://www.wikifunctions.org/wiki/Z6>, and the system translates them
to the native Booleans and Strings of Python and JavaScript. As a result,
the code implementation for a function such as the Boolean conjunction (and)
<https://www.wikifunctions.org/view/en/Z10174> or joining strings
<https://www.wikifunctions.org/wiki/Z10000> is rather straightforward,
just a single line of code packaged in a function:
- And in Python <https://www.wikifunctions.org/view/en/Z10175>
- And in JavaScript <https://www.wikifunctions.org/view/en/Z10202>
- Join strings in Python <https://www.wikifunctions.org/view/en/Z10004>
- Join strings in JavaScript
<https://www.wikifunctions.org/view/en/Z10005> (and alternatives using
concat <https://www.wikifunctions.org/view/en/Z10621> or join
<https://www.wikifunctions.org/view/en/Z10622>)
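As a rough sketch of what such one-line implementations look like in Python: the function and argument names below follow the ZID-based naming used by the addition example further down in this newsletter, and the exact code stored on-wiki may differ.

```python
# Sketch only: names follow the ZID convention (function ZID, then K1, K2
# for the arguments). The on-wiki implementations may be written differently.

def Z10174(Z10174K1, Z10174K2):
    # Boolean conjunction on native Python bools.
    return Z10174K1 and Z10174K2


def Z10000(Z10000K1, Z10000K2):
    # Join two native Python strings.
    return Z10000K1 + Z10000K2
```

Because the arguments already arrive as native bools and strings, no conversion code is needed around the single line that does the work.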
On Wikifunctions Beta, we have already seen the creation of a few types,
such as for numbers <https://wikifunctions.beta.wmflabs.org/view/en/Z10015>
or dates <https://wikifunctions.beta.wmflabs.org/view/en/Z10438>. But the
implementations for similarly basic functions such as addition
<https://wikifunctions.beta.wmflabs.org/view/en/Z10118> or squaring a
number are nowhere near as simple, and have far more than a single line of code:
- Addition in Python
<https://wikifunctions.beta.wmflabs.org/view/en/Z10874>
- Addition in JavaScript
<https://wikifunctions.beta.wmflabs.org/view/en/Z10119>
Why is that so?
Here’s the implementation of the addition function in Python in the Beta
Cluster version:
    def Z10118(Z10118K1, Z10118K2):
        def deserialize(x):
            return int(x.Z10015K1)

        def serialize(x):
            return ZObject({"Z1K1":"Z9", "Z9K1":"Z10015"}, Z10015K1=str(x))

        left = deserialize(Z10118K1)
        right = deserialize(Z10118K2)

        result = left + right
        return serialize(result)
And here’s how the implementation should look:
    def Z10118(Z10118K1, Z10118K2):
        return Z10118K1 + Z10118K2
At its core, the longer Beta implementation does exactly that: in
line 11 you can see that the Python + operator is being called. But in
addition, we also need code that deserializes the input
arguments and serializes the output. In other words, we need to turn the
ZObject that Wikifunctions works with into values of Python’s int type
(that happens in line 3) and back into a ZObject (that happens in line 6).
If Wikifunctions knew that the positive integer type can be fully
represented by the int type of Python 3, it could make
that conversion automatically inside the system. But we want types to be
flexible, and to eventually be fully community-controlled on-wiki. That
means we shouldn’t build any magic into the Wikifunctions system that does
such conversions, or that requires the system to know about specific types.
The way we plan to tackle this is as follows (and now is the right time for
comments):
We will introduce two new types of special objects: serializers and
deserializers. A deserializer is attached to a specific programming
language and Wikifunctions source type, and has code attached that takes a
ZObject of the source type and turns it into a value of the target native
type in that programming language. A serializer is the inverse of that.
For example, you might have a deserializer that turns a Wikifunctions
Integer type when used with Python into a native BigNum (even if it might
fit into an int), and the serializer from Python understands how to convert
both native Python ints and BigNums to Wikifunctions Integer type instances.
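To make the idea concrete, here is a minimal sketch of such a pair for the prototype positive integer type from the Beta (Z10015). It assumes, purely for illustration, that a ZObject is modelled as a plain Python dict in the JSON shape { "Z1K1": "Z10015", "Z10015K1": "..." }; the function names are hypothetical and not part of any Wikifunctions API.

```python
# Hypothetical sketch: ZObjects are modelled as plain dicts, and the function
# names are made up for illustration.

def deserialize_z10015(zobject):
    """Turn a Z10015 ZObject into a native Python int."""
    if zobject.get("Z1K1") != "Z10015":
        raise TypeError("expected a Z10015 object")
    return int(zobject["Z10015K1"])


def serialize_z10015(value):
    """Turn a native Python int back into a Z10015 ZObject."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected a native int")
    return {"Z1K1": "Z10015", "Z10015K1": str(value)}
```

Since Python's int has arbitrary precision, this sketch needs no separate big-number type; other languages might need the kind of distinction described above.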
Now, whenever we want to run native code, the evaluator - the piece of code
responsible for running native code - will also need to run the code
associated with the serializers and deserializers. That is, all the extra
code that makes up the difference between the two implementations above
would be handled automatically by Wikifunctions.
For each type and language, there would be exactly one deserializer and
one serializer. Then, when a native implementation for a function is being
written, we look up the types on the function, and find the right
serializer and deserializer for those types in that programming language.
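The lookup described above could be sketched as follows. This is only an illustration under stated assumptions: ZObjects are modelled as plain dicts, the registries and the run_native helper are hypothetical names, and the real evaluator may work quite differently.

```python
# Hypothetical registries: exactly one entry per (type ZID, programming language).
DESERIALIZERS = {
    ("Z10015", "python"): lambda z: int(z["Z10015K1"]),
}
SERIALIZERS = {
    ("Z10015", "python"): lambda v: {"Z1K1": "Z10015", "Z10015K1": str(v)},
}


def run_native(implementation, argument_types, return_type, zobject_args,
               language="python"):
    """Deserialize the arguments, run the native code, serialize the result."""
    native_args = [
        DESERIALIZERS[(type_zid, language)](arg)
        for type_zid, arg in zip(argument_types, zobject_args)
    ]
    result = implementation(*native_args)
    return SERIALIZERS[(return_type, language)](result)


# The contributor-written implementation shrinks to the one-liner.
def Z10118(Z10118K1, Z10118K2):
    return Z10118K1 + Z10118K2


two = {"Z1K1": "Z10015", "Z10015K1": "2"}
three = {"Z1K1": "Z10015", "Z10015K1": "3"}
total = run_native(Z10118, ["Z10015", "Z10015"], "Z10015", [two, three])
```

Keying the registries by type and language keeps the conversion code out of every individual implementation, which is the difference between the two versions of the addition function shown earlier.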
Let us know if you have ideas or comments on these plans!
October Volunteer Corner
The volunteer corner for next month will be next week Monday, October 2nd.
We are playing a bit with the times, so that different people may attend.
Also, because of repeated issues with Jitsi, we are shifting for now to
Google Meet again.
Please give us feedback on the time and on the platform, so we can continue
to improve.
We are meeting on October 2nd, 2023, at 13:30 UTC
<https://zonestamp.toolforge.org/1696253400> at Google Meet
<https://meet.google.com/xuy-njxh-rkw?authuser=0&hs=122>.
The agenda for the meeting is to take any questions that arise, followed by
working on a function together.
Recording of September Volunteer Corner
We also uploaded a recording of the September edition of the Volunteer
Corner
<https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…>
to Commons. We were working together on a function to check if a string is a
valid positive integer <https://www.wikifunctions.org/view/en/Z11129>. It
was great fun to build a function on Wikifunctions together, to have folks
create testers, and to discuss the function and its limits live!
Hi,
This is an interesting discussion and I share here some of my personal
experiences.
As part of a personal project on creating a multilingual programming
language (WIP) [1, 2], I explored ways of working with numbers without
assuming that they are represented only as a sequence of Arabic numerals,
like 4657388. As discussed in this thread, there are numerous other
representation systems. The Roman numeral system, for example, may not
represent very large numbers, but can be found in literature. I used
Unicode and Roman numerals to represent numbers, with support for
mathematical operations. Ideally, a user of a
multilingual programming language can use different numeral systems for
mathematical calculations, and the result is displayed in
the same numeral system as the input.
For example, using Roman numerals [3]
num1 = rn.RomanNumeral("XV") # create a numeral
num2 = rn.RomanNumeral("VII") # create a numeral
num3 = num1 * num2
or using numerals in Malayalam language [4]
num1 = un.UnicodeNumeral("൧൩") # create a numeral
num2 = un.UnicodeNumeral("൨൪") # create a numeral
num3 = num1 + num2
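For readers who want to try the idea without the project installed, here is a self-contained sketch. The two classes below are hypothetical stand-ins written for this illustration; they are not the classes from the multilingual project and only support the single operation shown for each.

```python
import unicodedata

ROMAN_SYMBOLS = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                 (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                 (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    total = 0
    for i, ch in enumerate(text):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller symbol before a larger one is subtracted.
        if i + 1 < len(text) and ROMAN_VALUES[text[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def int_to_roman(number):
    parts = []
    for value, symbol in ROMAN_SYMBOLS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


class RomanNumeral:
    """Hypothetical stand-in: multiplies and prints in Roman numerals."""

    def __init__(self, text):
        self.value = roman_to_int(text)

    def __mul__(self, other):
        return RomanNumeral(int_to_roman(self.value * other.value))

    def __str__(self):
        return int_to_roman(self.value)


class UnicodeNumeral:
    """Hypothetical stand-in: adds and prints in the script of the input."""

    def __init__(self, text):
        self.value = int(text)  # int() accepts Unicode decimal digits
        # Code point of the script's zero digit, used to render results.
        self.zero = ord(text[0]) - unicodedata.decimal(text[0])

    def _render(self, number):
        return "".join(chr(self.zero + int(d)) for d in str(number))

    def __add__(self, other):
        return UnicodeNumeral(self._render(self.value + other.value))

    def __str__(self):
        return self._render(self.value)


product = RomanNumeral("XV") * RomanNumeral("VII")
total = UnicodeNumeral("൧൩") + UnicodeNumeral("൨൪")
```

Here str(product) is "CV", and str(total) renders the sum in Malayalam digits, so each result stays in the numeral system of its operands.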
However, we cannot assume a general way of representing numbers in
different languages. I did not focus on handling cases where spaces or
commas are present in a number, like in currencies 4 657 388 or 4,657,388.
That would require more advanced use of existing
internationalization/localization efforts. We already have some support for
currencies for many locales.
Thus, for Wikifunctions, we may need to consider such complex but
interesting examples.
References:
[1] https://github.com/johnsamuelwrites/multilingual
[2] Multilingual Programming Experience: Envisioning an Inclusive and
Diverse Future
<https://medium.com/@jsamwrites/multilingual-programming-experience-envision…>
[3] https://github.com/johnsamuelwrites/multilingual/blob/main/tests/roman_nume…
[4] https://github.com/johnsamuelwrites/multilingual/blob/main/tests/unicode_nu…
On Thu, Sep 21, 2023 at 2:00 PM <
abstract-wikipedia-request(a)lists.wikimedia.org> wrote:
> Today's Topics:
>
> 1. Newsletter #127: Renderer and parsers for types (Denny Vrandečić)
> 2. Re: Newsletter #127: Renderer and parsers for types (Thad Guidry)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 20 Sep 2023 17:13:42 -0700
> From: Denny Vrandečić <dvrandecic(a)wikimedia.org>
> Subject: [Abstract-wikipedia] Newsletter #127: Renderer and parsers
> for types
> To: Abstract Wikipedia list <abstract-wikipedia(a)lists.wikimedia.org>
> Message-ID:
> <CA+bik1fVXXHePHKdv8W5FXGm=_hRCFNL+=
> fUO9YZq5+mtvfoOQ(a)mail.gmail.com>
> Content-Type: multipart/alternative;
> boundary="00000000000008d7f50605d35eed"
>
> The on-wiki version of this newsletter can be found here:
> https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-20
> --
> Renderers and parsers for types
>
> Wikifunctions currently supports two types: Strings and Booleans. To make
> Wikifunctions useful, we need to support many more types, such as numbers,
> dates, geocoordinates, and eventually Wikidata lexemes and items. Types
> define what kind of inputs and outputs the functions in Wikifunctions can
> have.
>
> With Wikifunctions, we don’t want to just repeat what different programming
> languages have done, but, if possible, gently update the lessons that have
> been learned from programming language research and experience and make
> sure that we are as inclusive as possible.
>
> Strings and Booleans were very carefully chosen for the first deployment of
> Wikifunctions: Strings <https://www.wikifunctions.org/wiki/Z6>, because
> they are just a specific sequence of Characters, and do not depend on the
> user’s language. Booleans <https://www.wikifunctions.org/wiki/Z40>,
> because they are a key basis of logic flow for programming. Further, they
> can be
> fully translated in Wikifunctions – the two values, True
> <https://www.wikifunctions.org/wiki/Z41> and False
> <https://www.wikifunctions.org/wiki/Z42>, are both represented by a
> Wikifunctions object that can have names in any of the languages we
> support. Since the initial deployment, more than a dozen translations have
> been added! If you can add more, that would be great.
>
> One example of a possible next type that would be interesting to introduce
> would be whole numbers. This raises a big question: how should we represent
> an integer?
>
> Most programming languages have two answers for that: one, they internally
> represent it, usually, as a binary string of a specific length, in order to
> efficiently store and process these numbers. But then there is also their
> representation in the human-readable source code, and here they are usually
> represented as a sequence of Arabic numerals
> <https://en.wikipedia.org/wiki/Arabic_numerals>, e.g. 4657388. Some
> programming languages are nice enough to allow for grouping of the numbers,
> e.g. in Ada <https://en.wikipedia.org/wiki/Ada_(programming_language)> you
> may write 4_657_388, or, if you prefer the Indian system
> <https://en.wikipedia.org/wiki/Indian_numbering_system>, 46_57_388, making
> these numbers a bit more readable.
>
> But programming languages where one can write ৪৬,৫৭,৩৮৮ using Bengali
> numerals <https://en.wikipedia.org/wiki/Bengali_numerals>, referring to
> the same number, are rare <https://sjishan.github.io/chascript/>. For
> Wikifunctions, we want to rectify this, to make sure that the whole system
> supports every human language fluently and consistently.
>
> Internally, we will represent numbers - like every other object - as
> ZObjects. The above number would be represented internally as follows
> (using the prototype ZID from the Beta
> <https://wikifunctions.beta.wmflabs.org/view/en/Z10015>, since we don’t
> yet have the respective type in the real Wikifunctions):
>
> { "Z1K1": "Z10015", "Z10015K1": "4657388"}
>
> Or, with labels in English:
>
> { "type": "positive integer", "value": "4657388"}
>
> Even though this solves the internal representation, we would want to avoid
> displaying this object in the system if possible. Instead, we plan to allow
> the Wikifunctions community to attach a 'renderer' and a 'parser' to each
> type. The renderer would be a function that takes an object of the given
> type (in this case, an object of the type positive integer) and a language,
> and returns a string. The parser is the opposite of that: it takes a string
> and a language, and returns an object of type positive integer.
>
> This would allow the Wikifunctions community to create functions for each
> type and language that would decide how the values of the type are going to
> be displayed in the given language. In a Bengali interface, the above
> number can then be displayed in the most natural representation for
> Bengali, which might be ৪৬,৫৭,৩৮৮.
>
> When entering a number, we will use the parsing function to turn the input
> of the user into the internal representation. It is then up to the
> community to decide how flexible they want to be: if they would only accept
> ৪৬,৫৭,৩৮৮ as the input, or whether ৪৬৫৭৩৮৮ would be just as good - or even
> also or only 4657388. The decision would be for the Wikifunctions community
> to make.
>
> Note that we made a lot of assumptions in the above text. For example,
> using the ZID from the Beta, calling the type “positive integer”, assuming
> the internal representation of positive integers being Arabic numerals
> without formatting (instead of say, hexadecimal, base 64 or a binary
> number, which also could be good solutions), and other assumptions. All of
> these decisions are up to you, but we used assumptions here to talk
> concretely about the proposal.
>
> We plan to implement this proposal incrementally, over a few weeks and
> months. It will likely be the case that we will at first only accept the
> internal representation (just as it currently works on the Beta), and that
> we will then add renderers and finally parsers.
>
> We are looking forward to hearing your feedback on this plan.
>
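The renderer and parser pair described in the quoted newsletter could look roughly like the sketch below for a Bengali interface. Everything here is an assumption made for illustration: ZObjects are modelled as plain dicts, the function names and the "bn" language code handling are hypothetical, and the parser takes the most permissive option (grouped Bengali digits such as ৪৬,৫৭,৩৮৮, ungrouped Bengali digits, or plain Arabic numerals), which the community may decide against.

```python
BENGALI_ZERO = 0x09E6  # code point of the Bengali digit zero
TO_BENGALI = {ord(str(d)): chr(BENGALI_ZERO + d) for d in range(10)}


def group_indian(digits):
    """Group ASCII digits Indian-style: the last three, then pairs."""
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while head:
        groups.insert(0, head[-2:])
        head = head[:-2]
    return ",".join(groups + [tail])


def render(zobject, language):
    """Renderer: a Z10015 object and a language in, a display string out."""
    digits = zobject["Z10015K1"]
    if language == "bn":
        return group_indian(digits).translate(TO_BENGALI)
    # Assumption: other languages fall back to the internal representation.
    return digits


def parse(text, language):
    """Parser: a string and a language in, a Z10015 object out."""
    cleaned = text.replace(",", "").replace(" ", "")
    # int() accepts Unicode decimal digits, including Bengali ones.
    # The language argument is unused here; a stricter parser could use it.
    return {"Z1K1": "Z10015", "Z10015K1": str(int(cleaned))}


number = {"Z1K1": "Z10015", "Z10015K1": "4657388"}
shown = render(number, "bn")
```

Parsing the rendered string again returns the same internal object, which is the round trip one would expect between a renderer and its parser.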
This edition of the newsletter can be found on-wiki here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-08
--
Let’s start building morphological paradigms!
Morphological functions are functions that take one form of a word and
create a different form of the word that is required in order to follow
grammatical and semantic rules. For example, it might take a word such as
“letter” and return the plural form “letters”, or the present participle
“lettering”. We talked previously about morphological paradigms
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-10>,
about the need for both lexemes in Wikidata and paradigms in Wikifunctions
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-17>,
and about a tool to check lexemes and paradigms against each other
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-11-09>.
We think that Wikifunctions is now ready for us to start building functions
that support morphological paradigms! We would like to invite the different
language communities to join us at Wikifunctions and to start creating
functions that create lexical forms.
Here are a few examples of such functions from the Wikifunctions Beta:
- English plural <https://wikifunctions.beta.wmflabs.org/view/en/Z10241>
- French plural <https://wikifunctions.beta.wmflabs.org/view/en/Z10106>
- Swedish genitive <https://wikifunctions.beta.wmflabs.org/view/en/Z10297>
- German feminine plural
<https://wikifunctions.beta.wmflabs.org/view/en/Z10358>
- Croatian feminine plural nominative
<https://wikifunctions.beta.wmflabs.org/view/en/Z10335>
In most cases, these functions would take a string and return a
string (but there might be exceptions). The examples above illustrate two
kinds of cases: functions which aim to work for all words of a given part of
speech (e.g. the English and French plural and the Swedish genitive, which
aim to work for any noun), and those that work for a subset of a given
part of speech (the examples cover German and Croatian feminine nouns, but
could be more complex, such as Zaliznyak’s classification of Russian nouns
<https://en.wikipedia.org/wiki/Andrey_Zaliznyak> or maybe also noun classes
in Niger-Congo languages).
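As an illustration of the first kind, a string-to-string function for regular English plurals might be sketched like this. The function name and the rule set are assumptions made for this example; they are not the implementation on the Beta, and irregular nouns are deliberately left out.

```python
def english_plural(noun):
    """Regular English plural; irregular nouns are out of scope here."""
    lower = noun.lower()
    # Sibilant endings take "es": box -> boxes, church -> churches.
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + y becomes "ies": city -> cities (but day -> days).
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"
```

For example, english_plural("letter") returns "letters".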
The morphological paradigms will often not be complete inside
Wikifunctions. The idea is for exceptions to be caught by adding them as
forms to the lexicographical knowledge in Wikidata. The functions are for the
reasonably regular forms and for the often generative rules of agglutinative
languages (but what is a regular form and what is an exception is
something the given language community will need to figure out). We are
looking forward to communities coming together around the individual
languages and to morphologizers, sets of morphological paradigms for a
given language, growing in Wikifunctions! We are particularly excited to
see how far these can be built in a given natural language, how
autonomous the languages can become (and where they still depend on
understanding English), and how much these functions will be of benefit
across languages. We also hope that capturing the paradigms can be a
contribution to language preservation and revitalization.
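The division of labour described above, with regular forms coming from a function and exceptions coming from lexeme forms, could be sketched as follows. The dictionary is a hypothetical stand-in for forms stored on Wikidata lexemes, and the regular rule is deliberately simplified.

```python
# Hypothetical stand-in for plural forms recorded on Wikidata lexemes.
LEXEME_PLURALS = {"child": "children", "mouse": "mice"}


def regular_plural(noun):
    # Deliberately simplified regular rule for the sake of the example.
    return noun + "s"


def plural(noun):
    """Prefer a recorded exception, otherwise fall back to the regular rule."""
    if noun in LEXEME_PLURALS:
        return LEXEME_PLURALS[noun]
    return regular_plural(noun)
```

Which nouns belong in the exception data and which are covered by the rule is the decision left to each language community.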
We are aware that the situation is not yet ideal for this call to action,
but we want to make this call sooner rather than later. We are actively
working on improving the situation. If you want to contribute to this
challenge, but don’t yet have functioneer rights, please apply here and
explain which language you plan to work on
<https://www.wikifunctions.org/wiki/Wikifunctions:Apply_for_editing>. We
will prioritize such applications. We are also aware that the current
limitation of function calls to logged-in users makes it difficult to demo
the functions and prevents them from being integrated into tools such as Form
Checker <https://williamavery.github.io/formcheck/> or Lucas Werkmeister’s
Wikidata Lexeme Forms <https://lexeme-forms.toolforge.org/>. We are
prioritizing resolving these issues.
Please reach out to us if you need help starting this effort for the
language you are interested in, or if you hit any stumbling blocks on the
way. We are looking forward to supporting you.
A few days ago, we started a conversation on the Wikifunctions Project Chat
about what the right type for morphological functions
<https://www.wikifunctions.org/wiki/Wikifunctions:Project_chat#Type_for_morp…>
should be, which also extended to IRC / Telegram
<https://wm-bot.wmcloud.org/logs/%23wikipedia-abstract/20230830.txt>. My
summary of the comments is that we can go ahead with using strings as the
argument and output type for morphological paradigms, given that in most
cases a string-based function would be at the core of any other solution
anyway. Thanks to everyone who participated in that discussion!
Thanks to Kutz Arrieta for a discussion and review of this newsletter.
Thanks to William Avery for improving and working on the Form Checker tool.
Thanks to everyone who participated in the discussion about the right type
for morphological functions.
Volunteer’s Corner
The next Volunteer’s Corner will be on September 18, 2023, at 17:30 UTC
<https://zonestamp.toolforge.org/1695058220> at
meet.jit.si/AWVolunteersCorner. We plan to kick off that meeting with a
live tutorial on building functions together.