The on-wiki version is available here (and might be much easier to read): https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-04-01
--
Today’s newsletter will be focusing on a technical detail, and you should feel free to skip it if you don’t want to dive into the hairy details of our data model.
Over the last few weeks we have repeatedly hit a blocker, and have not yet resolved it, which is how to represent a typed list without the representation becoming very long. A list in Wikifunctions is a type that contains an unknown number of elements in a specific order.
In the Wikifunctions data model, a list is represented the same way as it is in the programming language Lisp https://meta.wikimedia.org/w/index.php?title=Wp:en:Lisp_(programming_language)&action=edit&redlink=1: a list consists of two parts, a “head”, which is the first element of the list, and a “tail”, which itself is a list and holds the rest of the list. In addition, there is the empty list, which has no head or tail and represents a list with zero elements. In Wikifunctions, the type for a list is Z10/List.
So a list with two elements "a" and "b" would, represented in our usual JSON notation, look like this:
{ "type": "List", "head": "a" "tail": { "type": "List", "head": "b", "tail": { "type": "List" } }}
{ "Z1K1": "Z10", "Z10K1": "a" "Z10K2": { "Z1K1": "Z10", "Z10K1": "b", "Z10K2": { "Z1K1": "Z10" } }}
As this gets a bit long, we have introduced a shortcut by using JSON’s array notation, and so the list can be represented as follows:
[ "a", "b" ]
The short notation (we call it canonicalized) can be easily and mechanically translated into the longer notation seen above, and the other way around.
In Phase η we introduced the possibility to have typed lists. Whereas in the normal list we don’t say anything about the type of the elements of the list, in a typed list we say “every element of the list must be of a certain type”. Having typed lists allows us to have stronger guarantees on typing: if you ask for an element of a typed list, you know what type the result will be. This can be helpful with type checking and with providing better interfaces.
A typed list is the result of a function call. We have a function, Z881/Typed list, that takes a type as its sole argument, and returns a type. The resulting type is a copy of Z10/List, with the difference that instead of the head having a value of any type, it must have a value of the type provided as the argument. And the tail is not of type Z10/List, but of the type returned by calling Z881/Type list with the given argument type.
Let’s say we want to represent the list we had before, but instead of it being a Z10/List it should be a list of strings. A list of strings is a type which is created by calling the function Z881/Typed list on the type Z6/String. This would look as follows in JSON notation.
{ "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "a" "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "b", "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" } } }}
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a" "K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "Z10K1": "b", "Z10K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" } } }}
The challenge is that we cannot have the same short notation for this, as we would lose information.
We have tried to work around this by guessing the type: we look into the list, and if everything has the same type, we declare it a typed list. But first, it adds quite a bit of complexity to the code, and second it still does not solve all issues: the short form of the above list could indeed be a list of objects, or it could be a list of strings - there is no way for us to know.
We are publishing our internal discussion and a few options https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Typed_lists, and we would like to hear your thoughts and gather some wider input.