Newsletter #66: Typed lists are looong - Abstract-Wikipedia

2 Apr 2022


      The on-wiki version is available here (and might be much easier to read):
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-04-01
--
Today’s newsletter will be focusing on a technical detail, and you should
feel free to skip it if you don’t want to dive into the hairy details of
our data model.
Over the last few weeks we have repeatedly hit a blocker, and have not yet
resolved it, which is how to represent a typed list without the
representation becoming very long. A list in Wikifunctions is a type that
contains an unknown number of elements in a specific order.
In the Wikifunctions data model, a list is represented the same way as it
is in the programming language Lisp
https://meta.wikimedia.org/w/index.php?title=Wp:en:Lisp_(programming_language)&action=edit&redlink=1:
a list consists of two parts, a “head”, which is the first element of the
list, and a “tail”, which itself is a list and holds the rest of the list.
In addition, there is the empty list, which has no head or tail and
represents a list with zero elements. In Wikifunctions, the type for a list
is Z10/List.
So a list with two elements "a" and "b" would, represented in our usual
JSON notation, look like this:
{  "type": "List",  "head": "a"  "tail": {    "type": "List",
"head": "b",    "tail": {      "type": "List"    }  }}
{
  "Z1K1": "Z10",
  "Z10K1": "a"
  "Z10K2": {
    "Z1K1": "Z10",
    "Z10K1": "b",
    "Z10K2": {
      "Z1K1": "Z10"
    }
  }}
As this gets a bit long, we have introduced a shortcut by using JSON’s
array notation, and so the list can be represented as follows:
[ "a", "b" ]
The short notation (we call it canonicalized) can be easily and
mechanically translated into the longer notation seen above, and the other
way around.
In Phase η we introduced the possibility to have typed lists. Whereas in
the normal list we don’t say anything about the type of the elements of the
list, in a typed list we say “every element of the list must be of a
certain type”. Having typed lists allows us to have stronger guarantees on
typing: if you ask for an element of a typed list, you know what type the
result will be. This can be helpful with type checking and with providing
better interfaces.
A typed list is the result of a function call. We have a function,
Z881/Typed list, that takes a type as its sole argument, and returns a
type. The resulting type is a copy of Z10/List, with the difference that
instead of the head having a value of any type, it must have a value of the
type provided as the argument. And the tail is not of type Z10/List, but of
the type returned by calling Z881/Type list with the given argument type.
Let’s say we want to represent the list we had before, but instead of it
being a Z10/List it should be a list of strings. A list of strings is a
type which is created by calling the function Z881/Typed list on the type
Z6/String. This would look as follows in JSON notation.
{  "type": {    "type": "Function call",    "function": "Typed list",
  "argument type": "String"  },  "head": "a"  "tail": {    "type": {
   "type": "Function call",      "function": "Typed list",
"argument type": "String"    },    "head": "b",    "tail": {
"type": {        "type": "Function call",        "function": "Typed
list",        "argument type": "String"      }    }  }}
{
  "Z1K1": {
    "Z1K1": "Z7",
    "Z7K1": "Z881",
    "Z881K1": "Z6"
  },
  "K1": "a"
  "K2": {
    "Z1K1": {
      "Z1K1": "Z7",
      "Z7K1": "Z881",
      "Z881K1": "Z6"
    },
    "Z10K1": "b",
    "Z10K2": {
      "Z1K1": {
        "Z1K1": "Z7",
        "Z7K1": "Z881",
        "Z881K1": "Z6"
      }
    }
  }}
The challenge is that we cannot have the same short notation for this, as
we would lose information.
We have tried to work around this by guessing the type: we look into the
list, and if everything has the same type, we declare it a typed list. But
first, it adds quite a bit of complexity to the code, and second it still
does not solve all issues: the short form of the above list could indeed be
a list of objects, or it could be a list of strings - there is no way for
us to know.
We are publishing our internal discussion and a few options
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Typed_lists, and we
would like to hear your thoughts and gather some wider input.