The on-wiki version is available here (and might be much easier to read): https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-04-01
--
Today’s newsletter will be focusing on a technical detail, and you should feel free to skip it if you don’t want to dive into the hairy details of our data model.
Over the last few weeks we have repeatedly hit a blocker, and have not yet resolved it, which is how to represent a typed list without the representation becoming very long. A list in Wikifunctions is a type that contains an unknown number of elements in a specific order.
In the Wikifunctions data model, a list is represented the same way as it is in the programming language Lisp https://meta.wikimedia.org/w/index.php?title=Wp:en:Lisp_(programming_language)&action=edit&redlink=1: a list consists of two parts, a “head”, which is the first element of the list, and a “tail”, which itself is a list and holds the rest of the list. In addition, there is the empty list, which has no head or tail and represents a list with zero elements. In Wikifunctions, the type for a list is Z10/List.
So a list with two elements "a" and "b" would, represented in our usual JSON notation, look like this:
{ "type": "List", "head": "a" "tail": { "type": "List", "head": "b", "tail": { "type": "List" } }}
{ "Z1K1": "Z10", "Z10K1": "a" "Z10K2": { "Z1K1": "Z10", "Z10K1": "b", "Z10K2": { "Z1K1": "Z10" } }}
As this gets a bit long, we have introduced a shortcut by using JSON’s array notation, and so the list can be represented as follows:
[ "a", "b" ]
The short notation (we call it canonicalized) can be easily and mechanically translated into the longer notation seen above, and the other way around.
In Phase η we introduced the possibility to have typed lists. Whereas in the normal list we don’t say anything about the type of the elements of the list, in a typed list we say “every element of the list must be of a certain type”. Having typed lists allows us to have stronger guarantees on typing: if you ask for an element of a typed list, you know what type the result will be. This can be helpful with type checking and with providing better interfaces.
A typed list is the result of a function call. We have a function, Z881/Typed list, that takes a type as its sole argument, and returns a type. The resulting type is a copy of Z10/List, with the difference that instead of the head having a value of any type, it must have a value of the type provided as the argument. And the tail is not of type Z10/List, but of the type returned by calling Z881/Type list with the given argument type.
Let’s say we want to represent the list we had before, but instead of it being a Z10/List it should be a list of strings. A list of strings is a type which is created by calling the function Z881/Typed list on the type Z6/String. This would look as follows in JSON notation.
{ "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "a" "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "b", "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" } } }}
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a" "K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "Z10K1": "b", "Z10K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" } } }}
The challenge is that we cannot have the same short notation for this, as we would lose information.
We have tried to work around this by guessing the type: we look into the list, and if everything has the same type, we declare it a typed list. But first, it adds quite a bit of complexity to the code, and second it still does not solve all issues: the short form of the above list could indeed be a list of objects, or it could be a list of strings - there is no way for us to know.
We are publishing our internal discussion and a few options https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Typed_lists, and we would like to hear your thoughts and gather some wider input.
I really like what I think is Option 3 (Change what it means to be a typed list), ignoring the technical implementation this looks really readable to me for a canonical view:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": [ "a", "b", "c", "d", "e" ]}
Looking at how one might interact with a typed list, once you access one of the items in the list, they would already know what the type is (in the above example, a Z6/String). To me, the value of a linked list is knowing at a high level what type a list contains, and to validate adding new data to that list.
I also really like that this includes the metadata for the type, so that it can be accessed and checked during runtime if needed.
On Fri, Apr 1, 2022 at 9:52 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
The on-wiki version is available here (and might be much easier to read): https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-04-01
--
Today’s newsletter will be focusing on a technical detail, and you should feel free to skip it if you don’t want to dive into the hairy details of our data model.
Over the last few weeks we have repeatedly hit a blocker, and have not yet resolved it, which is how to represent a typed list without the representation becoming very long. A list in Wikifunctions is a type that contains an unknown number of elements in a specific order.
In the Wikifunctions data model, a list is represented the same way as it is in the programming language Lisp https://meta.wikimedia.org/w/index.php?title=Wp:en:Lisp_(programming_language)&action=edit&redlink=1: a list consists of two parts, a “head”, which is the first element of the list, and a “tail”, which itself is a list and holds the rest of the list. In addition, there is the empty list, which has no head or tail and represents a list with zero elements. In Wikifunctions, the type for a list is Z10/List.
So a list with two elements "a" and "b" would, represented in our usual JSON notation, look like this:
{ "type": "List", "head": "a" "tail": { "type": "List", "head": "b", "tail": { "type": "List" } }}
{ "Z1K1": "Z10", "Z10K1": "a" "Z10K2": { "Z1K1": "Z10", "Z10K1": "b", "Z10K2": { "Z1K1": "Z10" } }}
As this gets a bit long, we have introduced a shortcut by using JSON’s array notation, and so the list can be represented as follows:
[ "a", "b" ]
The short notation (we call it canonicalized) can be easily and mechanically translated into the longer notation seen above, and the other way around.
In Phase η we introduced the possibility to have typed lists. Whereas in the normal list we don’t say anything about the type of the elements of the list, in a typed list we say “every element of the list must be of a certain type”. Having typed lists allows us to have stronger guarantees on typing: if you ask for an element of a typed list, you know what type the result will be. This can be helpful with type checking and with providing better interfaces.
A typed list is the result of a function call. We have a function, Z881/Typed list, that takes a type as its sole argument, and returns a type. The resulting type is a copy of Z10/List, with the difference that instead of the head having a value of any type, it must have a value of the type provided as the argument. And the tail is not of type Z10/List, but of the type returned by calling Z881/Type list with the given argument type.
Let’s say we want to represent the list we had before, but instead of it being a Z10/List it should be a list of strings. A list of strings is a type which is created by calling the function Z881/Typed list on the type Z6/String. This would look as follows in JSON notation.
{ "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "a" "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "b", "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" } } }}
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a" "K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "Z10K1": "b", "Z10K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" } } }}
The challenge is that we cannot have the same short notation for this, as we would lose information.
We have tried to work around this by guessing the type: we look into the list, and if everything has the same type, we declare it a typed list. But first, it adds quite a bit of complexity to the code, and second it still does not solve all issues: the short form of the above list could indeed be a list of objects, or it could be a list of strings - there is no way for us to know.
We are publishing our internal discussion and a few options https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Typed_lists, and we would like to hear your thoughts and gather some wider input. _______________________________________________ Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
As Cory said, “we have to ask ourselves why we're not just directly using arrays everywhere in the first place.“ (Maybe that includes understanding that a regular Z6 is a list containing one string element, but then we have to ask ourselves why typed lists would have a generic type, if the element’s type is not generic.)
Assuming that the head-and-tail representation is preferred, is it not the case that the definition (value) of the Z1K1/type for every K2/tail will necessarily be identical, for a given element type? So could we make it a Z2? I’m not sure what type of Z2 (just a Z4?) but substituting its reference (“Zn”) into the canonical form, in place of the tail object’s type, gives:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a", "K2": { "Z1K1": "Zn", "K1": "b", "K2": { "Z1K1": "Zn", "K1": "c", "K2": { "Z1K1": "Zn", "K1": "d", "K2": { "Z1K1": "Zn", "K1": "e", "K2": { "Z1K1": "Zn" } } } } }}
That is the (manual) evaluation of a re-writing function. One could similarly define a function that re-writes an array into this form (or the more explicit canonical form, or the normalized form). Such a function (“Znn”) could give a representation like:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": {"Z1K1": "Z7", "Z7K1": "Znn", "ZnnK1": [ "a", "b", "c", "d", "e" ] } }
That doesn’t look quite right, but you get the idea; it’s rather similar to Option 3, but it avoids changing what it means to be a typed list.
Al.
On Sat, 2 Apr 2022 at 16:34, Lindsay Wardell three060@gmail.com wrote:
I really like what I think is Option 3 (Change what it means to be a typed list), ignoring the technical implementation this looks really readable to me for a canonical view:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": [ "a", "b", "c", "d", "e" ]}
Looking at how one might interact with a typed list, once you access one of the items in the list, they would already know what the type is (in the above example, a Z6/String). To me, the value of a linked list is knowing at a high level what type a list contains, and to validate adding new data to that list.
I also really like that this includes the metadata for the type, so that it can be accessed and checked during runtime if needed.
On Fri, Apr 1, 2022 at 9:52 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
The on-wiki version is available here (and might be much easier to read): https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-04-01
--
Today’s newsletter will be focusing on a technical detail, and you should feel free to skip it if you don’t want to dive into the hairy details of our data model.
Over the last few weeks we have repeatedly hit a blocker, and have not yet resolved it, which is how to represent a typed list without the representation becoming very long. A list in Wikifunctions is a type that contains an unknown number of elements in a specific order.
In the Wikifunctions data model, a list is represented the same way as it is in the programming language Lisp https://meta.wikimedia.org/w/index.php?title=Wp:en:Lisp_(programming_language)&action=edit&redlink=1: a list consists of two parts, a “head”, which is the first element of the list, and a “tail”, which itself is a list and holds the rest of the list. In addition, there is the empty list, which has no head or tail and represents a list with zero elements. In Wikifunctions, the type for a list is Z10/List.
So a list with two elements "a" and "b" would, represented in our usual JSON notation, look like this:
{ "type": "List", "head": "a" "tail": { "type": "List", "head": "b", "tail": { "type": "List" } }}
{ "Z1K1": "Z10", "Z10K1": "a" "Z10K2": { "Z1K1": "Z10", "Z10K1": "b", "Z10K2": { "Z1K1": "Z10" } }}
As this gets a bit long, we have introduced a shortcut by using JSON’s array notation, and so the list can be represented as follows:
[ "a", "b" ]
The short notation (we call it canonicalized) can be easily and mechanically translated into the longer notation seen above, and the other way around.
In Phase η we introduced the possibility to have typed lists. Whereas in the normal list we don’t say anything about the type of the elements of the list, in a typed list we say “every element of the list must be of a certain type”. Having typed lists allows us to have stronger guarantees on typing: if you ask for an element of a typed list, you know what type the result will be. This can be helpful with type checking and with providing better interfaces.
A typed list is the result of a function call. We have a function, Z881/Typed list, that takes a type as its sole argument, and returns a type. The resulting type is a copy of Z10/List, with the difference that instead of the head having a value of any type, it must have a value of the type provided as the argument. And the tail is not of type Z10/List, but of the type returned by calling Z881/Type list with the given argument type.
Let’s say we want to represent the list we had before, but instead of it being a Z10/List it should be a list of strings. A list of strings is a type which is created by calling the function Z881/Typed list on the type Z6/String. This would look as follows in JSON notation.
{ "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "a" "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "b", "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" } } }}
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a" "K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "Z10K1": "b", "Z10K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" } } }}
The challenge is that we cannot have the same short notation for this, as we would lose information.
We have tried to work around this by guessing the type: we look into the list, and if everything has the same type, we declare it a typed list. But first, it adds quite a bit of complexity to the code, and second it still does not solve all issues: the short form of the above list could indeed be a list of objects, or it could be a list of strings - there is no way for us to know.
We are publishing our internal discussion and a few options https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Typed_lists, and we would like to hear your thoughts and gather some wider input. _______________________________________________ Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Thanks! Just to make sure I understand correctly, you literally mean to use "Zn" and "Znn" in order to indicate a form of syntactic sugar which shortcuts to the fully fledged generic type in the first head, right?
If I understood it right, then it seems to be like a neat combination of Option 3 and Option 5 (in Option 5 we completely drop the Z1K1 field, which is similar to your suggestion of using a literal Zn to denote "repeat the type from the previous head").
Your suggestion would remove a lot of bytes from being sent over the wire!
Thank you for thoughts, Denny
On Sun, Apr 3, 2022 at 4:32 AM Grounder UK grounderuk@gmail.com wrote:
As Cory said, “we have to ask ourselves why we're not just directly using arrays everywhere in the first place.“ (Maybe that includes understanding that a regular Z6 is a list containing one string element, but then we have to ask ourselves why typed lists would have a generic type, if the element’s type is not generic.)
Assuming that the head-and-tail representation is preferred, is it not the case that the definition (value) of the Z1K1/type for every K2/tail will necessarily be identical, for a given element type? So could we make it a Z2? I’m not sure what type of Z2 (just a Z4?) but substituting its reference (“Zn”) into the canonical form, in place of the tail object’s type, gives:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a", "K2": { "Z1K1": "Zn", "K1": "b", "K2": { "Z1K1": "Zn", "K1": "c", "K2": { "Z1K1": "Zn", "K1": "d", "K2": { "Z1K1": "Zn", "K1": "e", "K2": { "Z1K1": "Zn" } } } } }}
That is the (manual) evaluation of a re-writing function. One could similarly define a function that re-writes an array into this form (or the more explicit canonical form, or the normalized form). Such a function (“Znn”) could give a representation like:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": {"Z1K1": "Z7", "Z7K1": "Znn", "ZnnK1": [ "a", "b", "c", "d", "e" ] } }
That doesn’t look quite right, but you get the idea; it’s rather similar to Option 3, but it avoids changing what it means to be a typed list.
Al.
On Sat, 2 Apr 2022 at 16:34, Lindsay Wardell three060@gmail.com wrote:
I really like what I think is Option 3 (Change what it means to be a typed list), ignoring the technical implementation this looks really readable to me for a canonical view:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": [ "a", "b", "c", "d", "e" ]}
Looking at how one might interact with a typed list, once you access one of the items in the list, they would already know what the type is (in the above example, a Z6/String). To me, the value of a linked list is knowing at a high level what type a list contains, and to validate adding new data to that list.
I also really like that this includes the metadata for the type, so that it can be accessed and checked during runtime if needed.
On Fri, Apr 1, 2022 at 9:52 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
The on-wiki version is available here (and might be much easier to read): https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-04-01
--
Today’s newsletter will be focusing on a technical detail, and you should feel free to skip it if you don’t want to dive into the hairy details of our data model.
Over the last few weeks we have repeatedly hit a blocker, and have not yet resolved it, which is how to represent a typed list without the representation becoming very long. A list in Wikifunctions is a type that contains an unknown number of elements in a specific order.
In the Wikifunctions data model, a list is represented the same way as it is in the programming language Lisp https://meta.wikimedia.org/w/index.php?title=Wp:en:Lisp_(programming_language)&action=edit&redlink=1: a list consists of two parts, a “head”, which is the first element of the list, and a “tail”, which itself is a list and holds the rest of the list. In addition, there is the empty list, which has no head or tail and represents a list with zero elements. In Wikifunctions, the type for a list is Z10/List.
So a list with two elements "a" and "b" would, represented in our usual JSON notation, look like this:
{ "type": "List", "head": "a" "tail": { "type": "List", "head": "b", "tail": { "type": "List" } }}
{ "Z1K1": "Z10", "Z10K1": "a" "Z10K2": { "Z1K1": "Z10", "Z10K1": "b", "Z10K2": { "Z1K1": "Z10" } }}
As this gets a bit long, we have introduced a shortcut by using JSON’s array notation, and so the list can be represented as follows:
[ "a", "b" ]
The short notation (we call it canonicalized) can be easily and mechanically translated into the longer notation seen above, and the other way around.
In Phase η we introduced the possibility to have typed lists. Whereas in the normal list we don’t say anything about the type of the elements of the list, in a typed list we say “every element of the list must be of a certain type”. Having typed lists allows us to have stronger guarantees on typing: if you ask for an element of a typed list, you know what type the result will be. This can be helpful with type checking and with providing better interfaces.
A typed list is the result of a function call. We have a function, Z881/Typed list, that takes a type as its sole argument, and returns a type. The resulting type is a copy of Z10/List, with the difference that instead of the head having a value of any type, it must have a value of the type provided as the argument. And the tail is not of type Z10/List, but of the type returned by calling Z881/Type list with the given argument type.
Let’s say we want to represent the list we had before, but instead of it being a Z10/List it should be a list of strings. A list of strings is a type which is created by calling the function Z881/Typed list on the type Z6/String. This would look as follows in JSON notation.
{ "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "a" "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "b", "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" } } }}
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a" "K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "Z10K1": "b", "Z10K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" } } }}
The challenge is that we cannot have the same short notation for this, as we would lose information.
We have tried to work around this by guessing the type: we look into the list, and if everything has the same type, we declare it a typed list. But first, it adds quite a bit of complexity to the code, and second it still does not solve all issues: the short form of the above list could indeed be a list of objects, or it could be a list of strings - there is no way for us to know.
We are publishing our internal discussion and a few options https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Typed_lists, and we would like to hear your thoughts and gather some wider input. _______________________________________________ Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
You’re welcome!
I meant "Znn" to be a new function. This is clarified (I hope) in https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Typed_lists#Generic_.... Its compact form there is:
{ "Z1K1": "Z7", "Z7K1": ⟨Zfff⟩, ⟨ZfffK1⟩: ⟨Zttt⟩, ⟨ZfffK2⟩: [ "a", "b", "c", "d", "e" ]}
(using ⟨Zfff⟩ in place of "Znn" and ⟨Zttt⟩to represent the element type).
"Zn", on the other hand, is a bit of a blunder, I’m afraid. It was meant to be a reference to a persistent (Z4) object but, as you suggest, it turns out to be a reference to the generic type, which is a transient object. Currently, the Z4K1/identity of a generic type is just the function call that evaluates to the type. And that is exactly what I replaced in the first place 🙄
Using persistent objects is not viable here because we would need a separate object for each type, including generic types.
Local references to transient objects are something we should probably look into. This was mentioned by GZWDer in https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Function_evaluator_c... but considered unnecessary for functions.
More generally, we can think of an object as a string and tokenize any repeating substring within it. We probably want a load of restrictions on this. For starters, I would suggest that a token can only substitute for a whole object. Since references already do this for persistent objects, this amounts to the same thing as a local reference to a transient object. We might end up with different syntax, depending on how we look at it, however. For example, suppose we invent an additional Z1K2/refid key, use of which implies substitution of the Z1K1 value:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" },
"Z1K2": "R1",
"K1": "a", "K2": { "Z1K1": "R1", "K1": "b", "K2": { "Z1K1": "R1", "K1": "c", "K2": { "Z1K1": "R1", "K1": "d", "K2": { "Z1K1": "R1", "K1": "e", "K2": { "Z1K1": "R1" } } } } }}
This is a bit of a fudge, because we are only substituting in the type, not the whole object. Arguably, it would be more consistent just to use the key “Z1K1” (rather than “R1”) to reference the associated value, but that takes some getting used to. I think that is then just Option 5, made explicit. The subtle difference, using “R1”, is that the type of each tail is inherited directly from the first head, rather than from the immediately prior head. That makes no difference here, since they must be the same. If we specifically want to inherit from the prior head, we could just put the refid into the type object itself. Then, the whole Z4 object, including the refid, would be substituted in.
I hope that clarifies my thinking a little. Just for the record, I actually prefer:
{ "Z1K1": { "Z1K1": "Z4", "Z4K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" } }, "K1": { "Z1K1": "Z7", "Z7K1": ⟨Zfff⟩, ⟨ZfffK1⟩: "Z6", ⟨ZfffK2⟩: [ "a", "b", "c", "d", "e" ] }}
…for now… Al.
On Sat, 9 Apr 2022 at 01:55, Denny Vrandečić dvrandecic@wikimedia.org wrote:
Thanks! Just to make sure I understand correctly, you literally mean to use "Zn" and "Znn" in order to indicate a form of syntactic sugar which shortcuts to the fully fledged generic type in the first head, right?
If I understood it right, then it seems to be like a neat combination of Option 3 and Option 5 (in Option 5 we completely drop the Z1K1 field, which is similar to your suggestion of using a literal Zn to denote "repeat the type from the previous head").
Your suggestion would remove a lot of bytes from being sent over the wire!
Thank you for thoughts, Denny
On Sun, Apr 3, 2022 at 4:32 AM Grounder UK grounderuk@gmail.com wrote:
As Cory said, “we have to ask ourselves why we're not just directly using arrays everywhere in the first place.“ (Maybe that includes understanding that a regular Z6 is a list containing one string element, but then we have to ask ourselves why typed lists would have a generic type, if the element’s type is not generic.)
Assuming that the head-and-tail representation is preferred, is it not the case that the definition (value) of the Z1K1/type for every K2/tail will necessarily be identical, for a given element type? So could we make it a Z2? I’m not sure what type of Z2 (just a Z4?) but substituting its reference (“Zn”) into the canonical form, in place of the tail object’s type, gives:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a", "K2": { "Z1K1": "Zn", "K1": "b", "K2": { "Z1K1": "Zn", "K1": "c", "K2": { "Z1K1": "Zn", "K1": "d", "K2": { "Z1K1": "Zn", "K1": "e", "K2": { "Z1K1": "Zn" } } } } }}
That is the (manual) evaluation of a re-writing function. One could similarly define a function that re-writes an array into this form (or the more explicit canonical form, or the normalized form). Such a function (“Znn”) could give a representation like:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": {"Z1K1": "Z7", "Z7K1": "Znn", "ZnnK1": [ "a", "b", "c", "d", "e" ] } }
That doesn’t look quite right, but you get the idea; it’s rather similar to Option 3, but it avoids changing what it means to be a typed list.
Al.
On Sat, 2 Apr 2022 at 16:34, Lindsay Wardell three060@gmail.com wrote:
I really like what I think is Option 3 (Change what it means to be a typed list), ignoring the technical implementation this looks really readable to me for a canonical view:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": [ "a", "b", "c", "d", "e" ]}
Looking at how one might interact with a typed list, once you access one of the items in the list, they would already know what the type is (in the above example, a Z6/String). To me, the value of a linked list is knowing at a high level what type a list contains, and to validate adding new data to that list.
I also really like that this includes the metadata for the type, so that it can be accessed and checked during runtime if needed.
On Fri, Apr 1, 2022 at 9:52 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
The on-wiki version is available here (and might be much easier to read): https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-04-01
--
Today’s newsletter will be focusing on a technical detail, and you should feel free to skip it if you don’t want to dive into the hairy details of our data model.
Over the last few weeks we have repeatedly hit a blocker, and have not yet resolved it, which is how to represent a typed list without the representation becoming very long. A list in Wikifunctions is a type that contains an unknown number of elements in a specific order.
In the Wikifunctions data model, a list is represented the same way as it is in the programming language Lisp https://meta.wikimedia.org/w/index.php?title=Wp:en:Lisp_(programming_language)&action=edit&redlink=1: a list consists of two parts, a “head”, which is the first element of the list, and a “tail”, which itself is a list and holds the rest of the list. In addition, there is the empty list, which has no head or tail and represents a list with zero elements. In Wikifunctions, the type for a list is Z10/List.
So a list with two elements "a" and "b" would, represented in our usual JSON notation, look like this:
{ "type": "List", "head": "a" "tail": { "type": "List", "head": "b", "tail": { "type": "List" } }}
{ "Z1K1": "Z10", "Z10K1": "a" "Z10K2": { "Z1K1": "Z10", "Z10K1": "b", "Z10K2": { "Z1K1": "Z10" } }}
As this gets a bit long, we have introduced a shortcut by using JSON’s array notation, and so the list can be represented as follows:
[ "a", "b" ]
The short notation (we call it canonicalized) can be easily and mechanically translated into the longer notation seen above, and the other way around.
In Phase η we introduced the possibility to have typed lists. Whereas in the normal list we don’t say anything about the type of the elements of the list, in a typed list we say “every element of the list must be of a certain type”. Having typed lists allows us to have stronger guarantees on typing: if you ask for an element of a typed list, you know what type the result will be. This can be helpful with type checking and with providing better interfaces.
A typed list is the result of a function call. We have a function, Z881/Typed list, that takes a type as its sole argument, and returns a type. The resulting type is a copy of Z10/List, with the difference that instead of the head having a value of any type, it must have a value of the type provided as the argument. And the tail is not of type Z10/List, but of the type returned by calling Z881/Type list with the given argument type.
Let’s say we want to represent the list we had before, but instead of it being a Z10/List it should be a list of strings. A list of strings is a type which is created by calling the function Z881/Typed list on the type Z6/String. This would look as follows in JSON notation.
{ "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "a" "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "b", "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" } } }}
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a" "K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "Z10K1": "b", "Z10K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" } } }}
The challenge is that we cannot have the same short notation for this, as we would lose information.
We have tried to work around this by guessing the type: we look into the list, and if everything has the same type, we declare it a typed list. But first, it adds quite a bit of complexity to the code, and second it still does not solve all issues: the short form of the above list could indeed be a list of objects, or it could be a list of strings - there is no way for us to know.
We are publishing our internal discussion and a few options https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Typed_lists, and we would like to hear your thoughts and gather some wider input. _______________________________________________ Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Thanks, Lindsay! I will relay your vote when we come to the decision making! I very much agree with your reasoning.
Thanks, Denny
On Sat, Apr 2, 2022 at 8:34 AM Lindsay Wardell three060@gmail.com wrote:
I really like what I think is Option 3 (Change what it means to be a typed list), ignoring the technical implementation this looks really readable to me for a canonical view:
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": [ "a", "b", "c", "d", "e" ]}
Looking at how one might interact with a typed list, once you access one of the items in the list, they would already know what the type is (in the above example, a Z6/String). To me, the value of a linked list is knowing at a high level what type a list contains, and to validate adding new data to that list.
I also really like that this includes the metadata for the type, so that it can be accessed and checked during runtime if needed.
On Fri, Apr 1, 2022 at 9:52 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
The on-wiki version is available here (and might be much easier to read): https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-04-01
--
Today’s newsletter will be focusing on a technical detail, and you should feel free to skip it if you don’t want to dive into the hairy details of our data model.
Over the last few weeks we have repeatedly hit a blocker, and have not yet resolved it, which is how to represent a typed list without the representation becoming very long. A list in Wikifunctions is a type that contains an unknown number of elements in a specific order.
In the Wikifunctions data model, a list is represented the same way as it is in the programming language Lisp https://meta.wikimedia.org/w/index.php?title=Wp:en:Lisp_(programming_language)&action=edit&redlink=1: a list consists of two parts, a “head”, which is the first element of the list, and a “tail”, which itself is a list and holds the rest of the list. In addition, there is the empty list, which has no head or tail and represents a list with zero elements. In Wikifunctions, the type for a list is Z10/List.
So a list with two elements "a" and "b" would, represented in our usual JSON notation, look like this:
{ "type": "List", "head": "a" "tail": { "type": "List", "head": "b", "tail": { "type": "List" } }}
{ "Z1K1": "Z10", "Z10K1": "a" "Z10K2": { "Z1K1": "Z10", "Z10K1": "b", "Z10K2": { "Z1K1": "Z10" } }}
As this gets a bit long, we have introduced a shortcut by using JSON’s array notation, and so the list can be represented as follows:
[ "a", "b" ]
The short notation (we call it canonicalized) can be easily and mechanically translated into the longer notation seen above, and the other way around.
In Phase η we introduced the possibility to have typed lists. Whereas in the normal list we don’t say anything about the type of the elements of the list, in a typed list we say “every element of the list must be of a certain type”. Having typed lists allows us to have stronger guarantees on typing: if you ask for an element of a typed list, you know what type the result will be. This can be helpful with type checking and with providing better interfaces.
A typed list is the result of a function call. We have a function, Z881/Typed list, that takes a type as its sole argument, and returns a type. The resulting type is a copy of Z10/List, with the difference that instead of the head having a value of any type, it must have a value of the type provided as the argument. And the tail is not of type Z10/List, but of the type returned by calling Z881/Type list with the given argument type.
Let’s say we want to represent the list we had before, but instead of it being a Z10/List it should be a list of strings. A list of strings is a type which is created by calling the function Z881/Typed list on the type Z6/String. This would look as follows in JSON notation.
{ "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "a" "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" }, "head": "b", "tail": { "type": { "type": "Function call", "function": "Typed list", "argument type": "String" } } }}
{ "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "K1": "a" "K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" }, "Z10K1": "b", "Z10K2": { "Z1K1": { "Z1K1": "Z7", "Z7K1": "Z881", "Z881K1": "Z6" } } }}
The challenge is that we cannot have the same short notation for this, as we would lose information.
We have tried to work around this by guessing the type: we look into the list, and if everything has the same type, we declare it a typed list. But first, it adds quite a bit of complexity to the code, and second it still does not solve all issues: the short form of the above list could indeed be a list of objects, or it could be a list of strings - there is no way for us to know.
We are publishing our internal discussion and a few options https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Typed_lists, and we would like to hear your thoughts and gather some wider input. _______________________________________________ Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
abstract-wikipedia@lists.wikimedia.org