I'm replying to Denny's original message. I've read other replies and James's phabricator overview and I think I understand the problem. Except I don't. So I'm stepping back to the requirements and constraints.
Constraint 1 is that text entered by humans must be stored as human-readable text (encoded, obviously).
So, labels and aliases and "source code" are primarily text. If you need to store the text as JSON, that's fine, but I was imagining that the human text, as entered, would be translated into an abstract form (with Zs and Ks etc) and it is the abstract form that gets stored as JSON. Yes, it's dynamic (on-the-fly) during the editing process, but the human enters text we care about and the machine turns it into an object we care about. The text entered is a text and the interpreted text (object) is another text.
Translations are translations of a source. You can translate the text entered into another language, and that's another text. Or you can translate the derived form, and that's a different text. But if the translation is fully automated, you might treat it as "mere" presentation: a visualisation of underlying data. Maybe it makes sense to store such a thing, especially if a human has seen it, even more so if they have relied on it, but do we have a Requirement to store all translations up-front? I don't think so. We store it in the language it was entered in (preferably with metadata that identifies the language) and maybe we store it in a small number of different languages (always having at least two would be nice). Beyond that, I think you're talking about sub-pages per language (but let's not jump to solutions).
Constraint 2 is that text is bound to its context.
A good example of this is comments in source-code. The word "in" indicates the binding. The comment doesn't point to some text, it is the text, right there, "in" the source code. Constraint 1 ("C1") applies: it is stored as entered, where entered. If you later want to tidy up the source code and replace comments with pointers to comments and/or translations, that's fine. But C1 still applies, so you have a new version of the source-code and you still have the old version.
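To illustrate what "a new version and you still have the old version" could mean in practice, here is a minimal sketch in Python of an append-only history. The class and method names (`SourceHistory`, `revise`, `as_entered`) and the `comment:C42` pointer syntax are hypothetical, invented for this example; nothing here is taken from the function model or from MediaWiki.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One immutable version of a source text (hypothetical structure)."""
    number: int
    text: str


class SourceHistory:
    """Append-only history: revising adds a version, never overwrites one."""

    def __init__(self, original: str) -> None:
        # C1: the first revision is the text exactly as entered.
        self._revisions = [Revision(1, original)]

    def revise(self, new_text: str) -> Revision:
        # A tidy-up (e.g. replacing a comment with a pointer) is a new revision.
        rev = Revision(len(self._revisions) + 1, new_text)
        self._revisions.append(rev)
        return rev

    def as_entered(self) -> str:
        return self._revisions[0].text

    def current(self) -> str:
        return self._revisions[-1].text


history = SourceHistory("x = a + b  # add the two inputs")
# Hypothetical pointer syntax standing in for "a pointer to the comment".
history.revise("x = a + b  # see comment:C42")
```

After the revision, `history.current()` returns the tidied source with the pointer, while `history.as_entered()` still returns the original line with its comment in place.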
When it comes to documentation outside of comments, that's just a text (as written). It might be written as a multi-lingual text, but more than bi-lingual is stretching it a bit for most of us. The bi-lingual text may be a collaboration with a machine translator, but I would only see that as a Requirement when one of the languages is WMF's own synthetic language (ZKspeak, to coin a phrase). That is, I might compose my text in DeepL and paste its English into the ZObject documentation. For us, that is the text as written (C1 applies). If I also paste in the text I composed in a different language, that's fine; that's another text as written (C1 applies). (If I make a comment to that effect, that's just text where entered, but it's interesting metadata, so there may be a Requirement to capture the metadata. Either way (or both ways) C1 applies.)
Requirement 1 might be that any text can be entered as Wikitext.
Ah, but the JSON can't be Wikitext... Well, that isn't the Requirement. We can enter the text as Wikitext (so C1 applies). If it must be translated into text that can be JSON, that's fine. We still have the Wikitext and now we also have a translation; that's another text.
Does the above guide us toward a Solution? Well, it's not A, because we don't have many translations in the JSON blob. But maybe we have three: the source human text, the interpreted ZK text and a translation into a second language.
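As a rough illustration of the "maybe we have three" idea, here is a minimal sketch in Python of a record holding the source human text, the interpreted form and one translation, serialised to JSON. All field names (`source_text`, `source_lang`, `abstract_form`, `translations`), the ZID and the keys inside `abstract_form` are placeholders assumed for this example, not an agreed storage format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class StoredText:
    """One human-entered text plus the texts derived from it (hypothetical layout)."""
    zid: str             # placeholder ZObject identifier
    source_text: str     # C1: the text exactly as the human entered it
    source_lang: str     # metadata identifying the language of entry
    abstract_form: dict  # the machine's interpretation; keys are illustrative only
    translations: dict = field(default_factory=dict)  # language code -> text

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-ASCII text human-readable in the blob.
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


record = StoredText(
    zid="Z12345",
    source_text="Adds two numbers.",
    source_lang="en",
    abstract_form={"Z1K1": "Z6", "Z6K1": "Adds two numbers."},
    translations={"fr": "Additionne deux nombres."},
)

blob = record.to_json()
restored = StoredText(**json.loads(blob))
```

The point of the sketch is only that the text as entered travels alongside its interpretation and a second-language translation, and that all three survive a round trip through JSON unchanged.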
It's not B (but I don't understand B). I think we do have "secondary wikitext" but it might be implemented as "primary wikitext" with secondary translations as sub-pages (somewhat optionally), as in Meta. It would be the JSON blob that would be secondary (in a logical sense): some transformation of a primary text. If the JSON needs to be primary, you can treat it that way; then its human source pretends to be "about" the primary object.
It's a bit like C, but it's not a big blob and it's probably not parallel. Maybe it's a primary Meta-like wiki that is linked by common reference (ZID) to the JSON blobosphere. Well, that sounds a lot like D, but...
It's not D, because the Meta-like wiki page for a ZObject is not a sub-page of a non-wiki page. Wikipedia pages are not sub-pages of their Wikidata Item's page, but you can look at them as if they are. We can link from one Wikipedia to another directly, or we can link through Wikidata. I know callable functions are a bit different but, as I said at the beginning, I don't understand the problem. Hopefully this input will still be of benefit to somebody who does, however.
Best regards, Al.
On Wednesday, 29 July 2020, abstract-wikipedia-request@lists.wikimedia.org wrote:
Send Abstract-Wikipedia mailing list submissions to abstract-wikipedia@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia or, via email, send a message with subject or body 'help' to abstract-wikipedia-request@lists.wikimedia.org
You can reach the person managing the list at abstract-wikipedia-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Abstract-Wikipedia digest..."
Today's Topics:
- Re: Two different kinds of information? (Andy) (Denny Vrandečić)
- How to store wikitext along the structured content? (Denny Vrandečić)
- Re: Two different kinds of information? (Denny Vrandečić)
Message: 1
Date: Tue, 28 Jul 2020 13:57:52 -0700
From: Denny Vrandečić dvrandecic@wikimedia.org
To: "General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Two different kinds of information? (Andy)
Message-ID: <CA+bik1eS56HuAqtd6O-4OS-kexUzfvu0u4hsYfXtxc83Fms42w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Al,
just one quick request - can you set up your answers to the mailing list in such a way that it doesn't break the thread? (I am not sure how, maybe someone else can chime in, but right now, your answers start a new thread instead of continuing the previous one).
Louis Lecaillez had a similar issue initially, but managed to resolve it, for which I am thankful.
If not, it is OK, but I thought I'd ask.
Thank you! Denny
On Tue, Jul 28, 2020 at 8:23 AM Grounder UK grounderuk@gmail.com wrote:
Hi, Andy! Welcome! I do like your idea of being clear about basic "facts" and details. I think it will be key in the selection of "statements" that go into an "article", in whatever language is required. I don't think we can say how many levels of information there might be, but we can already see something from how Wikipedia pages are put into categories.
"France is a country in Europe" and "in western Europe" and "in the European Union", just to mention three categories. The first is an important fact of geography, but is the second more helpful? All countries in western Europe are (1) a country and (2) in Europe and (3) to the west. (3) feels more like a detail, but if we tell you France is in Europe, what is the first question you might ask? It might be, "Is it in the European Union?" or "How big is it?" or "Do many people live there?" So I would expect us to give you those facts or details (FAQs) as well.
Facts about facts and statements about claims are a whole other topic, but if a "fact" is disputed, we do need to know how to show this. If you look at Wikidata, you will see that the United Kingdom has been a sovereign state since 1927. This is untrue. But if 1927 is not the answer to the question "How long has the UK been a country (or sovereign state)?", what is? "Since 1707, 1801 or 1922", depending on the details. Luckily for you, France has "always" been a country, despite now being the fifth republic (since 1958).
So, sometimes the Property of an entity is not a simple value or relationship. It might be better to think about it as a relationship to a "disagreement" or debate. Then, a "fact" is an entity's relationship to an absence of "disagreement", a "consensus", as Wikipedia would call it. Part of this consensus is the meaning of an entity's label. For example, English Wikipedia thinks "oxygen" is the chemical element ("O") and "its most stable form" ("O₂", "dioxygen"). French Wikipedia thinks "oxygène" is just the element. Wikidata has statements (mostly) about the element but the "Identifiers" (external authorities) are for the English Wikipedia concept, not the French one. The point is, it is clear that there might be some confusion! We have a separate item for dioxygen and for ozone and (in theory) for atomic oxygen (and there are others) so we can give you all of the oxygen facts, mostly grouped by form (allotrope and/or state). Think of that as a disambiguation page enriched with detail... It's an interesting use case (or test case), I think.
Best regards, Al.
On Tuesday, 28 July 2020, <abstract-wikipedia-request@lists.wikimedia.org> wrote:
Today's Topics:
- All work is preliminary (Denny Vrandečić)
- Two different kinds of information? (Andy)
Message: 1
Date: Mon, 27 Jul 2020 12:43:05 -0700
From: Denny Vrandečić dvrandecic@wikimedia.org
To: Abstract Wikipedia list abstract-wikipedia@lists.wikimedia.org
Subject: [Abstract-wikipedia] All work is preliminary
Message-ID: <CA+bik1dNtpbA3H2_O=8H8iyNrBPMbpQeAaOb04EpEaoLxCWSZQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hello all,
one of the things we have been discussing in the team is that we want to do as much of our work as possible in the open. At the same time, we're a distributed team and starting to form a shared understanding of the task at hand. Due to the COVID situation, we didn't have the opportunity to have a project kick-off, where we meet for a few days and make sure that we are fully aligned and use the same words and have the same thinking.
That's both an opportunity and a risk, as it might lead to divergence in what we are saying and writing.
We have two possible ways forward - either we vet documents and discussions internally every time, in order to present a more unified view on the project, or we just drop that and we publish our documents and plans in the open immediately, with the understanding that this is merely preliminary and that there might be inconsistencies. We might discuss and disagree with each other publicly in Phabricator tasks and on this mailing list and on the wiki pages - but in the end, this is also an opportunity to build a common understanding together with you and share the process of developing the project vision and implementation.
So, in that light, we still have a small backlog of internal documents that we want to get out, and by the end of this week, most of the state of the work should be in the open, and we will move more and more of our discussions to the public, to eventually have them all in the open.
Here is a document I have been working on for a while: it is the core model of how the evaluation and representation of data, functions, and function calls in Wikilambda may work. Again, there is no agreement on this yet. It differs from the AbstractText prototype implementation, and there is a list of main differences at the end, and it also does not have all the answers yet.
Thanks particularly to Arthur P. Smith for many comments and rewriting of some of the sections, thanks to Lucas Werkmeister for his valuable input (and, even more important, for his work on GraalEneyj), thanks to Cyrus Omar for his advice and pointers, and thanks to Adam Baso, James Forrester, and Nick Wilson for their internal comments.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Function_model
Feedback on this would be extremely valuable, and you can see there are many open questions left.
Stay safe, Denny