Re: [Abstract-wikipedia] How to store wikitext along the structured content?

29 Jul 2020

I'm replying to Denny's original message. I've read other replies and
James's phabricator overview and I think I understand the problem. Except I
don't. So I'm stepping back to the requirements and constraints.
Constraint 1 is that text entered by humans must be stored as
human-readable text (encoded, obviously).
So, labels and aliases and "source code" are primarily text. If you need to
store the text as JSON, that's fine, but I was imagining that the human
text, as entered, would be translated into an abstract form (with Zs and Ks
etc) and it is the abstract form that gets stored as JSON. Yes, it's
dynamic (on-the-fly) during the editing process, but the human enters text
we care about and the machine turns it into an object we care about. The
text entered is a text and the interpreted text (object) is another text.
Translations are translations of a source. You can translate the text
entered into another language, and that's another text. Or you can
translate the derived form, and that's a different text. But if the
translation is fully automated, you might treat it as "mere" presentation:
a visualisation of underlying data. Maybe it makes sense to store such a
thing, especially if a human has seen it, even more so if they have relied
on it, but do we have a Requirement to store all translations up-front? I
don't think so. We store it in the language it was entered in (preferably
with metadata that identifies the language) and maybe we store it in a
small number of different languages (always having at least two would be
nice). Beyond that, I think you're talking about sub-pages per language
(but let's not jump to solutions).
Constraint 2 is that text is bound to its context.
A good example of this is comments in source-code. The word "in" indicates
the binding. The comment doesn't point to some text, it is the text, right
there, "in" the source code. Constraint 1 ("C1") applies: it is stored as
entered, where entered. If you later want to tidy up the source code and
replace comments with pointers to comments and/or translations, that's
fine. But C1 still applies, so you have a new version of the source-code
and you still have the old version.
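To make C1 and C2 concrete, here is a minimal sketch of an append-only history, where saving a tidied-up version never replaces the version that was entered first. The class names, the `comment:c1` pointer syntax and the language tag are all hypothetical, chosen only for illustration; nothing here is an actual Wikilambda or MediaWiki API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Revision:
    """One version of a source text, stored exactly as entered (C1)."""
    text: str
    language: str  # language tag of the text as entered, e.g. "en"


@dataclass
class SourceHistory:
    """Append-only history: a new version never overwrites an old one."""
    revisions: List[Revision] = field(default_factory=list)

    def save(self, text: str, language: str) -> int:
        """Store a new revision and return its index."""
        self.revisions.append(Revision(text, language))
        return len(self.revisions) - 1

    def latest(self) -> Revision:
        return self.revisions[-1]


history = SourceHistory()
# Version 0: the comment sits "in" the source code, where entered (C2).
v0 = history.save("add(x, y)  # returns the sum of x and y", "en")
# Version 1: tidied up, with the comment replaced by a hypothetical pointer.
v1 = history.save("add(x, y)  # see comment:c1", "en")
```

After the tidy-up, `history.revisions[v0]` still holds the comment as originally written, which is all C1 asks for.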
When it comes to documentation outside of comments, that's just a text (as
written). It might be written as a multi-lingual text, but more than
bi-lingual is stretching it a bit for most of us. The bi-lingual text may
be a collaboration with a machine translator, but I would only see that as
a Requirement when one of the languages is WMF's own synthetic language
(ZKspeak, to coin a phrase). That is, I might compose my text in DeepL and
paste its English into the ZObject documentation. For us, that is the text
as written (C1 applies). If I also paste in the text I composed in a
different language, that's fine; that's another text as written (C1
applies). (If I make a comment to that effect, that's just text where
entered, but it's interesting metadata, so there may be a Requirement to
capture the metadata. Either way (or both ways) C1 applies.)
Requirement 1 might be that any text can be entered as Wikitext.
Ah, but the JSON can't be Wikitext... Well, that isn't the Requirement. We
can enter the text as Wikitext (so C1 applies). If it must be translated
into text that can be JSON, that's fine. We still have the Wikitext and now
we also have a translation; that's another text.
Does the above guide us toward a Solution? Well, it's not A, because we
don't have many translations in the JSON blob. But maybe we have three: the
source human text, the interpreted ZK text and a translation into a second
language.
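As a rough sketch of what "three texts" in one JSON blob could look like (the field names, the sample sentences and the placeholder for the ZK form are my assumptions, not the Wikilambda schema):

```python
import json

# Hypothetical layout, for illustration only: one source text as entered,
# one derived abstract form, and one translation into a second language.
record = {
    "source": {
        "language": "en",
        "format": "wikitext",
        "text": "''Adds'' two numbers.",
    },
    # Placeholder only: the real ZK form is not specified here.
    "abstract": {"placeholder": "ZK form derived from the source"},
    "translations": [
        {"language": "fr", "text": "Additionne deux nombres.", "automated": True},
    ],
}

# Wikitext is just a string as far as JSON is concerned, so it survives
# a round trip unchanged.
blob = json.dumps(record, ensure_ascii=False)
restored = json.loads(blob)
```

The point is only that the Wikitext as entered can sit inside the JSON untouched, next to whatever gets derived from it.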
It's not B (but I don't understand B). I think we do have "secondary
wikitext" but it might be implemented as "primary wikitext" with secondary
translations as sub-pages (somewhat optionally), as in Meta. It would be
the JSON blob that would be secondary (in a logical sense): some
transformation of a primary text. If the JSON needs to be primary, you can
treat it that way; then its human source pretends to be "about" the primary
object.
It's a bit like C, but it's not a big blob and it's probably not parallel.
Maybe it's a primary Meta-like wiki that is linked by common reference
(ZID) to the JSON blobosphere. Well, that sounds a lot like D, but...
It's not D, because the Meta-like wiki page for a ZObject is not a sub-page
of a non-wiki page. Wikipedia pages are not sub-pages of their Wikidata
Item's page, but you can look at them as if they are. We can link from one
Wikipedia to another directly, or we can link through Wikidata. I know
callable functions are a bit different but, as I said at the beginning, I
don't understand the problem. Hopefully this input will still be of benefit
to somebody who does, however.
Best regards,
Al.
On Wednesday, 29 July 2020, abstract-wikipedia-request@lists.wikimedia.org
wrote:
...
Send Abstract-Wikipedia mailing list submissions to
        abstract-wikipedia@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
or, via email, send a message with subject or body 'help' to
        abstract-wikipedia-request@lists.wikimedia.org
You can reach the person managing the list at
        abstract-wikipedia-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Abstract-Wikipedia digest..."
Today's Topics:

Re: Two different kinds of information? (Andy) (Denny Vrandečić)
How to store wikitext along the structured content? (Denny Vrandečić)
Re: Two different kinds of information? (Denny Vrandečić)

Message: 1
Date: Tue, 28 Jul 2020 13:57:52 -0700
From: Denny Vrandečić dvrandecic@wikimedia.org
To: "General public mailing list for the discussion of Abstract
        Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Two different kinds of information?
        (Andy)
Message-ID:
        <CA+bik1eS56HuAqtd6O-4OS-kexUzfvu0u4hsYfXtxc83Fms42w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi Al,
just one quick request - can you set up your answers to the mailing list in
such a way that it doesn't break the thread? (I am not sure how, maybe
someone else can chime in, but right now, your answers start a new thread
instead of continuing the previous one).
Louis Lecaillez had a similar issue initially, but managed to resolve it,
for which I am thankful.
If not, it is OK, but I thought I'd ask.
Thank you!
Denny
On Tue, Jul 28, 2020 at 8:23 AM Grounder UK grounderuk@gmail.com wrote:
...
Hi, Andy! Welcome!
I do like your idea of being clear about basic "facts" and details. I
think it will be key in the selection of "statements" that go into an
"article", in whatever language is required. I don't think we can say how
many levels of information there might be, but we can already see
something from how Wikipedia pages are put into categories.
"France is a country in Europe" and "in western Europe" and "in the
European Union", just to mention three categories. The first is an
important fact of geography, but is the second more helpful? All
countries in western Europe are (1) a country and (2) in Europe and (3) to
the west. (3) feels more like a detail, but if we tell you France is in
Europe, what is the first question you might ask? It might be, "Is it in
the European Union?" or "How big is it?" or "Do many people live there?"
So I would expect us to give you those facts or details (FAQs) as well.
Facts about facts and statements about claims are a whole other topic, but
if a "fact" is disputed, we do need to know how to show this. If you look
at Wikidata, you will see that the United Kingdom has been a sovereign
state since 1927. This is untrue. But if 1927 is not the answer to the
question "How long has the UK been a country (or sovereign state)?", what
is? "Since 1707, 1801 or 1922", depending on the details. Luckily for you,
France has "always" been a country, despite now being the fifth republic
(since 1958).
So, sometimes the Property of an entity is not a simple value or
relationship. It might be better to think about it as a relationship to a
"disagreement" or debate. Then, a "fact" is an entity's relationship to an
absence of "disagreement", a "consensus", as Wikipedia would call it. Part
of this consensus is the meaning of an entity's label. For example,
English Wikipedia thinks "oxygen" is the chemical element ("O") and "its
most stable form" ("O<sub>2</sub>", "dioxygen"). French Wikipedia thinks
"oxygène" is just the element. Wikidata has statements (mostly) about the
element but the "Identifiers" (external authorities) are for the English
Wikipedia concept, not the French one. The point is, it is clear that
there might be some confusion! We have a separate item for dioxygen and
for ozone and (in theory) for atomic oxygen (and there are others) so we
can give you all of the oxygen facts, mostly grouped by form (allotrope
and/or state).
Think of that as a disambiguation page enriched with detail... It's an
interesting use case (or test case), I think.
Best regards,
Al.
On Tuesday, 28 July 2020, <abstract-wikipedia-request@lists.wikimedia.org>
wrote:
...
Today's Topics:

All work is preliminary (Denny Vrandečić)
Two different kinds of information? (Andy)

Message: 1
Date: Mon, 27 Jul 2020 12:43:05 -0700
From: Denny Vrandečić dvrandecic@wikimedia.org
To: Abstract Wikipedia list abstract-wikipedia@lists.wikimedia.org
Subject: [Abstract-wikipedia] All work is preliminary
Message-ID:
        <CA+bik1dNtpbA3H2_O=8H8iyNrBPMbpQeAaOb04EpEaoLxCWSZQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hello all,
one of the things we have been discussing in the team is that we want to
do as much of our work in the open. At the same time, we're a distributed
team and starting to form a shared understanding of the task at hand. Due
to the COVID situation, we didn't have the opportunity to have a project
kick off, where we meet for a few days and make sure that we are fully
aligned and use the same words and have the same thinking.
That's both an opportunity, but also a risk, as it might lead to
divergence in what we are saying and writing.
We have two possible ways forward - either we vet documents and
discussions internally every time, in order to present a more unified view
on the project, or we just drop that and we publish our documents and
plans in the open immediately, with the understanding that this is merely
preliminary, that there might be inconsistencies. We might discuss and
disagree with each other publicly in Phabricator tasks and on this mailing
list and on the wiki pages - but in the end, this is also an opportunity
to together with you build a common understanding and share the process of
developing the project vision and implementation.
So, in that light, we still have a small backlog of internal documents
that we want to get out, and by the end of this week, most of the state of
the work should be in the open, and we will move more and more of our
discussions to the public, to eventually have them all in the open.
Here is a document I have been working on for a while, it is the core
model of how the evaluation and representation of data, functions, and
function calls in Wikilambda may work. Again, there is no agreement on
this yet. It differs from the AbstractText prototype implementation, and
there is a list of main differences at the end, and it also has not all
the answers yet.
Thanks to, particularly Arthur P. Smith for many comments and rewriting of
some of the sections, thanks to Lucas Werkmeister for his valuable input
(and, even more important, for his work on GraalEneyj), thanks to Cyrus
Omar for his advice and pointers, and thanks to Adam Baso, James
Forrester, and Nick Wilson for their internal comments.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Function_model
Feedback on this would be extremely valuable, and you can see there are
many open questions left.
Stay safe,
Denny
