Hi Amir,

> But you see, that's exactly the problem: This may be clear to people who learned about side effects,
> composition, and referential transparency in the university for their Computer Science or Mathematics
> degrees, but I strongly suspect that most editors of Wikimedia projects didn't learn about these things.

I agree. In the end, when Wikimedia editors use this system, it must be accessible, and building functions in the new project should be as accessible as possible. At that point, we won't be talking about side effects and referential transparency; we will need to talk in terms that are accessible to as many people as possible.

We're not there yet.

In order to get there, I think it makes good sense to understand the state of the art in programming language research, and to distill the common issues hampering the democratization of coding. In order to talk about these I sometimes slip into jargon, and I want to apologize for that. It is similar to when Wikidata was developed: we sometimes talked about ontologies, inverse functional properties, and description logics, but I very much hope that most of this remains well encapsulated away from the contributors to Wikidata, and even more so from the contributors and readers of Wikipedia.

So, yes, please call me out when I use jargon. I will try to be more mindful about that.


> I've read through the walkthrough at https://github.com/google/abstracttext/blob/master/eneyj/docs/walkthrough.md ,
> and at most, I can see that the Z objects are a programming-language-independent abstract representation of the
> code's interface. But _eventually_, there is code at the bottom, isn't there? And is that code fundamentally different
> from a Scribunto module?
 
Yeah. Mostly. (Trying not to point out that actually no code at the bottom is really required, because that is an irrelevant tangent to this conversation.)

(And failing at not pointing it out. Sorry.)

Yes, in many ways it is very much like a Scribunto module.

We'll definitely take a very close look at Scribunto and see how much of it will make sense to reuse. The sandboxing experience we gained in ops and development, the fact that it is already a well-used system with years of real-life exposure, and the skills the community has already acquired: none of that should be dismissed lightly.

We want to ensure that the system understands the implemented functionality well enough to expose comfortable interfaces that a wide variety of users can work with, and to allow the functionality to be reused easily and safely in other functions.
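To make that a bit more concrete, here is a rough Python sketch of the idea (all names are invented for illustration; nothing here is the actual design): when the system knows each function's typed signature, it can both generate a simple interface for calling it and check that a composition is well-typed before evaluating anything.

```python
# Illustrative sketch only; the names and shapes are invented.

def concatenate(a: str, b: str) -> str:
    """A pure function: the output depends only on the inputs."""
    return a + b

def uppercase(s: str) -> str:
    """Another pure function over strings."""
    return s.upper()

# Because both signatures are machine-readable, the system can tell that
# composing them is safe: (str, str) -> str, then str -> str.
def shout(a: str, b: str) -> str:
    return uppercase(concatenate(a, b))

print(shout("abstract", "text"))  # ABSTRACTTEXT
```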

I hope that makes sense.

Thanks,
Denny




On Thu, Jul 16, 2020 at 5:18 AM Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:

On Thu, Jul 16, 2020 at 6:23 AM Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically a small library of functions and helpers. We're also starting with no side-effects and referential transparency, because we're boring, and leave that for a bit later.

But you see, that's exactly the problem: This may be clear to people who learned about side effects, composition, and referential transparency in the university for their Computer Science or Mathematics degrees, but I strongly suspect that most editors of Wikimedia projects didn't learn about these things.

If the project only wants contributors with CS degrees or comparable knowledge, then it's fine to do it completely differently from a wiki, and more like a software development project: with Git, with rigorous code review, and so on. Maybe the eventual result will be better if it's done this way. But don't expect the size and the diversity of the Wikimedia community, and don't expect that it will be easy to integrate the functions' output into Wikipedia. Maybe it will be _possible_, but it won't be easy.

But if the intention is to let the large and diverse Wikimedia community edit it, then from the outset, this project needs to speak in terms that this large community understands.
This community understands the following things:
* what is an encyclopedia, and what are notability, reliability, original research, etc.
* wiki pages and wikitext (links, headings, categories, references, files)
* a relatively large part of this community understands structured data in the form of Wikidata
* a relatively large part of this community understands templates (at least how to add them to pages, though not necessarily how to create them)
* a smaller, but still relatively large and multi-lingual group understands Scribunto modules

It's totally fine to introduce new concepts or to upgrade existing ones, but it has to start from something familiar. Jumping straight into composition, side effects, and Z objects is too cryptic.
 
So, there will be wikipages that represent such a function. So on one page we would say, "so, there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly, not a very complicated function, but quite useful.
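As a purely illustrative sketch (this is not the actual schema, and every field name here is invented), such a page could store something along these lines:

```python
# Invented field names, for illustration only - not the real schema.
concatenate_page = {
    "label": {"en": "concatenate"},
    "arguments": [
        {"name": "first", "type": "string"},
        {"name": "second", "type": "string"},
    ],
    "return_type": "string",
}

# Because the description is data, the system can reason about it, e.g.
# count the arguments or check the return type, without running any code.
print(len(concatenate_page["arguments"]))  # 2
```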

Sorry, but I still don't understand how this is different from making the following Scribunto module:

local p = {}

function p.concat(frame)
    return frame.args[1] .. frame.args[2]
end

return p

I've read through the walkthrough at https://github.com/google/abstracttext/blob/master/eneyj/docs/walkthrough.md , and at most, I can see that the Z objects are a programming-language-independent abstract representation of the code's interface. But _eventually_, there is code at the bottom, isn't there? And is that code fundamentally different from a Scribunto module?
 
Then we might have several implementations for this function, for example in different programming languages, and we might also have tests for this function. All implementations should always behave exactly the same. An evaluation engine decides which implementation it wants to use (and for the result it shouldn't matter, because they should all behave the same), but some engines might prefer to run the JavaScript implementations, others might run Python, others might decide "oh, wait, I am composing two different functions together in order to get to a result, maybe it is smarter if I pick two implementations in the same language, as I can actually compile that into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you were asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
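The scheme described above could be sketched roughly like this in Python (everything here is invented for illustration; a real engine would dispatch to actual language runtimes rather than Python lambdas):

```python
# Illustrative only: one function, several "implementations", shared tests.
implementations = {
    # A real engine would run these in separate JavaScript/Python runtimes;
    # here both are simulated in Python.
    "javascript": lambda a, b: "".join([a, b]),
    "python": lambda a, b: a + b,
}

tests = [(("ab", "cd"), "abcd"), (("", "x"), "x")]

# Every implementation must behave identically on the shared tests.
for impl in implementations.values():
    for args, expected in tests:
        assert impl(*args) == expected

def evaluate(preferred, a, b):
    """The engine may pick any implementation; the result must not depend on it."""
    impl = implementations.get(preferred, implementations["python"])
    return impl(a, b)

print(evaluate("javascript", "foo", "bar"))  # foobar
```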

But Scribunto can also support multiple programming languages. In practice it supports only Lua at the moment, but in theory more languages like JavaScript and Python could be added. (It's an old and pretty popular community wish.)

So, is a conceptually new infrastructure really needed, and are functions really different from modules? "Scribunto is not scalable for such an ambitious project" would be an acceptable answer, but I'd love to know whether this was at least considered.
 
_______________________________________________
Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia