Hi folks!
Since people have started talking about runtimes for the functions defined in Wikilambda (see the V8 Engine and/or WebAssembly threads), I figured I should mention the project I’ve been working on for a few weeks: GraalEneyj.[1] It’s supposed to be(come) a GraalVM-based (re)implementation of the eneyj language/runtime[2] that Denny wrote and which is currently part of the AbstractText extension.
GraalVM,[3] for those who haven’t heard of it yet, is a Java-based project to build high-performance language implementations. There are already GraalVM-based implementations of JavaScript,[4] R,[5] Ruby,[6] LLVM,[7] WebAssembly,[8] and more,[9] and code snippets in any of these languages can run in the same (J)VM ((Java) Virtual Machine) and call each other with no performance penalty. This is of course interesting if we’re thinking about Wikilambda holding function implementations in different languages – with GraalEneyj, it should be possible to seamlessly compose functions implemented in any language for which a GraalVM implementation is available. GraalVM is also supposed to deliver highly optimized language implementations with relatively little development effort, but so far GraalEneyj is not yet at a point where I could benchmark it.
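For anyone who hasn’t tried GraalVM yet, here is a minimal sketch of what the cross-language interop looks like through its polyglot API (org.graalvm.polyglot). This is not GraalEneyj code, just an illustration, and it assumes a GraalVM runtime with the JavaScript language installed:

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class PolyglotDemo {
    public static void main(String[] args) {
        // One polyglot context can host several guest languages in the same VM.
        try (Context context = Context.create()) {
            // Define a function in JavaScript...
            Value concatenate = context.eval("js", "(function (a, b) { return a + b; })");
            // ...and call it from Java without crossing a serialization boundary.
            System.out.println(concatenate.execute("Abstract ", "Wikipedia").asString());
        }
    }
}
```

The same Context could just as well evaluate Ruby, R, or WebAssembly snippets and pass values between them, which is the property that makes mixing implementation languages in Wikilambda plausible.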
If that sounds interesting to you, please get in touch! You can also explore the code that I have so far (there’s not a ton of documentation yet, but hopefully the README[10] is at least enough to get a basic idea and run the tests), or watch the GraalEneyj walkthrough[11] that I recorded last month. (At some point I should record another one, since the code keeps changing, but for now that’s all I have to offer.)
Cheers, Lucas Werkmeister
PS: In my day job I’m a software developer at Wikimedia Deutschland e. V., but all of this is private-time activity so far, not work related.
[1]: https://github.com/lucaswerkmeister/graaleneyj/
[2]: https://github.com/google/abstracttext/tree/master/eneyj
[3]: https://www.graalvm.org/
[4]: https://github.com/graalvm/graaljs
[5]: https://github.com/oracle/fastr/
[6]: https://github.com/oracle/truffleruby
[7]: https://github.com/oracle/graal/tree/master/sulong
[8]: https://github.com/oracle/graal/tree/master/wasm
[9]: https://github.com/oracle/graal/blob/master/truffle/docs/Languages.md
[10]: https://github.com/lucaswerkmeister/graaleneyj/#readme
[11]: https://www.youtube.com/watch?v=mQVpcBMjSaw
Lucas,
I cannot express how amazed I am by your project, and how thankful I am! This is truly fantastic!
I think you found a really interesting combination of technologies for this particular project, thank you for exploring this space. I will take a closer look in the next few days.
Thank you, Denny
I keep being confused about this point: What are the "functions" on AW/Wikilambda, and in which language will they be written?
When I first read the paper ( https://arxiv.org/pdf/2004.04733.pdf ), I got the impression that it will be a whole new programming language, especially from figures 1, 2, 3, and 6 in the PDF.
Then Denny wrote on Talk:Abstract Wikipedia that it will be just Lua, familiar from Scribunto modules, and also JavaScript. But this raises a few questions:
1. Will this mean the addition of JavaScript as a new engine for Scribunto, which is a very old and popular community request (see https://phabricator.wikimedia.org/T61101 )? Or will this be in some other framework?
2. Will the Lua support be the same as in Scribunto, just with some more built-in methods available for module developers? Or will it be something substantially new?
3. If the answers to questions 1 and 2 are "it's basically Scribunto with new capabilities", then it's probably good, but then why are they called "functions" and not "modules"? Or are Abstract Wikipedia functions substantially different from modules?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On Wed, Jul 15, 2020 at 8:15 AM Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:
I keep being confused about this point: What are the "functions" on AW/Wikilambda, and in which language will they be written?
I think everybody has a slightly different perspective on this. I've been working closely with Denny's prototype project (which included the JavaScript version of 'eneyj' that Lucas was inspired by) at https://github.com/google/abstracttext/ (I assume this will move somewhere under Wikimedia now?). The prototype does to an extent define its own "language" - specifically it defines (human) language-independent "Z Objects" (implemented as a subset of JSON) which encapsulate all kinds of computing objects: functions, tests, numeric values, strings, languages (human and programming), etc.
I think right now this may be a little more complex than is actually necessary (are strict types necessary? maybe a 'validator' and a 'test' should be the same thing?); on the other hand, something like this is needed to be able to have a wiki-style, community-editable library of functions, renderers, etc., and I really like the underlying approach (similar to Wikidata) of moving all the human-language components off into label structures that are not the fundamental identifiers.
ZObjects for "implementations" of functions contain the actual code, along with a key indicating the programming language of the code. Another ZObject can call the function in some context, which chooses an implementation to run it. At a basic level this is working, but there's a lot more to do to get where we want to be and become actually useful...
Arthur
The part I don't get is why a collection of functions and tests needs to be editable on a wiki. This poses severe limitations in terms of versioning if you start having functions depend on each other and editors can only edit one at a time. This will inevitably bring breaking intermediary edits. It seems like reinventing the wheel of code version control on top of a system that wasn't designed for it. MediaWiki isn't designed with the ability to have atomic edits that span multiple pages/items, which is a pretty common requirement for any large codebase, and this set of functions sounds like it's poised to become one.
Why does this benefit from being done as wiki-editable code compared to a software project (or series of libraries/services) hosted on a git repository? Especially if we're talking about extensive programming and tests, whoever is computer-literate enough to master these concepts is likely to know how to contribute to git projects as well, as they would very likely have encountered that on their path to learning functional programming.
I get why such an architecture might be a choice for a prototype, for fast iteration and because the people involved are familiar with wikis. But what core contributors are familiar with and what can be used for a fast prototype is rarely a suitable and scalable architecture for the final product.
There is also a notion of how dynamic the content is to take into account. There is a difference between the needs of ever-changing encyclopedic information, which a wiki is obviously good at handling, and an iterative software project where the definition of a function to pluralize some text in a language is highly unlikely to be edited on a daily basis, but where interdependencies are much more likely to be an important feature. Something that code version control is designed for.
Good questions - you may be right, but I think the wiki approach is worth trying. One benefit of being done as a wiki is the openness that is achievable - with a git repo there are always gatekeepers in some form or another - the number of people who are allowed to directly update the repository is necessarily small. Denny's vision here requires people who understand hundreds of different languages to contribute; that will never be a small group. And the more barriers in their way (having to set up a GitHub account, fork, create pull requests, etc.), the less likely we are to get that large group of contributors actively involved.
So how to avoid the problem of "breaking intermediary edits"? Denny's ZObject approach at least prevents one common problem - if everything is identified by a Z-Id then changing the name of a function or variable or type will never break things (it's just the label on that ZObject in a particular language, not how it is called)! I think in general other types of breaking changes can be avoided by requiring creation of a new ZObject, rather than editing an existing one, when it would change behavior in a breaking way. Not all functionality changes are necessarily breaking though - you can add function arguments with default behavior if one of the new arguments is missing, for example.
But maybe we will need some way to allow for atomic edits that span multiple ZObjects; if that really is an essential requirement then let's figure out how to do it. There's lots of work ahead no matter how we do this!
Arthur
Hello,
I totally agree with every concern raised in this email. In fact, I already stated my position in my first email that an Abstract Wikipedia doesn't need a new approach to programming. I also agree that anyone who has enough technical skills to write or edit code will be able to use git or any other version control software to contribute. I'll wait and see what's going on on this front.
Besides, I also strongly hope that the Z-Object JSON subset will include the possibility to add comments; otherwise it will be a nightmare to understand and maintain the code.
@replying to sister email by Arthur:
The number of contributors is not an issue. The dotnet repository on GitHub, for instance, has 300+ contributors, and it can probably scale to a few more orders of magnitude. Language understanding is not really an issue either, as modules for a given linguistic community will likely be developed by that community. In addition, code is generally written in English with comments in the local language (OpenOffice had comments in German for decades, for instance), in part because historically programming languages didn't even support identifiers in character sets other than ASCII. For fully "localized" programming one would also need to write filenames in their language, which will break things, as Windows and macOS don't use the same Unicode canonicalization form; hence the industry practice of writing filenames in plain old ASCII.
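As a small illustration of the canonicalization issue (a minimal sketch using Java's standard java.text.Normalizer; nothing here is specific to Wikilambda):

```java
import java.text.Normalizer;

public class NormalizationDemo {
    public static void main(String[] args) {
        String nfc = Normalizer.normalize("é", Normalizer.Form.NFC); // one code point: U+00E9
        String nfd = Normalizer.normalize("é", Normalizer.Form.NFD); // two: U+0065 + U+0301

        // The two strings render identically but compare as different sequences,
        // which is exactly what bites you when file systems normalize differently.
        System.out.println(nfc.equals(nfd));                       // false
        System.out.println(nfc.length() + " vs " + nfd.length()); // 1 vs 2
    }
}
```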
Best regards, Louis Lecailliez
On Wed, Jul 15, 2020 at 5:00 PM, Gilles Dubuc gilles@wikimedia.org wrote:
The part I don't get is why a collection of functions and tests needs to be editable on a wiki. This poses severe limitations in terms of versioning if you start having functions depend on each other and editors can only edit one at a time. This will inevitably bring breaking intermediary edits. It seems like reinventing the wheel of code version control on top of a system that wasn't designed for it. MediaWiki isn't designed with the ability to have atomic edits that span multiple pages/items, which is a pretty common requirement for any large codebase, and this set of functions sounds like it's poised to become one.
I disagree that it must be done with the existing Wikimedia codebase. That said, I believe it is possible to build an integrated development environment that fuses a programming language with a version control system. That is what Unison is doing with content-addressable code. Unison has different goals, but they have i18n identifiers/docstrings on their roadmap.
Why does this benefit from being done as wiki-editable code compared to a software project (or series of libraries/services) hosted on a git repository? Especially if we're talking about extensive programming and tests, whoever is computer-literate enough to master these concepts is likely to know how to contribute to git projects as well, as they would very likely have encountered that on their path to learning functional programming.
My understanding is that Wikilambda wants to be easier than acquiring all the knowledge necessary to contribute to a software project. That is, Wikilambda could be something better than git/GitHub.
I get why such an architecture might be a choice for a prototype, for fast iteration and because the people involved are familiar with wikis. But what core contributors are familiar with and what can be used for a fast prototype is rarely a suitable and scalable architecture for the final product.
My understanding is that the prototype wanted to demonstrate that it is possible to i18n the identifiers, docstrings, etc. of a programming language. Something that nobody has tried or succeeded at so far.
There is also a notion of how dynamic the content is to take into account. There is a difference between the needs of ever-changing encyclopedic information, which a wiki is obviously good at handling, and an iterative software project where the definition of a function to pluralize some text in a language is highly unlikely to be edited on a daily basis, but where interdependencies are much more likely to be an important feature. Something that code version control is designed for.
The problem with git is that if you change the definition of a function in a file, you need to update the code in every other place that calls that function. That is different from content-addressable code, where you reference the content of the function: if you change the function, it creates a new function (possibly with the same name), but old code keeps pointing to the old definition. That is much more scalable than the current git/GitHub approach (even if you might keep calling old code).
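A tiny sketch of what "content-addressable" means in practice: the identifier of a function is derived from a hash of its definition, so editing a definition produces a new identifier instead of mutating the old one. This is a generic illustration (Java 17+ for HexFormat), not Unison's or Wikilambda's actual scheme:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

public class ContentAddressedStore {
    private final Map<String, String> definitionsByHash = new HashMap<>();

    // "Publishing" a definition returns its content hash; that hash is the reference.
    public String put(String definition) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(definition.getBytes(StandardCharsets.UTF_8));
        String id = HexFormat.of().formatHex(digest);
        definitionsByHash.put(id, definition);
        return id;
    }

    public String get(String id) {
        return definitionsByHash.get(id);
    }

    public static void main(String[] args) throws Exception {
        ContentAddressedStore store = new ContentAddressedStore();
        String v1 = store.put("double(x) := concatenate(x, x)");
        String v2 = store.put("double(x) := concatenate(concatenate(x, x), x)"); // an edited version
        // v1 != v2: callers holding v1 keep resolving to the unchanged old definition.
        System.out.println(v1.equals(v2)); // false
        System.out.println(store.get(v1)); // double(x) := concatenate(x, x)
    }
}
```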
Thanks for the great discussion!
Renaming the topic, as the discussion has deviated a bit from Lucas' original announcement and is now questioning whether we should do Wikilambda at all. So this is mostly to answer Gilles' comment as to why a collection of functions and tests needs to be editable on a wiki - that's the premise of the first part of the project.
So yes, we could maybe create the whole code for the renderers in a classical way, as an Open Source project, going through git and gerrit, and assume that we will have hundreds of new coders working on this, neatly covering all supported languages, to keep up with the creation of renderers when the community creates new constructors, etc. I have doubts. I want to drastically reduce the barrier to participate in this task, and I think that this is, in fact, necessary. And this is why we are going for Wikilambda, a wiki of functions first, as said in my first email to the list.
Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically a small library of functions and helpers. We're also starting with no side-effects and referential transparency, because we're boring, and leave that for a bit later.
So, there will be wikipages that represent such a function. So on one page we would say, "so, there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly, not a very complicated function, but quite useful. Then we might have several implementations for this function, for example in different programming languages, we also might have tests for this function. All implementations should always behave exactly the same. An evaluation engine decides which implementations it wants to use (and, for the result it shouldn't matter because they all should behave the same), but some engines might prefer to run the JavaScript implementations, others might run Python, others might decide "oh, wait, I am composing two different functions together in order to get to a result, maybe it is smarter if I pick two implementations with the same language, as I actually can compile that into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you were asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
What looks like a programming language of its own in the existing ZObjects (but isn't) is the ability to compose functions. So, say you now want to introduce a double_string function which takes one string and returns one: you could either implement it in any of the individual programming languages, or implement it as a composition, i.e. double_string(arg1) := concatenate(arg1, arg1). Since you can have more implementations, you can still go ahead and write a native implementation in JavaScript or Python or WebAssembly, but because we know how this function is composed from existing functions, the evaluation engine can do that for you as well. (That's an idea I want to steal for the natural language generation engine later too, but this time for natural languages.)
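To sketch how an evaluation engine can treat native implementations and compositions uniformly, here is a toy example. This is not the ZObject model or the eneyj code, just an illustration of the idea that a composition can be evaluated from its definition without a hand-written native implementation:

```java
import java.util.List;
import java.util.Map;

public class CompositionSketch {

    // An implementation takes a list of arguments and can look up other functions by name.
    interface Impl {
        String apply(List<String> args, Map<String, Impl> registry);
    }

    public static void main(String[] args) {
        // A "native" implementation of concatenate (here simply written in Java).
        Impl concatenate = (a, reg) -> a.get(0) + a.get(1);

        // double_string defined as a composition: concatenate(arg1, arg1).
        // The engine resolves the referenced function and rearranges the arguments.
        Impl doubleString = (a, reg) ->
                reg.get("concatenate").apply(List.of(a.get(0), a.get(0)), reg);

        Map<String, Impl> registry = Map.of(
                "concatenate", concatenate,
                "double_string", doubleString);

        // The evaluation engine just picks whichever implementation it prefers and runs it.
        System.out.println(registry.get("double_string").apply(List.of("ab"), registry)); // abab
    }
}
```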
By exposing these functions, the goal is to provide a library of such functions that anyone can extend, maintain, browse, execute, and use for any purpose. Contributors and readers can come in and compare different implementations of the same function. They can call functions and evaluate them in a local evaluation engine. Or in the browser (if supported). Or on the server. That's the goal of the wiki of functions, what we call Wikilambda in the proposal - a new Wikimedia project that puts functions front and center, and makes them a new form of knowledge asset that the Wikimedia movement aims to democratize and make accessible.
One goal is to vastly lower the barrier to contribute the renderers that we will need for the second part of the project. But to also make it clear: supporting the rendering of content and the creation of content for the Wikipedias is our first priority, but it does not end there. Just as Wikidata and Commons have the goal to support the Wikipedias (and within these communities, the prioritization of this goal compared to other goals may differ substantially from one contributor to the next), Wikilambda will have that as a primary goal. But we would be selling ourselves short if we stopped there. There are a number of interesting use cases that will be enabled by this project, many of which are aligned with our vision of a world in which everyone can share in the sum of all knowledge - and a number of the 2030 strategic goals.
You may think that this is tangential, unnecessary, overly complicated - but when discussing the overall goal of providing content in many languages with a number of senior researchers in this area, it was exactly this tangent that frequently made them switch from "this is impossible" to "this might work, perhaps".
There have been a number of other topics raised in the thread that I also want to address.
Regarding the atomicity of updates raised by Gilles: Arthur answered that if you're changing the functionality, you should create a new function (similar in spirit to the content-addressable code mentioned by Amirouche), and that's where we'll start (because I am a simple person, and I like to start with simple things for the first iteration). We might identify the need for a more powerful versioning scheme later, and we'll get to it then. I hope to avoid that.
Will the Lua support be the same as in the Scribunto modules, as asked by Amir? We'll get to that when we get to implementations in Lua. I think it would be desirable, and I also think it might be possible, but we'll get to it later. We won't start with Lua.
As Arthur points out, the AbstractText prototype is a bit more complex than necessary. It is a great prototype to show that it can work - having functions implemented in JavaScript and Python and called to evaluate a composed function. It is not a great prototype to show how it should work though, and indeed, as an implementation it is a bit more complex than what we will initially need.
Regarding comments, as asked by Louis: every implementation will come with the ability to have documentation. Also, implementations written in programming languages will have the usual ability to have comments.
Regarding not doing it on MediaWiki, as suggested by Amirouche - well, I had the same discussions regarding Wikidata and Semantic MediaWiki, and I still think it was the best possible decision to build them on top of MediaWiki. The amount of functionality we would need to reimplement otherwise is so incredibly large that a much larger team would be needed, without actually devoting more resources to solving our novel problems.
(I am soooo sorry for writing such long emails, but I think it is important in this early stage to overshare.)
Sorry if I sound stubborn on this topic, but all I'm getting from your proposal and this response is "we're going to do X", not a valid comparison to the alternative of more traditional open source projects. There should be a methodical and objective comparison, not subjective statements such as the fact that it may lower the barrier to entry. It's entirely unproven that doing this as a wiki would bring more contributors.
In fact, folks who are the most knowledgeable about writing code might find it odd and unpleasant to work with, with all the function definitions scattered across web pages. You would have to open many browser tabs to even comprehend that codebase, which is clearly an inferior experience compared to traditional open source, where contributors have a vast range of tools of their liking to view, edit and contribute to a large codebase. And once this codebase, because that's what it is, reaches enough complexity, how you commit changes isn't going to be what stops people from contributing. The challenge will be in the actual code contributed and how complex the codebase is to navigate and understand. If the only way to navigate it is the website and a small set of custom tools built for it, it's a far inferior experience compared to the vibrant open source programming ecosystem that has existed for decades.
Furthermore, there's nothing stopping this from being implemented as a registration-free web-based code editing platform backed by git or another proven source code version control system. Developers who are knowledgeable about version control systems could check out the entire codebase and commit directly to the git repo. Folks who aren't familiar with that could edit functions/files directly on the website. What GitHub has done in that respect with on-site editing can be made even simpler if you settle on a simpler git workflow than pull requests. The experience can be greatly streamlined if the common case for casual contributors is single-function editing. In previous responses it was said that a git version of this would require registration. It doesn't; it's entirely up to you how open or closed a code repository is. And any fear you might have about how you would manage a completely open code repository should be the same as for a completely open wiki. As long as you don't require registration, you're dealing with exactly the same vandalism challenges.
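For what it's worth, the "web form in front of a git repository" model is cheap to prototype. Here is a minimal sketch using the JGit library; the repository layout, file naming and author handling are all assumptions made up for this illustration:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import org.eclipse.jgit.api.Git;

public class SaveFunctionEdit {
    // Called by a hypothetical web handler when someone submits an edited function.
    public static void saveEdit(Path repoDir, String functionName, String newSource,
                                String editorName, String editorEmail) throws Exception {
        // Write the edited function to its file inside the working tree.
        Path file = repoDir.resolve("functions").resolve(functionName + ".js");
        Files.createDirectories(file.getParent());
        Files.writeString(file, newSource);

        // Stage and commit it as an ordinary git commit attributed to the editor.
        try (Git git = Git.open(repoDir.toFile())) {
            git.add().addFilepattern("functions/" + functionName + ".js").call();
            git.commit()
               .setMessage("Edit function " + functionName + " via web form")
               .setAuthor(editorName, editorEmail)
               .call();
        }
    }
}
```

Every edit becomes an ordinary commit, so the usual git tooling (history, blame, bisect, mirrors) keeps working for the developers who prefer it.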
A lot of the innovation this project would bring would have a much bigger impact if done as a git-backed project. Tools developed for it would be more likely to be picked up by others who want to do wiki-style open source programming. Something completely custom built on top of MediaWiki would leave us in our existing MediaWiki bubble. We know how limited the open source community around MediaWiki is, and this would likely only attract a subset of it.
Overall I get the sense that the reason this is proposed to be a wiki isn't that a wiki brings real benefits to the problem being solved compared to alternatives, it's that the people involved have been working on wikis, and especially MediaWiki wikis, for a long time. It's a collective hammer and as a result everything looks like nails. Building things on top of MediaWiki isn't the only way to make a successful collaborative project, particularly one that is in the programming space.
That being said, the two aren't mutually exclusive. It can be both a MediaWiki wiki and a git repo if bridged correctly. But even that would have to be pitted against, say, a single-project, registration-free custom version of GitLab, for example, because code collaboration tools already exist. Building this on top of MediaWiki is going to introduce a ton of reinventing the wheel around code display, code review, CI, etc., things that other projects have done well already. It could be a lot cheaper to build on top of an existing code collaboration platform.
On Thu, Jul 16, 2020 at 11:53, Gilles Dubuc <gilles@wikimedia.org> wrote:
Overall I get the sense that the reason this is proposed to be a wiki isn't that a wiki brings real benefits to the problem being solved compared to alternatives, it's that the people involved have been working on wikis, and especially MediaWiki wikis, for a long time. It's a collective hammer and as a result everything looks like nails. Building things on top of MediaWiki isn't the only way to make a successful collaborative project, particularly one that is in the programming space.
The "hammer and nails" analogy is certainly a valid way to look at it, but for twenty years the community that writes wiki pages has been a pretty good hammer.
The main question is whether Abstract Wikipedia will be built as a continuation of existing practices with some significant updates, or as a totally new thing. Both paths are conceivable, but less continuity means more effort in building a community from scratch. Also, this eventual community will most likely be smaller, and the relationship between this community and Wikipedia will be even more difficult than the relationship between Wikidata and Wikipedia. As long as these challenges are acknowledged, it's OK to think about it.
That being said, the two aren't mutually exclusive. It can be both a MediaWiki wiki and a git repo if bridged correctly.
Indeed. It won't be trivial, but it's a sensible path to consider.
No need to apologize, and please be stubborn. I hope that such discussions frequently boil down to "I haven't explained that well enough" instead of "we're gonna do what we gonna do because I said so", so asking these questions ensures it's the former and dispels notions of the latter.
And I am sure there are possible elements of Maslow's hammer [1] and potentially a bit of hubris on my side in play, so I really want to make sure I listen, respond, and explain myself better.
One minor point: "it is entirely unproven that doing this as a wiki would bring more contributors." That's true. That's true for everything new you try. I am not sure how to respond to that, besides saying, yes, that's one of the things we're figuring out. We're having a small team and are exploring that space.
But let's take a step back. Whether we use git or C++ or MediaWiki or Eclipse is, in the end, a question of the implementation. Let's leave the question of *how* aside for a moment, and see whether we have agreement on *what*. What are we trying to achieve with the first part of the project? And by the first part I mean Wikilambda. The goal is to create a community project to develop a comprehensive repository of functions, a Wikipedia but for functions, so this first part is detached from any questions about natural language generation and knowledge representation.
So my first question is, do we agree on that goal?
(And, maybe the real question is, do we even have a reasonably shared understanding of what that goal means? I am unsure about that - having worked on that for such a long time, I probably make too many assumptions that this is actually clear, but it could be that the first task is to clear that up and create a more shared understanding of that goal - please, let me know!)
Thanks! Denny
On Thu, Jul 16, 2020 at 1:53 AM Gilles Dubuc gilles@wikimedia.org wrote:
Sorry if I sound stubborn on this topic, but all I'm getting from your proposal and this response is "we're going to do X", not a valid comparison to the alternative of more traditional open source projects. There should be a methodic and objective comparison, not subjective statements such as the fact that it may lower the barrier to entry. It's entirely unproven that doing this as a wiki would bring more contributors. In fact folks who are the most knowledgeable about writing code might find it odd and unpleasant to work with, with all the function definitions scattered across web pages. You would have to open many browser tabs to even comprehend that codebase, which is clearly an inferior experience compared to traditional open source, where contributors would have access to a vast amount of software of their liking to view, edit and contribute to a large codebase. And once this codebase, because that's what it is, reaches enough complexity, how you commit changes isn't going to be what stops people from contributing. The challenge will be in the actual code contributed and how complex the codebase is to navigate and understand. If the only way to navigate it is the website and a small amount of custom tools built for it, it's a far inferior experience compared to the vibrant open source programming ecosystem that has existed for decades.
Furthermore, there's nothing stopping this from being implemented as a registration-free web-based code editing platform backed by git or another proven source code version control system. Developers that are knowledgeable of version control systems could check out the entire codebase and commit directly to the git repo. Folks who aren't familiar with that could edit functions/files directly on the website. What github has done in that respect with on-site editing can be made even simpler if you settle on a simpler git workflow than pull requests. The experience can be greatly streamlined if the common case for casual contributors is single function editing. In previous responses it was said that a git version of this would require registration. It doesn't, it's entirely up to you how open or closed a code repository is. And any fear you might have about how you would manage a completely open code repository should be the same as a completely open wiki. As long as you don't require registration, you're dealing with exactly the same vandalism challenges.
A lot of the innovation this project would bring would have a much bigger impact if done as a git-backed project. Tools developed for it would be more likely to be picked up by others who want to do wiki-style open source programming. Something completely custom built on top of MediaWiki would leave us in our existing MediaWiki bubble. We know how limited the open source community around MediaWiki is, and this would likely only attract a subset of it.
Overall I get the sense that the reason this is proposed to be a wiki isn't that a wiki brings real benefits to the problem being solved compared to alternatives, it's that the people involved have been working on wikis, and especially MediaWiki wikis, for a long time. It's a collective hammer and as a result everything looks like nails. Building things on top of MediaWiki isn't the only way to make a successful collaborative project, particularly one that is in the programming space.
That being said, the two aren't mutually exclusive. It can be both a MediaWiki wiki and a git repo if bridged correctly. But even that would have to be pitted against, say, a single-project registration-free custom version of Gitlab, for example. Because code collaboration tools already exist. Building this on top of MediaWiki is going to introduce a ton of reinventing the wheel around code display, code review, CI, etc. Things that other projects have done well already. It could be a lot cheaper to build on top of an existing code collaboration platform.
On Thu, Jul 16, 2020 at 5:23 AM Denny Vrandečić dvrandecic@wikimedia.org wrote:
Thanks for the great discussion!
Renaming the topic as it deviated a bit from Lucas' original announcement, and was questioning whether we should do Wikilambda at all. So, this is mostly to answer Gilles' comment as to why a collection of functions and tests needs to be editable on a wiki - that's the premise of the first part of the project.
So yes, we could maybe create the whole code for the renderers in a classical way, as an Open Source project, going through git and gerrit, and assume that we will have hundreds of new coders working on this, neatly covering all supported languages, to keep up with the creation of renderers when the community creates new constructors, etc. I have doubts. I want to drastically reduce the barrier to participate in this task, and I think that this is, in fact, necessary. And this is why we are going for Wikilambda, a wiki of functions first, as said in my first email to the list.
Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically a small library of functions and helpers. We're also starting with no side-effects and referential transparency, because we're boring, and leave that for a bit later.
So, there will be wikipages that represent such a function. So on one page we would say, "so, there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly, not a very complicated function, but quite useful. Then we might have several implementations for this function, for example in different programming languages, we also might have tests for this function. All implementations should always behave exactly the same. An evaluation engine decides which implementations it wants to use (and, for the result it shouldn't matter because they all should behave the same), but some engines might prefer to run the JavaScript implementations, others might run Python, others might decide "oh, wait, I am composing two different functions together in order to get to a result, maybe it is smarter if I pick two implementations with the same language, as I actually can compile that into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you were asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
What looks like a programming language of its own in the existing ZObjects (but isn't) is the ability to compose functions. So, say, you want to now introduce a double_string function which takes one string and returns one, you could either implement it in any of the individual programming languages, or you can implement it as a composition, i.e. double_string(arg1) := concatenate(arg1, arg1). Since you can have more implementations, you can still go ahead and write a native implementation in JavaScript or Python or WebAssembly, but because we know how this function is composed from existing functions, the evaluation engine can do that for you as well. (That's an idea I want to steal for the natural language generation engine later too, but this time for natural languages)
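As an illustration of that composition idea, here is a small Python sketch (the data layout is hypothetical, not the prototype's actual format) of an evaluation engine executing double_string purely from its composition:

# Illustrative sketch: a function defined purely as a composition of other
# functions, which an evaluation engine can run without any native
# implementation of its own.
builtins = {"concatenate": lambda a, b: a + b}

# double_string(arg1) := concatenate(arg1, arg1), written as a small call tree
double_string = {"call": "concatenate", "args": [{"arg": 0}, {"arg": 0}]}

def evaluate(node, args):
    """Recursively evaluate a composition node against the supplied arguments."""
    if "arg" in node:
        return args[node["arg"]]
    fn = builtins[node["call"]]
    return fn(*(evaluate(a, args) for a in node["args"]))

assert evaluate(double_string, ["ab"]) == "abab"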
By exposing these functions, the goal is to provide a library of such functions that anyone can extend, maintain, browse, execute, and use for any purpose. Contributors and readers can come in and compare different implementations of the same function. They can call functions and evaluate them in a local evaluation engine. Or in the browser (if supported). Or on the server. That's the goal of the wiki of functions, what we call Wikilambda in the proposal - a new Wikimedia project that puts functions front and center, and makes it a new form of knowledge asset that the Wikimedia movement aims to democratize and make accessible.
One goal is to vastly lower the barrier to contribute the renderers that we will need for the second part of the project. But to also make it clear: supporting the rendering of content and the creation of content for the Wikipedias is our first priority, but it does not end there. Just as Wikidata and Commons have the goal to support the Wikipedias (and within these communities, the prioritization of this goal compared to other goals may differ substantially from one contributor to the next), Wikilambda will have that as a primary goal. But we would be selling ourselves short if we stopped there. There are a number of interesting use cases that will be enabled by this project, many of which are aligned with our vision of a world in which everyone can share in the sum of all knowledge - and a number of the 2030 strategic goals.
You may think, that is tangential, unnecessary, overly complicated - but when discussing the overall goal of providing content in many languages with a number of senior researchers in this area, it was exactly this tangent that, frequently, made them switch from "this is impossible" to "this might work, perhaps".
There have been a number of other topics raised in the thread that I also want to address.
Regarding atomicity of updates by Gilles, which was answered by Arthur that, if you're changing the functionality, you should create a new function (that's similar in spirit to the content-addressable code mentioned by Amirouche), and that's where we'll start (because I am a simple person, and I like to start with simple things for the first iteration). We might identify the need for a more powerful versioning scheme later, and we'll get to it then. I hope to avoid that.
Will the Lua be the same as in the Scribunto modules?, as asked by Amir. We'll get to that when we get to implementations in Lua. I think it would be desirable, and I also think it might be possible, but we'll get to it later. We won't start with Lua.
As Arthur points out, the AbstractText prototype is a bit more complex than necessary. It is a great prototype to show that it can work - having functions implemented in JavaScript and Python and be called to evaluate a composed function. It is not a great prototype to show how it should work though, and indeed, as an implementation it is a bit more complex than what we will initially need.
Regarding comments as asked by Louis, every implementation will come with the ability to have documentation. Also, the implementations written in programming languages will have the usual ability to have comments.
Regarding not doing it on MediaWiki, as suggested by Amirouche - well, I had the same discussions regarding Wikidata and Semantic MediaWiki, and I still think it was the best possible decision to build them on top of MediaWiki. The amount of functionality we would need to reimplement otherwise is so incredibly large, that a much larger team would be needed, without devoting actually more resources towards solving our novel problems.
(I am soooo sorry for writing such long emails, but I think it is important in this early stage to overshare.)
On Wed, Jul 15, 2020 at 10:42 AM Amirouche Boubekki < amirouche.boubekki@gmail.com> wrote:
On Wed, Jul 15, 2020 at 5:00 PM Gilles Dubuc gilles@wikimedia.org wrote:
The part I don't get is why a collection of functions and tests needs
to be editable on a wiki. This poses severe limitations in terms of versioning if you start having functions depend on each other and editors can only edit one at a time. This will inevitably bring breaking intermediary edits. It seems like reinventing the wheel of code version control on top of a system that wasn't designed for it. MediaWiki isn't designed with the ability to have atomic edits that span multiple pages/items. Which is a pretty common requirement for any large codebase, which this set of functions sounds like it's poised to become.
I disagree with the fact that it must be done with the existing wikimedia codebase. Tho, I believe it is possible to build an integrated development environment that fuses programming language with a version control system. That is what unison is doing with content-addressable code. Unison has different goals, but they have i18n identifiers/docstrings on their roadmap.
What reasons does this benefit from being done as wiki editable code
compared to a software project (or series of libraries/services) hosted on a git repository? Especially if we're talking about extensive programming and tests, whoever is computer literate enough to master these concepts is likely to know how to contribute to git projects as well, as they would have very likely encountered that on their path to learning functional programming.
My understanding is that wikilambda wants to be easier than going through all the necessary knowledge to contribute to a software project. That is, wikilambda could be something better than git/github.
I get why such an architecture might be a choice for a prototype, for
fast iteration and because the people involved are familiar with wikis. But what core contributors are familiar with and what can be used for a fast prototype is rarely a suitable and scalable architecture for the final product.
My understanding is that the prototype wanted to demonstrate that it is possible to i18n the identifiers, docstrings, etc... of a programming language. Something that nobody has tried or succeeded at doing so far.
There is also a notion of how dynamic the content is to take into
account. There is a difference between the needs of ever-changing encyclopedic information, which a wiki is obviously good at handling, and an iterative software project where the definition of a function to pluralize some text in a language is highly unlikely to be edited on a daily basis, but where interdependencies are much more likely to be an important feature. Something that code version control is designed for.
The problem with git is that if you change the definition of a function in a file, you need to update the code in every other place where that function is called. That is different from content-addressable code, where you reference the content of the function: if you change the function, it creates a new function (possibly with the same name), but old code keeps pointing to the old definition. That is much more scalable than the current git/GitHub approach (even if you might keep calling old code).
On Wed, Jul 15, 2020 at 4:42 PM Arthur Smith arthurpsmith@gmail.com
wrote:
On Wed, Jul 15, 2020 at 8:15 AM Amir E. Aharoni <
amir.aharoni@mail.huji.ac.il> wrote:
I keep being confused about this point: What are the "functions" on
AW/Wikilambda, and in which language will they be written?
I think everybody has a slightly different perspective on this. I've
been working closely with Denny's prototype project (which included the javascript version of 'eneyj' that Lucas was inspired by) at https://github.com/google/abstracttext/ (I assume this will move somewhere under wikimedia now?). The prototype does to an extent define its own "language" - specifically it defines (human) language-independent "Z Objects" (implemented as a subset of json) which encapsulate all kinds of computing objects: functions, tests, numeric values, strings, languages (human and programming), etc.
I think right now this may be a little more complex than is actually
necessary (are strict types necessary? maybe a 'validator' and a 'test' should be the same thing?); on the other hand something like this is needed to be able to have a wiki-style community editable library of functions, renderers, etc. and I really like the underlying approach (similar to Wikidata) to take all the human-language components off into label structures that are not the fundamental identifiers.
ZObjects for "implementations" of functions contain the actual code,
along with a key indicating the programming language of the code. Another ZObject can call the function in some context which chooses an implementation to run it. At a basic level this is working, but there's a lot more to do to get where we want and be actually useful...
Arthur
-- Amirouche ~ https://hyper.dev
On Thu, Jul 16, 2020 at 7:50 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
[...] I mean Wikilambda. The goal is to create a community project to develop a comprehensive repository of functions, a Wikipedia but for functions, so this first part is detached from any questions about natural language generation and knowledge representation.
So my first question is, do we agree on that goal?
Yes. I would add that it is something easier to get started with than GitHub and it is not merely a clone of http://rosettacode.org/wiki/Rosetta_Code
Rosetta Code is a great example of something similar, but we also want to add more constraints and make it possible to actually call the code automatically.
If you take a look at the implementations of Conway's Game of Life in Rosetta Code - http://rosettacode.org/wiki/Conway%27s_Game_of_Life - you can see implementations in many languages, but they are often subtly different, and there's nothing in the system to avoid that. We want to avoid that divergence, and we want to allow users to enter the input data, click on a button, and run the code.
But still, Rosetta Code illustrates neatly some of the aspects of the project vision, yes, thank you for that pointer!
On Thu, Jul 16, 2020 at 11:23 AM Amirouche Boubekki < amirouche.boubekki@gmail.com> wrote:
On Thu, Jul 16, 2020 at 7:50 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
[...] I mean Wikilambda. The goal is to create a community project to
develop a comprehensive repository of functions, a Wikipedia but for functions, so this first part is detached from any questions about natural language generation and knowledge representation.
So my first question is, do we agree on that goal?
Yes. I would add that it is something easier to get started than GitHub and it is not merely a clone of http://rosettacode.org/wiki/Rosetta_Code
Thank you for this enlightening discussion.
If I understand correctly, most of WikiLambda's goals could be achieved by the existing Scribunto Lua modules:
1. Being editable on wiki.
2. Lowering the entry barrier compared to e.g. a MediaWiki extension.
3. Being composable and extensible (one module can call another one).
4. [If the "global Lua modules" idea comes forward] Being shared by multiple wikis.
Additionally, WikiLambda proposes:
1. To provide multiple runtimes for the written code instead of just Lua, so that WikiLambda functions can be evaluated easily in as many places as possible instead of just in the Scribunto Lua virtual machine.
2. To provide "pure" functions instead of modules that often make heavy use of global state (wiki language...). This is an important step for code reusability and composability (a small sketch follows below).
3. To provide a better way to document the modules/functions than Scribunto modules.
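As a small illustration of proposal 2 (hypothetical names, Python used purely for illustration), here is the difference between a module-style function that reads ambient wiki state and a pure function that takes everything it needs as arguments:

# Hypothetical illustration: a function depending on ambient wiki state versus
# a pure function, which behaves the same in any evaluation context.
WIKI_LANGUAGE = "fr"   # ambient global state, as on a particular wiki

def label_impure(item_labels):
    return item_labels[WIKI_LANGUAGE]   # result silently depends on the global

def label_pure(item_labels, language):
    return item_labels[language]        # same inputs always give the same output

labels = {"en": "moon", "fr": "lune"}
assert label_pure(labels, "fr") == label_impure(labels) == "lune"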
I don't see why these three goals could not be achieved by an improved version of Scribunto. This version could provide the same documentation capabilities you described in the proposal (proposition 3) and provide a "pure function" kind of module that does not rely on MediaWiki global state, while keeping (most of) the same standard library as the usual Lua modules (proposition 2). Lua VMs are fairly simple and are already available in a lot of languages, so I am not sure using Lua would make evaluating the provided functions much harder or slower. LuaJIT is fairly well known for its versatility and its performance. I believe it would address proposition 1 fairly well.
Another question, on WikiLambda itself:
you can see implementations in many languages, but they are often subtly different, and there's nothing in the system to avoid that.
I just have a concern here. How do you plan to mitigate the edge-case differences between programming languages for the "native function" implementations? For example, Python 3's "int" type encodes arbitrary-precision integers, while JavaScript numbers can only encode integers that fit in a 64-bit IEEE 754 float, and Rust's i64 has a 64-bit precision limit. I am afraid that if WikiLambda wants to rely on language-specific implementations for performance reasons, WikiLambda will have to deal with a lot of these discrepancies and end up with the same problems as Rosetta Code.
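To illustrate the kind of divergence described above with a runnable sketch (Python throughout; the float-based variant only emulates what a JavaScript-style implementation built on 64-bit IEEE 754 numbers would return):

# `exact` uses Python's arbitrary-precision int, while `float_based` emulates
# what an implementation built on 64-bit IEEE 754 numbers would produce for
# the same "multiply" function.
def exact(a, b):
    return a * b

def float_based(a, b):
    return int(float(a) * float(b))   # emulated: precision is lost beyond 2**53

a, b = 10**9 + 7, 10**9 + 9
print(exact(a, b))        # 1000000016000000063
print(float_based(a, b))  # 1000000016000000000 -- same "function", different result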
The other alternative I see is to not have such "native functions" and have a basic set of well defined primitives. But it means, more or less, creating a new programming language and optimizers to be able to evaluate it quickly.
@Denny Do you have a solution to this problem I do not see?
Cheers,
Thomas
On Fri, Jul 17, 2020 at 6:07 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
Rosetta Code is a great example of something similar, but we also want to add more constraints and make it possible to actually call the code automatically.
If you take a look at the implementations of Conway's Game of Life in Rosetta Code - http://rosettacode.org/wiki/Conway%27s_Game_of_Life - you can see implementations in many languages, but they are often subtly different, and there's nothing in the system to avoid that. We want to avoid that divergence, and we want to allow users to enter the input data, click on a button, and run the code.
But still, Rosetta Code illustrates neatly some of the aspects of the project vision, yes, thank you for that pointer!
On Thu, Jul 16, 2020 at 11:23 AM Amirouche Boubekki amirouche.boubekki@gmail.com wrote:
On Thu, Jul 16, 2020 at 7:50 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
[...] I mean Wikilambda. The goal is to create a community project to develop a comprehensive repository of functions, a Wikipedia but for functions, so this first part is detached from any questions about natural language generation and knowledge representation.
So my first question is, do we agree on that goal?
Yes. I would add that it is something easier to get started than GitHub and it is not merely a clone of http://rosettacode.org/wiki/Rosetta_Code
On Fri, Jul 17, 2020 at 1:50 PM Thomas Pellissier Tanon < thomas@pellissier-tanon.fr> wrote:
you can see implementations in many languages, but they are often subtly
different, and there's nothing in the system to avoid that.
I just have a concern here. How do you plan to mitigate the edge case differences between programming languages for the "native functions" implementations? For example Python 3 "int" type encodes arbitrary precision integers while JavaScript numbers could only encode integers that fit in a 64bits IEEE 754 float and Rust i64 have a 64bits precision limit. I am afraid that if WikiLambda wants to rely on language specific implementations for performance reasons, WikiLambda would have to deal with a lot of these discrepancies and end up in the same problems as Rosetta Code.
I think this is relatively trivial: Python3 "int"s are simply a different type of entity from IEEE 754 floats and 64-bit or 32-bit or whatever limited integer examples there are. We already translate between types within single languages (string to int, int to float, big int to little int, unsigned int to signed, etc.) so all that's required here is being a little more exact about these variable types from the start. IEEE 754 floats have a rather precise definition that should be met by all implementing languages, for instance.
Arthur
On Fri, Jul 17, 2020 at 10:54 PM Arthur Smith arthurpsmith@gmail.com wrote:
On Fri, Jul 17, 2020 at 1:50 PM Thomas Pellissier Tanon thomas@pellissier-tanon.fr wrote:
you can see implementations in many languages, but they are often subtly different, and there's nothing in the system to avoid that.
I just have a concern here. How do you plan to mitigate the edge case differences between programming languages for the "native functions" implementations? For example Python 3 "int" type encodes arbitrary precision integers while JavaScript numbers could only encode integers that fit in a 64bits IEEE 754 float and Rust i64 have a 64bits precision limit. I am afraid that if WikiLambda wants to rely on language specific implementations for performance reasons, WikiLambda would have to deal with a lot of these discrepancies and end up in the same problems as Rosetta Code.
I think this is relatively trivial: Python3 "int"s are simply a different type of entity from IEEE 754 floats and 64-bit or 32-bit or whatever limited integer examples there are. We already translate between types within single languages (string to int, int to float, big int to little int, unsigned int to signed, etc.) so all that's required here is being a little more exact about these variable types from the start. IEEE 754 floats have a rather precisely defined definition that should be met by all implementing languages, for instance.
Yes, we could definitely do that. But if we choose a specific definition, then some implementations might become very complex. To continue the example, if we define a "64-bit signed two's complement integer", the JavaScript implementation will probably have to use BigInt and a modulus operator to emulate this, and the Python implementation a modulus operator... If we define "number of Unicode code points in a string", the JavaScript implementation could not use String.length but would have to run more complex code... It is fine if we only implement very simple functions (addition, string length...) in the "native language", but if, for performance reasons, people start to write more complex functions in native languages, I believe it will be very hard not to get discrepancies between the implementations, because it would require the implementations to take care of all these details to keep perfect interoperability.
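As a sketch of the kind of emulation this would require (written in Python purely for illustration; the type definitions are hypothetical):

def add_i64(a, b):
    """Add with wrap-around, as a fixed "64-bit signed two's complement" type would."""
    result = (a + b) & 0xFFFFFFFFFFFFFFFF           # keep only the low 64 bits
    return result - 2**64 if result >= 2**63 else result

assert add_i64(2**63 - 1, 1) == -(2**63)            # overflow wraps, by definition

def code_points(s):
    return len(s)                                    # Python strings count code points

def utf16_units(s):
    return len(s.encode("utf-16-le")) // 2           # what JavaScript's String.length counts

s = "a\U0001F600"                                    # "a" plus an emoji outside the BMP
assert (code_points(s), utf16_units(s)) == (2, 3)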
Thomas
Arthur
Thomas,
Yes, that's true. A wise selection of types and how exactly they are defined will help a lot with these issues.
This is not exactly a new problem, fortunately - ProtoBufs, WebIDL, XML Schema Datatypes, and good old CORBA already had to deal with these issues before, so we have a bit of former work to look at and learn from. Also, one good thing is, we don't have to agree on a single integer type, but can have several - most of the complexity can, in most cases, be transferred to the type conversion, so that the individual functions remain reasonably simple.
To give one example: a type such as integers from 0 to 10,000, which we could define, would have really easy implementations in native Python, JavaScript and C++, even though the underlying addition does not behave exactly the same in edge cases - but we would basically remove the edge cases from the type. Arthur described that already better than I could.
But yes, if one of our types extends beyond a native type, then basically we would need to reimplement the logic instead of reusing native mechanisms. That would lose some of the efficiency.
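A minimal sketch of the 0-to-10,000 example, assuming the validation lives in the type conversion so that the arithmetic itself stays trivially portable:

# The type conversion carries the edge cases; the addition needs no special
# handling, since every native number type in Python, JavaScript or C++
# behaves identically on 0..10000.
LIMIT = 10_000

def to_bounded_int(value):
    """Reject anything outside 0..10000; the edge cases live here."""
    n = int(value)
    if not 0 <= n <= LIMIT:
        raise ValueError(f"{value!r} is not an integer between 0 and {LIMIT}")
    return n

def add_bounded(a, b):
    return to_bounded_int(a + b)   # no special cases needed in the function itself

assert add_bounded(to_bounded_int(9000), to_bounded_int(999)) == 9999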
The other topic you talked about was reusing Scribunto. We will take a very good look at Scribunto, in order to reuse parts of the code and to build on top of the valuable experience gained.
Thanks for raising that important point! Denny
On Fri, Jul 17, 2020 at 2:32 PM Thomas Pellissier Tanon < thomas@pellissier-tanon.fr> wrote:
On Fri, Jul 17, 2020 at 10:54 PM Arthur Smith arthurpsmith@gmail.com wrote:
On Fri, Jul 17, 2020 at 1:50 PM Thomas Pellissier Tanon <
thomas@pellissier-tanon.fr> wrote:
you can see implementations in many languages, but they are often
subtly different, and there's nothing in the system to avoid that.
I just have a concern here. How do you plan to mitigate the edge case differences between programming languages for the "native functions" implementations? For example Python 3 "int" type encodes arbitrary precision integers while JavaScript numbers could only encode integers that fit in a 64bits IEEE 754 float and Rust i64 have a 64bits precision limit. I am afraid that if WikiLambda wants to rely on language specific implementations for performance reasons, WikiLambda would have to deal with a lot of these discrepancies and end up in the same problems as Rosetta Code.
I think this is relatively trivial: Python3 "int"s are simply a
different type of entity from IEEE 754 floats and 64-bit or 32-bit or whatever limited integer examples there are. We already translate between types within single languages (string to int, int to float, big int to little int, unsigned int to signed, etc.) so all that's required here is being a little more exact about these variable types from the start. IEEE 754 floats have a rather precisely defined definition that should be met by all implementing languages, for instance.
Yes, we could definitely do that. But, if we choose a specific definition then some implementations might become very complex. To continue the example, if we define a "64 bits signed 2 complements integer", the JavaScript implementation will probably have to use BigInt and a modulus operator to emulate this, the Python implementation a modulus operator... If we define a "number of unicode code points in a string" the JavaScript implementation could not use String.length but have to fire more complex code... It is fine if we only implement in the "native language" very simple functions (addition, string length...), but if, for performance reasons, people start to write more complex functions in native languages, I believe it will be very hard to not get discrepancies between the implementations because it would require the implementations to take care of all these details to keep a perfect interoperability.
Thomas
Arthur
Hoi, When the objective is to create a community, you may want to let a thousand flowers bloom.
Wikilambda does not come out of nothing. There have been multiple comparable projects in the past and all of them have good and bad points. For me, Wikilambda is to be a next step that gets us where we want to end up: a place where articles in the Wikipedia style are generated for any and all languages. When the objective is to create a community, the one thing that will get us a community is when we provide a service that makes a difference.
I want us to consider two approaches, automated description by Magnus Manske and the Cebuano Wikipedia project by Sverker Johansson. When Wikilambda is able to improve on the two underlying functionalities, it will generate a buzz that enables the whole ecosystem that is required for the full Wikilambda.
==Automated descriptions== Automated descriptions help in the disambiguation of items. They can be triggered to be used in the search results of projects, they are used in Reasonator [1]. They function after a fashion in any language. The challenge would be to take over this functionality and improve it. Make it use proper grammar and inflections in any language. With improvements underway, we can extend the functionality and use it in combination with Special:MediaSearch [2], a search front end for Commons that enables search for images in any language.
By seeking early results that make a marked difference in our projects, we get a community interested in adding labels and properties and in considering the extended functionality.
==Cebuano Wikipedia== The Cebuano Wikipedia is quite a contentious project, but Wikilambda aims at achieving exactly the same objectives. There are two parts we can easily do. We can use Wikidata as its source. When the underlying data changes, it is easy to re-generate the text. The data is used in several Wikipedias, so it is not only about Cebuano but also about Waray-Waray and Swedish. One objective is to replace the underlying technology and make it Wikilambda native. This will result in the ability to generate articles in other languages.
When the approach of creating articles is morphed in a way that is appropriate for Wikilambda, it will allow all kinds of experiments. Among them are texts that are cached rather than saved as articles, and other categories of articles.
The main thing is that this project aims to make a practical difference. Thanks, GerardM
[1] https://reasonator.toolforge.org/?find=%D8%AF%D8%A7%D9%82%D9%84%D8%A7%D8%B3+... [2] https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=%D8...
On Thu, 16 Jul 2020 at 19:50, Denny Vrandečić dvrandecic@wikimedia.org wrote:
[...]
Yes, Gerard, those are great predecessors! Indeed, I want both of these approaches to be implementable through the new system, but in a way that the community can take more ownership of the process. But yes, I expect both of their approaches to be models for how we can gain some good and early coverage and experiment.
Thanks!
On Sat, Jul 18, 2020 at 1:40 AM Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, When the objective is to create a community, you may want to let a thousand flowers bloom.
Wikilambda does not come out of nothing. There have been multiple comparable projects in the past and all of them have good and bad points. For me Wikilambda is to be a next step that gets us where we want to end up; a place where articles in the Wikipedia style are generated for any and all language. When the objective is to create a community, the one thing that will get us a community is when we provide a service make a difference.
I want us to consider two approaches, automated description by Magnus Manske and the Cebuano Wikipedia project by Sverker Johansson. When Wikilambda is able to improve on the two underlying functionalities, it will generate a buzz that enables the whole ecosystem that is required for the full Wikilambda.
==Automated descriptions== Automated descriptions help in the disambiguation of items. They can be triggered to be used in the search results of projects, they are used in Reasonator [1]. They function after a fashion in any language. The challenge would be to take over this functionality and improve it. Make it use proper grammar and inflections in any language. With improvements underway, we can extend the functionality and use it in combination with Special:MediaSearch [2], a search front end for Commons that enables search for images in any language.
By seeking early results that make a marked difference in our projects, we get a community interested in adding labels, properties and consider the extended functionality.
==Cebuano Wikipedia==
Cebuano Wikipedia is quite a contentious project, but Wikilambda aims at achieving exactly the same objectives. There are two parts we can easily do. We can use Wikidata as its source. When the underlying data changes it is easy to re-generate the text. The data is used in several Wikipedias. It is therefore not only about Cebuano but also about Waray-Waray and Swedish. One objective is to replace the underlying technology and make it Wikilambda native. This will result in the ability to generate articles in other languages.
When the approach of creating articles is adapted in a way that is appropriate for Wikilambda, it will allow all kinds of experiments. Among them are texts that are cached rather than saved as articles, and other categories of articles.
The main thing is that this project aims to make a practical difference. Thanks, GerardM
[1] https://reasonator.toolforge.org/?find=%D8%AF%D8%A7%D9%82%D9%84%D8%A7%D8%B3+... [2] https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=%D8...
On Thu, 16 Jul 2020 at 19:50, Denny Vrandečić dvrandecic@wikimedia.org wrote:
No need to apologize, and please be stubborn. I hope that such discussions frequently boil down to "I haven't explained that well enough" instead of "we're gonna do what we're gonna do because I said so", so asking these questions ensures it's the former and dispels notions of the latter.
And I am sure there are possible elements of Maslow's hammer [1] and potentially a bit of hubris on my side in play, so I really want to make sure I listen, respond, and explain myself better.
One minor point: "it is entirely unproven that doing this as a wiki would bring more contributors." That's true. That's true for everything new you try. I am not sure how to respond to that, besides saying, yes, that's one of the things we're figuring out. We have a small team and are exploring that space.
But let's take a step back. Whether we use git or C++ or MediaWiki or Eclipse is, in the end, a question of the implementation. Let's leave the question of *how* aside for a moment, and see whether we have agreement on *what*. What are we trying to achieve with the first part of the project? And by the first part I mean Wikilambda. The goal is to create a community project to develop a comprehensive repository of functions, a Wikipedia but for functions, so this first part is detached from any questions about natural language generation and knowledge representation.
So my first question is, do we agree on that goal?
(And, maybe the real question is, do we even have a reasonably shared understanding of what that goal means? I am unsure about that - having worked on that for such a long time, I probably make too many assumptions that this is actually clear, but it could be that the first task is to clear that up and create a more shared understanding of that goal - please, let me know!)
Thanks! Denny
On Thu, Jul 16, 2020 at 1:53 AM Gilles Dubuc gilles@wikimedia.org wrote:
Sorry if I sound stubborn on this topic, but all I'm getting from your proposal and this response is "we're going to do X", not a valid comparison to the alternative of more traditional open source projects. There should be a methodical and objective comparison, not subjective statements such as the fact that it may lower the barrier to entry. It's entirely unproven that doing this as a wiki would bring more contributors. In fact folks who are the most knowledgeable about writing code might find it odd and unpleasant to work with, with all the function definitions scattered across web pages. You would have to open many browser tabs to even comprehend that codebase, which is clearly an inferior experience compared to traditional open source, where contributors have access to a vast array of tools of their liking to view, edit and contribute to a large codebase. And once this codebase, because that's what it is, reaches enough complexity, how you commit changes isn't going to be what stops people from contributing. The challenge will be in the actual code contributed and how complex the codebase is to navigate and understand. If the only way to navigate it is the website and a small set of custom tools built for it, it's a far inferior experience compared to the vibrant open source programming ecosystem that has existed for decades.
Furthermore, there's nothing stopping this from being implemented as a registration-free web-based code editing platform backed by git or another proven source code version control system. Developers that are knowledgeable of version control systems could check out the entire codebase and commit directly to the git repo. Folks who aren't familiar with that could edit functions/files directly on the website. What github has done in that respect with on-site editing can be made even simpler if you settle on a simpler git workflow than pull requests. The experience can be greatly streamlined if the common case for casual contributors is single function editing. In previous responses it was said that a git version of this would require registration. It doesn't, it's entirely up to you how open or closed a code repository is. And any fear you might have about how you would manage a completely open code repository should be the same as a completely open wiki. As long as you don't require registration, you're dealing with exactly the same vandalism challenges.
A lot of the innovation this project would bring would have a much bigger impact if done as a git-backed project. Tools developed for it would be more likely to be picked up by others who want to do wiki-style open source programming. Something completely custom built on top of MediaWiki would leave us in our existing MediaWiki bubble. We know how limited the open source community around MediaWiki is, and this would likely only attract a subset of it.
Overall I get the sense that the reason this is proposed to be a wiki isn't that a wiki brings real benefits to the problem being solved compared to alternatives, it's that the people involved have been working on wikis, and especially MediaWiki wikis, for a long time. It's a collective hammer and as a result everything looks like nails. Building things on top of MediaWiki isn't the only way to make a successful collaborative project, particularly one that is in the programming space.
That being said, the two aren't mutually exclusive. It can be both a MediaWiki wiki and a git repo if bridged correctly. But even that would have to be pitted against, say, a single-project registration-free custom version of Gitlab, for example. Because code collaboration tools already exist. Building this on top of MediaWiki is going to introduce a ton of reinventing the wheel around code display, code review, CI, etc. Things that other projects have done well already. It could be a lot cheaper to build on top of an existing code collaboration platform.
On Thu, Jul 16, 2020 at 5:23 AM Denny Vrandečić < dvrandecic@wikimedia.org> wrote:
Thanks for the great discussion!
Renaming the topic as it deviated a bit from Lucas' original announcement, and was questioning whether we should do Wikilambda at all. So, this is mostly to answer Gilles' comment as to why a collection of functions and tests needs to be editable on a wiki - that's the premise of the first part of the project.
So yes, we could maybe create the whole code for the renderers in a classical way, as an Open Source project, going through git and gerrit, and assume that we will have hundreds of new coders working on this, neatly covering all supported languages, to keep up with the creation of renderers when the community creates new constructors, etc. I have doubts. I want to drastically reduce the barrier to participate in this task, and I think that this is, in fact, necessary. And this is why we are going for Wikilambda, a wiki of functions first, as said in my first email to the list.
Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically a small library of functions and helpers. We're also starting with no side-effects and referential transparency, because we're boring, and leave that for a bit later.
So, there will be wikipages that represent such a function. So on one page we would say, "so, there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly, not a very complicated function, but quite useful. Then we might have several implementations for this function, for example in different programming languages, we also might have tests for this function. All implementations should always behave exactly the same. An evaluation engine decides which implementations it wants to use (and, for the result it shouldn't matter because they all should behave the same), but some engines might prefer to run the JavaScript implementations, others might run Python, others might decide "oh, wait, I am composing two different functions together in order to get to a result, maybe it is smarter if I pick two implementations with the same language, as I actually can compile that into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you were asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
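As a very rough illustration of the "evaluation engine decides" idea, here is a minimal sketch, assuming the simplified JSON-like function objects (with an implementations list) sketched earlier in the thread; it is not the actual eneyj evaluator, and a real engine would sandbox the code rather than eval() it.

// Sketch of an evaluation engine choosing an implementation. It prefers
// languages it can execute natively (here only JavaScript) and ignores the rest.
function evaluate(fn, args, preferredLanguages = ["javascript"]) {
  const impl = preferredLanguages
    .map(lang => fn.implementations.find(i => i.language === lang))
    .find(Boolean);
  if (!impl) {
    throw new Error(`no runnable implementation for ${fn.labels.en}`);
  }
  const compiled = eval(impl.code); // the stored code is an expression yielding a function; a real engine would sandbox this
  return compiled(...args);
}

// evaluate(concatenate, ["abstract ", "wikipedia"]) === "abstract wikipedia"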
What looks like a programming language of its own in the existing ZObjects (but isn't) is the ability to compose functions. So, say, you want to now introduce a double_string function which takes one string and returns one, you could either implement it in any of the individual programming languages, or you can implement it as a composition, i.e. double_string(arg1) := concatenate(arg1, arg1). Since you can have more implementations, you can still go ahead and write a native implementation in JavaScript or Python or WebAssembly, but because we know how this function is composed from existing functions, the evaluation engine can do that for you as well. (That's an idea I want to steal for the natural language generation engine later too, but this time for natural languages)
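A hedged sketch of what such a composition could look like in the same simplified, invented shape (again, not the real ZObject keys): the composed implementation carries no source code at all, only a reference to another function and how its arguments are bound, and the engine expands it. It reuses the evaluate() sketch above.

// double_string(text) := concatenate(text, text), expressed as data: no code,
// just a reference to concatenate with both arguments bound to argument 0.
const doubleString = {
  type: "function",
  labels: { en: "double_string" },
  arguments: [{ type: "string", labels: { en: "text" } }],
  returnType: "string",
  implementations: [
    {
      type: "composition",
      call: { function: "concatenate", args: [{ argumentIndex: 0 }, { argumentIndex: 0 }] },
    },
  ],
};

// A naive engine expands the composition into a call to the referenced
// function, using a registry that maps function names to function objects.
function evaluateComposition(fn, args, registry) {
  const impl = fn.implementations.find(i => i.type === "composition");
  const target = registry[impl.call.function];
  const boundArgs = impl.call.args.map(ref => args[ref.argumentIndex]);
  return evaluate(target, boundArgs);
}

// evaluateComposition(doubleString, ["na"], { concatenate }) === "nana"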
By exposing these functions, the goal is to provide a library of such functions that anyone can extend, maintain, browse, execute, and use for any purpose. Contributors and readers can come in and compare different implementations of the same function. They can call functions and evaluate them in a local evaluation engine. Or in the browser (if supported). Or on the server. That's the goal of the wiki of functions, what we call Wikilambda in the proposal - a new Wikimedia project that puts functions front and center, and makes it a new form of knowledge asset that the Wikimedia movement aims to democratize and make accessible.
One goal is to vastly lower the barrier to contribute the renderers that we will need for the second part of the project. But to also make it clear: supporting the rendering of content and the creation of content for the Wikipedias is our first priority, but it does not end there. Just as Wikidata and Commons have the goal to support the Wikipedias (and within these communities, the prioritization of this goal compared to other goals may differ substantially from one contributor to the next), Wikilambda will have that as a primary goal. But we would be selling ourselves short if we stopped there. There are a number of interesting use cases that will be enabled by this project, many of which are aligned with our vision of a world in which everyone can share in the sum of all knowledge - and a number of the 2030 strategic goals.
You may think, that is tangential, unnecessary, overly complicated - but when discussing the overall goal of providing content in many languages with a number of senior researchers in this area, it was exactly this tangent that, frequently, made them switch from "this is impossible" to "this might work, perhaps".
There have been a number of other topics raised in the thread that I also want to address.
Regarding the atomicity of updates raised by Gilles, which was answered by Arthur: if you're changing the functionality, you should create a new function (that's similar in spirit to the content-addressable code mentioned by Amirouche), and that's where we'll start (because I am a simple person, and I like to start with simple things for the first iteration). We might identify the need for a more powerful versioning scheme later, and we'll get to it then. I hope to avoid that.
Will the Lua be the same as in the Scribunto modules, as asked by Amir? We'll get to that when we get to implementations in Lua. I think it would be desirable, and I also think it might be possible, but we'll get to it later. We won't start with Lua.
As Arthur points out, the AbstractText prototype is a bit more complex than necessary. It is a great prototype to show that it can work - having functions implemented in JavaScript and Python and be called to evaluate a composed function. It is not a great prototype to show how it should work though, and indeed, as an implementation it is a bit more complex than what we will initially need.
Regarding comments as asked by Louis, every implementation will come with the ability to have documentation. Also, the implementations written in programming languages will have the usual ability to have comments.
Regarding not doing it on MediaWiki, as suggested by Amirouche - well, I had the same discussions regarding Wikidata and Semantic MediaWiki, and I still think it was the best possible decision to build them on top of MediaWiki. The amount of functionality we would need to reimplement otherwise is so incredibly large that a much larger team would be needed, without actually devoting more resources to solving our novel problems.
(I am soooo sorry for writing such long emails, but I think it is important in this early stage to overshare.)
On Wed, Jul 15, 2020 at 10:42 AM Amirouche Boubekki < amirouche.boubekki@gmail.com> wrote:
On Wed, Jul 15, 2020 at 5:00 PM, Gilles Dubuc gilles@wikimedia.org wrote:
The part I don't get is why a collection of functions and tests
needs to be editable on a wiki. This poses severe limitations in terms of versioning if you start having functions depend on each other and editors can only edit one at a time. This will inevitably bring breaking intermediary edits. It seems like reinventing the wheel of code version control on top of a system that wasn't designed for it. MediaWiki isn't designed with the ability to have atomic edits that span multiple pages/items. Which is a pretty common requirement for any large codebase, which this set of functions sounds like it's posed to become.
I disagree with the idea that it must be done with the existing Wikimedia codebase. Though I believe it is possible to build an integrated development environment that fuses a programming language with a version control system. That is what Unison is doing with content-addressable code. Unison has different goals, but they have i18n identifiers/docstrings on their roadmap.
What reasons does this benefit from being done as wiki editable code
compared to a software project (or series of libraries/services) hosted on a git repository? Especially if we're talking about extensive programming and tests, whoever is computer literate enough to master these concepts is likely to know how to contribute to git projects as well, as they would have very likely encountered that on their path to learning functional programming.
My understanding is that Wikilambda wants to be easier than acquiring all the necessary knowledge to contribute to a software project. That is, Wikilambda could be something better than git/GitHub.
I get why such an architecture might be a choice for a prototype,
for fast iteration and because the people involved are familiar with wikis. But what core contributors are familiar with and what can be used for a fast prototype is rarely a suitable and scalable architecture for the final product.
My understanding is that the prototype wanted to demonstrate that it is possible to internationalize the identifiers, docstrings, etc. of a programming language. Something that nobody has tried or succeeded at doing so far.
There is also a notion of how dynamic the content is to take into
account. There is a difference between the needs of ever-changing encyclopedic information, which a wiki is obviously good at handling, and an iterative software project where the definition of a function to pluralize some text in a language is highly unlikely to be edited on a daily basis, but where interdependencies are much more likely to be an important feature. Something that code version control is designed for.
The problem with git is that if you change the definition of a function in a file, you need to update the code in every other place where you call that function. That is different from content-addressable code, where you reference the content of the function: if you change the function, it creates a new function (possibly with the same name), but old code keeps pointing to the old definition. That is much more scalable than the current git/GitHub approach (even if you might keep calling old code).
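To illustrate the content-addressable idea in a few lines (a sketch of the general principle only, not Unison's actual storage format): the identifier of a function is a hash of its definition, so "editing" a function produces a new identifier and never silently changes what existing callers point to.

// Sketch of content-addressable code: the identity of a function is the hash
// of its definition, so an edit produces a new entry instead of mutating an
// existing one. (Uses Node.js crypto; not how Unison actually stores code.)
const { createHash } = require("crypto");

const store = new Map(); // hash -> source text

function addFunction(source) {
  const hash = createHash("sha256").update(source).digest("hex");
  store.set(hash, source);
  return hash; // callers reference this hash, not a mutable name
}

const v1 = addFunction("(a, b) => a + b");
const v2 = addFunction("(a, b) => String(a) + String(b)"); // "changing" the function yields a new hash
console.log(v1 !== v2, store.has(v1)); // true true - the old definition is untouched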
On Wed, Jul 15, 2020 at 4:42 PM Arthur Smith arthurpsmith@gmail.com
wrote:
> On Wed, Jul 15, 2020 at 8:15 AM Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
>> I keep being confused about this point: What are the "functions" on AW/Wikilambda, and in which language will they be written?
> I think everybody has a slightly different perspective on this. I've been working closely with Denny's prototype project (which included the javascript version of 'eneyj' that Lucas was inspired by) at https://github.com/google/abstracttext/ (I assume this will move somewhere under wikimedia now?). The prototype does to an extent define its own "language" - specifically it defines (human) language-independent "Z Objects" (implemented as a subset of json) which encapsulate all kinds of computing objects: functions, tests, numeric values, strings, languages (human and programming), etc.
> I think right now this may be a little more complex than is actually necessary (are strict types necessary? maybe a 'validator' and a 'test' should be the same thing?); on the other hand something like this is needed to be able to have a wiki-style community editable library of functions, renderers, etc. and I really like the underlying approach (similar to Wikidata) to take all the human-language components off into label structures that are not the fundamental identifiers.
> ZObjects for "implementations" of functions contain the actual code, along with a key indicating the programming language of the code. Another ZObject can call the function in some context which chooses an implementation to run it. At a basic level this is working, but there's a lot more to do to get where we want and be actually useful...
> Arthur
Abstract-Wikipedia mailing list Abstract-Wikipedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
-- Amirouche ~ https://hyper.dev
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On Thu, Jul 16, 2020 at 6:23 Denny Vrandečić dvrandecic@wikimedia.org wrote:
Thanks for the great discussion!
Renaming the topic as it deviated a bit from Lucas' original announcement, and was questioning whether we should do Wikilambda at all. So, this is mostly to answer Gilles' comment as to why a collection of functions and tests needs to be editable on a wiki - that's the premise of the first part of the project.
So yes, we could maybe create the whole code for the renderers in a classical way, as an Open Source project, going through git and gerrit, and assume that we will have hundreds of new coders working on this, neatly covering all supported languages, to keep up with the creation of renderers when the community creates new constructors, etc. I have doubts. I want to drastically reduce the barrier to participate in this task, and I think that this is, in fact, necessary. And this is why we are going for Wikilambda, a wiki of functions first, as said in my first email to the list.
There is a closely related discussion on Meta about this: https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia#git
If I try to reconstruct the history of how so many templates, user scripts, gadgets, and modules ended up being stored as wiki pages in the first place, it's something like this:
* Templates started in 2003 or 2004, and were very simple initially. Plain wikitext, very basic parameters support, no parser functions, no documentation pages, and no TemplateData. No one treated templates as code that should be managed with proper design and software engineering practices.
* User scripts started in 2006 or so, when there was almost no JavaScript on Wikipedia, and much less JavaScript on the web in general. Storing stuff on wiki pages was easy and natural. Their versioning was pretty much as good as CVS and SVN; Git was very young and ultra-nerdy.
* Gadgets started in 2007 as a way to package user scripts *very slightly* better.
* Modules started in 2012 as a way to make something similar to templates, but with better performance and more maintainable syntax. So their programming language was different from how templates are written, but their packaging was almost the same.
PLEASE CORRECT ME IF I'M WRONG ABOUT ANYTHING ABOVE.
Basically, the reason they are stored and versioned the way they are is that we are essentially preserving the way things were in 2006. It may sound like I'm saying that it's bad, but it's really not so simple.
The advantages of storing code as wiki pages are, at least for the editors:
1. They are familiar to all wiki editors, at least as a concept. Not everyone knows the *syntax* of JS, Lua, or template parameters and parser functions, but, without exception, everyone who edits wikis is familiar with the general editing and versioning interface.
2. There are three easy steps to deploy code changes: Step one: Edit. Step two: Publish. Step three: There's no step three.[1]
3. They are controlled directly by the editors community and not by people on some other website (Gerrit, Phabricator, GitHub), or organization (WMF, WMDE).
The disadvantages of wiki pages are:
1. Templates, modules, and gadgets are code, even though they very often aren't treated as code. And code, these days, is *expected* to have proper versioning, review, and deployment practices. "Proper versioning" usually means Git. Not in 2006, and not even in 2012, but definitely in 2020.
2. It's easy to break things without proper review.
Disadvantage #1 sounds relevant to pretty much all other people on Earth who write code, but... not so much to editors of Wikimedia wikis. Git is now much more common and much less nerdy than it was in 2006, but that's not necessarily what wiki editors think. So this disadvantage is outweighed by the three advantages.
Disadvantage #2 is easily outweighed by the general wiki principle: what is easy to break is also easy to fix. In fact, with our current deployment tools (Gerrit and zuul and scap and all that), undeploying a Gerrit change takes much more time than reverting a wiki page.
Nevertheless, times did change since 2006, so *considering* a modern versioning system is a valid idea. But it would have to be a versioning system that doesn't give up on the advantages. Otherwise, the community won't use it.
* * * * * * * [1] The source for this joke is an old Apple ad: https://www.youtube.com/watch?v=YHzM4avGrKI . Unlike Apple ads, what I write about wiki pages is actually true.
On 16 July 2020 at 09:35 "Amir E. Aharoni" amir.aharoni@mail.huji.ac.il wrote:
Git is now much more common and much less nerdy than it was in 2006, but that's not necessarily what wiki editors think.
Correct. I had to look at it in January, and I'd have to set aside time to know what it was.
Time to cite, in relation to "the simplest thing that actually works",
https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good
On "barriers to entry", no one knows all about those. But see above.
Charles
On Thu, Jul 16, 2020 at 6:23 Denny Vrandečić dvrandecic@wikimedia.org wrote:
Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically a small library of functions and helpers. We're also starting with no side-effects and referential transparency, because we're boring, and leave that for a bit later.
But you see, that's exactly the problem: This may be clear to people who learned about side effects, composition, and referential transparency in the university for their Computer Science or Mathematics degrees, but I strongly suspect that most editors of Wikimedia projects didn't learn about these things.
If the project only wants contributors with CS degrees or comparable knowledge, then it's fine to do it completely differently from a wiki, and more like a software development project: with Git, with rigorous code review, and so on. Maybe the eventual result will be better if it's done this way. But don't expect the size and the diversity of the Wikimedia community, and don't expect that it will be easy to integrate the functions' output into Wikipedia. Maybe it will be _possible_, but it won't be easy.
But if the intention is to let the large and diverse Wikimedia community edit it, then from the outset, this project needs to speak in terms that this large community understands. This community understands the following things:
* what is an encyclopedia, and what are notability, reliability, original research, etc.
* wiki pages and wikitext (links, headings, categories, references, files)
* a relatively large part of this community understands structured data in the form of Wikidata
* a relatively large part of this community understands templates (at least how to add them to pages, though not necessarily how to create them)
* a smaller, but still relatively large and multi-lingual group understands Scribunto modules
It's totally fine to introduce new concepts or to upgrade existing ones, but it has to start from something familiar. Jumping straight into composition, side effects, and Z objects is too cryptic.
So, there will be wikipages that represent such a function. So on one page we would say, "so, there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly, not a very complicated function, but quite useful.
Sorry, but I still don't understand how this is different from making the following Scribunto module:
local p = {}
function p.concat(frame) return frame.args[1] .. frame.args[2] end
return p
I've read through the walkthrough at https://github.com/google/abstracttext/blob/master/eneyj/docs/walkthrough.md , and at most, I can see that the Z objects are a programming-language-independent abstract representation of the code's interface. But _eventually_, there is code at the bottom, isn't there? And is that code fundamentally different from a Scribunto module?
Then we might have several implementations for this function, for example in different programming languages, we also might have tests for this function. All implementations should always behave exactly the same. An evaluation engine decides which implementations it wants to use (and, for the result it shouldn't matter because they all should behave the same), but some engines might prefer to run the JavaScript implementations, others might run Python, others might decide "oh, wait, I am composing two different functions together in order to get to a result, maybe it is smarter if I pick two implementations with the same language, as I actually can compile that into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you were asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
But Scribunto can also support multiple programming languages. In practice it supports only Lua at the moment, but in theory more languages like JavaScript and Python could be added. (It's an old and pretty popular community wish.)
So, is a conceptually new infrastructure really needed, and are functions really different from modules? "Scribunto is not scalable for such an ambitious project" would be an acceptable answer, but I'd love to know whether this was at least considered.
Hi all,
The largest issue I see with Wikilambda is conceptual: sure a Wiki of functions would be nice. However, a real code base is not made of pure, generic functions such as "addition" or "concatenation". Let's have a look at a function I just picked from one of my open text editors. Its signature (C#) is: void WriteSignedJsonFile(string path, string content, string publicKey, string signature)
This creates a JSON file on the filesystem with a very specific format and values. This function is 100% useless in any other project. The thing is, most code is glue between functions, and most of it has quite specialized use. In this regard, a wiki of functions doesn't sound super useful because:
1. For a generic task (say parsing XML) a specialized library in the language one is programming in will be easier to use
2. The core code as well as glue code of a project is not transferable to other projects
Abstract Wikipedia will need very specialized data structures. It will need very specialized code too. The reusability of this code in unrelated projects will tend towards zero.
Best regards, Louis Lecailliez ________________________________ From: Abstract-Wikipedia abstract-wikipedia-bounces@lists.wikimedia.org on behalf of Amir E. Aharoni amir.aharoni@mail.huji.ac.il Sent: Thursday, 16 July 2020 12:17 To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) abstract-wikipedia@lists.wikimedia.org Subject: Re: [Abstract-wikipedia] Wiki of functions (was Re: Runtime considerations: Introducing GraalEneyj)
On Thu, Jul 16, 2020 at 6:23 Denny Vrandečić dvrandecic@wikimedia.org wrote:
Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically a small library of functions and helpers. We're also starting with no side-effects and referential transparency, because we're boring, and leave that for a bit later.
But you see, that's exactly the problem: This may be clear to people who learned about side effects, composition, and referential transparency in the university for their Computer Science or Mathematics degrees, but I strongly suspect that most editors of Wikimedia projects didn't learn about these things.
If the project only wants contributors with CS degrees or comparable knowledge, then it's fine to do it completely differently from a wiki, and more like a software development project: with Git, with rigorous code review, and so on. Maybe the eventual result will be better if it's done this way. But don't expect the size and the diversity of the Wikimedia community, and don't expect that it will be easy to integrate the functions' output into Wikipedia. Maybe it will _possible_, but it won't be easy.
But if the intention is to let the large and diverse Wikimedia community edit it, then from the outset, this project needs to speak in terms that this large community understands. This community understands the following things: * what is an encyclopedia, and what are notability, reliability, original research, etc. * wiki pages and wikitext (links, headings, categories, references, files) * a relatively large part of this community understands structured data in the form of Wikidata * a relatively large part of this community understands templates (at least how to add them to pages, though not necessarily how to create them) * a smaller, but still relatively large and multi-lingual group understands Scribunto modules
It's totally fine to introduce new concepts or to upgrade existing ones, but it has to start from something familiar. Jumping straight into composition, side effects, and Z objects is too cryptic.
So, there will be wikipages that represent such a function. So on one page we would say, "so, there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly, not a very complicated function, but quite useful.
Sorry, but I still don't understand how is this different from making the following Scribunto module:
local p = {}
function p.concat(frame) return frame.args[1] .. frame.args[2] end
return p
I've read through the walkthrough at https://github.com/google/abstracttext/blob/master/eneyj/docs/walkthrough.md , and at most, I can see that the Z objects are a programming-language-independent abstract representation of the code's interface. But _eventually_, there is code at the bottom, isn't there? And is that code fundamentally different from a Scribunto module?
Then we might have several implementations for this function, for example in different programming languages, we also might have tests for this function. All implementations should always behave exactly the same. An evaluation engine decides which implementations it wants to use (and, for the result it shouldn't matter because they all should behave the same), but some engines might prefer to run the JavaScript implementations, others might run Python, others might decide "oh, wait, I am composing two different functions together in order to get to a result, maybe it is smarter if I pick two implementations with the same language, as I actually can compile that into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you were asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
But Scribunto can also support multiple programming languages. In practice it supports only Lua at the moment, but in theory more languages like JavaScript and Python could be added. (It's an old and pretty popular community wish.)
So, is a conceptually new infrastructure really needed, and are functions really different from modules? "Scribunto is not scalable for such an ambitious project" would be an acceptable answer, but I'd love to know whether this was at least considered.
That is very true, Louis!
But remember, the goal is not to write arbitrary applications. In order to write an actual full-fledged application there is still a lot of packaging to do: UX, state, persistence, etc. -- all things we will not look at.
Does this limit us in what we can achieve? Yes, absolutely. But our primary goal is not to provide an application building and hosting platform. Our primary goal is to provide a system to extend the coverage in the Wikipedias. For this, we will also provide the missing pieces of product development. But for using the functions of Wikilambda in other contexts, well, for that there will still be more needed.
Wikilambda won't replace apps, and does not intend to. But having a collaborative repository of functions will still provide a lot of interesting use cases, even with this restriction.
But thanks for clarifying the limits of the current plan, this is super helpful! Denny
On Sat, Jul 18, 2020 at 8:31 AM Louis Lecailliez < louis.lecailliez@outlook.fr> wrote:
Hi all,
The largest issue I see with Wikilambda is conceptual: sure a Wiki of functions would be nice. However, a real code base is not made of pure, generic functions such as "addition" or "concatenation". Let's have a look at a function I just picked from one of my open text editors. Its signature (C#) is: void WriteSignedJsonFile(string path, string content, string publicKey, string signature)
This creates a JSON file on the filesystem with a very specific format and values. This function is 100% useless in any other project. The thing is, most code is glue between functions, and most of it has quite specialized use. In this regard, a wiki of functions doesn't sound super useful because:
1. For a generic task (say parsing XML) a specialized library in the language one is programming in will be easier to use
2. The core code as well as glue code of a project is not transferable to other projects
Abstract Wikipedia will need very specialized data structures. It will need very specialized code too. The reusability of this code in unrelated projects will tend towards zero.
Best regards, Louis Lecailliez
*From:* Abstract-Wikipedia abstract-wikipedia-bounces@lists.wikimedia.org on behalf of Amir E. Aharoni amir.aharoni@mail.huji.ac.il *Sent:* Thursday, 16 July 2020 12:17 *To:* General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) abstract-wikipedia@lists.wikimedia.org *Subject:* Re: [Abstract-wikipedia] Wiki of functions (was Re: Runtime considerations: Introducing GraalEneyj)
On Thu, Jul 16, 2020 at 6:23 Denny Vrandečić dvrandecic@wikimedia.org wrote:
Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically a small library of functions and helpers. We're also starting with no side-effects and referential transparency, because we're boring, and leave that for a bit later.
But you see, that's exactly the problem: This may be clear to people who learned about side effects, composition, and referential transparency in the university for their Computer Science or Mathematics degrees, but I strongly suspect that most editors of Wikimedia projects didn't learn about these things.
If the project only wants contributors with CS degrees or comparable knowledge, then it's fine to do it completely differently from a wiki, and more like a software development project: with Git, with rigorous code review, and so on. Maybe the eventual result will be better if it's done this way. But don't expect the size and the diversity of the Wikimedia community, and don't expect that it will be easy to integrate the functions' output into Wikipedia. Maybe it will _possible_, but it won't be easy.
But if the intention is to let the large and diverse Wikimedia community edit it, then from the outset, this project needs to speak in terms that this large community understands. This community understands the following things:
- what is an encyclopedia, and what are notability, reliability, original research, etc.
- wiki pages and wikitext (links, headings, categories, references, files)
- a relatively large part of this community understands structured data in the form of Wikidata
- a relatively large part of this community understands templates (at least how to add them to pages, though not necessarily how to create them)
- a smaller, but still relatively large and multi-lingual group understands Scribunto modules
It's totally fine to introduce new concepts or to upgrade existing ones, but it has to start from something familiar. Jumping straight into composition, side effects, and Z objects is too cryptic.
So, there will be wikipages that represent such a function. So on one page we would say, "so, there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly, not a very complicated function, but quite useful.
Sorry, but I still don't understand how is this different from making the following Scribunto module:
local p = {}
function p.concat(frame) return frame.args[1] .. frame.args[2] end
return p
I've read through the walkthrough at https://github.com/google/abstracttext/blob/master/eneyj/docs/walkthrough.md , and at most, I can see that the Z objects are a programming-language-independent abstract representation of the code's interface. But _eventually_, there is code at the bottom, isn't there? And is that code fundamentally different from a Scribunto module?
Then we might have several implementations for this function, for example in different programming languages, we also might have tests for this function. All implementations should always behave exactly the same. An evaluation engine decides which implementations it wants to use (and, for the result it shouldn't matter because they all should behave the same), but some engines might prefer to run the JavaScript implementations, others might run Python, others might decide "oh, wait, I am composing two different functions together in order to get to a result, maybe it is smarter if I pick two implementations with the same language, as I actually can compile that into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you were asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
But Scribunto can also support multiple programming languages. In practice it supports only Lua at the moment, but in theory more languages like JavaScript and Python could be added. (It's an old and pretty popular community wish.)
So, is a conceptually new infrastructure really needed, and are functions really different from modules? "Scribunto is not scalable for such an ambitious project" would be an acceptable answer, but I'd love to know whether this was at least considered.
Abstract-Wikipedia mailing list Abstract-Wikipedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
Hi Amir,
But you see, that's exactly the problem: This may be clear to people who learned about side effects, composition, and referential transparency in the university for their Computer Science or Mathematics degrees, but I strongly suspect that most editors of Wikimedia projects didn't learn about these things.
I agree. In the end, when the Wikimedia editors end up using this system, it must be accessible. Also, building the functions in the new project should be as accessible as possible. At that point, we won't be talking about side effects and referential transparency. At that point, we will need to be talking in terms that are accessible to as many as possible.
We're not there yet.
In order to get there, I think it makes good sense to understand the state of the art in programming language research, and to distill the common issues that are hampering the democratization of coding. In order to talk about these I sometimes slip into jargon, and I want to apologize for that. That's similar to when Wikidata was developed, and sometimes we were talking about ontologies and inverse functional properties and description logics. But I very much hope that most of this remains well encapsulated away from the contributors to Wikidata and even more so from the contributors and readers of Wikipedia.
So, yes, please call me out when I use jargon. I will try to be more mindful about that.
I've read through the walkthrough at https://github.com/google/abstracttext/blob/master/eneyj/docs/walkthrough.md , and at most, I can see that the Z objects are a programming-language-independent abstract representation of the code's interface. But _eventually_, there is code at the bottom, isn't there? And is that code fundamentally different from a Scribunto module?
Yeah. Mostly. (Trying not to point out that actually no code at the bottom is really required, because that is an irrelevant tangent to this conversation.)
(And failing at not pointing it out. Sorry.)
Yes, in many ways it is very much alike to a Scribunto module.
We'll definitely take a very close look at Scribunto and see how much of that will make sense to reuse. The sandboxing experience we gained in ops and development, the fact that it is already a well used system that had years of real life exposure, the skills the community has already acquired, that's nothing that should be dismissed lightly.
We want to ensure that the system understands the implemented functionality sufficiently to expose comfortable interfaces to it that can be used by a wide variety of users. And also to be able to easily and safely reuse the functionality in other functions.
I hope that makes sense.
Thanks, Denny
On Thu, Jul 16, 2020 at 5:18 AM Amir E. Aharoni < amir.aharoni@mail.huji.ac.il> wrote:
On Thu, Jul 16, 2020 at 6:23 Denny Vrandečić dvrandecic@wikimedia.org wrote:
Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically a small library of functions and helpers. We're also starting with no side-effects and referential transparency, because we're boring, and leave that for a bit later.
But you see, that's exactly the problem: This may be clear to people who learned about side effects, composition, and referential transparency in the university for their Computer Science or Mathematics degrees, but I strongly suspect that most editors of Wikimedia projects didn't learn about these things.
If the project only wants contributors with CS degrees or comparable knowledge, then it's fine to do it completely differently from a wiki, and more like a software development project: with Git, with rigorous code review, and so on. Maybe the eventual result will be better if it's done this way. But don't expect the size and the diversity of the Wikimedia community, and don't expect that it will be easy to integrate the functions' output into Wikipedia. Maybe it will _possible_, but it won't be easy.
But if the intention is to let the large and diverse Wikimedia community edit it, then from the outset, this project needs to speak in terms that this large community understands. This community understands the following things:
- what is an encyclopedia, and what are notability, reliability, original research, etc.
- wiki pages and wikitext (links, headings, categories, references, files)
- a relatively large part of this community understands structured data in the form of Wikidata
- a relatively large part of this community understands templates (at least how to add them to pages, though not necessarily how to create them)
- a smaller, but still relatively large and multi-lingual group understands Scribunto modules
It's totally fine to introduce new concepts or to upgrade existing ones, but it has to start from something familiar. Jumping straight into composition, side effects, and Z objects is too cryptic.
So, there will be wikipages that represent such a function. So on one page we would say, "so, there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly, not a very complicated function, but quite useful.
Sorry, but I still don't understand how is this different from making the following Scribunto module:
local p = {}
function p.concat(frame) return frame.args[1] .. frame.args[2] end
return p
I've read through the walkthrough at https://github.com/google/abstracttext/blob/master/eneyj/docs/walkthrough.md , and at most, I can see that the Z objects are a programming-language-independent abstract representation of the code's interface. But _eventually_, there is code at the bottom, isn't there? And is that code fundamentally different from a Scribunto module?
Then we might have several implementations for this function, for example in different programming languages, we also might have tests for this function. All implementations should always behave exactly the same. An evaluation engine decides which implementations it wants to use (and, for the result it shouldn't matter because they all should behave the same), but some engines might prefer to run the JavaScript implementations, others might run Python, others might decide "oh, wait, I am composing two different functions together in order to get to a result, maybe it is smarter if I pick two implementations with the same language, as I actually can compile that into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you were asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
But Scribunto can also support multiple programming languages. In practice it supports only Lua at the moment, but in theory more languages like JavaScript and Python could be added. (It's an old and pretty popular community wish.)
So, is a conceptually new infrastructure really needed, and are functions really different from modules? "Scribunto is not scalable for such an ambitious project" would be an acceptable answer, but I'd love to know whether this was at least considered.
Abstract-Wikipedia mailing list Abstract-Wikipedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
This is beautiful Lucas, I love that there is already a Graal implementation :D
On Tue, Jul 14, 2020 at 5:38 PM Lucas Werkmeister mail@lucaswerkmeister.de wrote:
Hi folks!
Since people have started talking about runtimes for the functions defined in Wikilambda (see: V8 Engine andor WebAssembly thread), I figured I should mention the project I’ve been working on for a few weeks: GraalEneyj.[1] It’s supposed to be(come) a GraalVM-based (re)implementation of the eneyj language/runtime[2] that Denny wrote and which is currently part of the AbstractText extension.
GraalVM,[3] for those who haven’t heard of it yet, is a Java-based project to build high-performance language implementations. There are already GraalVM-based implementations of JavaScript,[4] R,[5] Ruby,[6] LLVM,[7] WebAssembly,[8] and more,[9] and code snippets in any of these languages can run in the same (J)VM ((Java) Virtual Machine) and call each other with no performance penalty. This is of course interesting if we’re thinking about Wikilambda holding function implementations in different languages – with GraalEneyj, it should be possible to seamlessly compose functions implemented in any language for which a GraalVM implementation is available. GraalVM is also supposed to deliver highly optimized language implementations with relatively little development effort, but so far GraalEneyj is not yet at a point where I could benchmark it.
If that sounds interesting to you, please get in touch! You can also explore the code that I have so far (there’s not a ton of documentation yet, but hopefully the README[10] is at least enough to get a basic idea and run the tests), or watch the GraalEneyj walkthrough[11] that I recorded last month. (At some point I should record another one, since the code keeps changing, but for now that’s all I have to offer.)
Cheers, Lucas Werkmeister
PS: In my day job I’m a software developer at Wikimedia Deutschland e. V., but all of this is private-time activity so far, not work related.
Abstract-Wikipedia mailing list Abstract-Wikipedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia