Hoi,
When the objective is to create a community, you may want to let a thousand flowers bloom.

Wikilambda does not come out of nothing. There have been multiple comparable projects in the past, all of them with good and bad points. For me, Wikilambda is to be the next step that gets us where we want to end up: a place where articles in the Wikipedia style are generated for any and all languages. When the objective is to create a community, the one thing that will get us one is providing a service that makes a difference.

I want us to consider two existing approaches: the automated descriptions by Magnus Manske and the Cebuano Wikipedia project by Sverker Johansson. When Wikilambda is able to improve on these two underlying functionalities, it will generate a buzz that enables the whole ecosystem required for the full Wikilambda.

==Automated descriptions==
Automated descriptions help in the disambiguation of items. They can be enabled in the search results of our projects, and they are used in Reasonator [1]. They function, after a fashion, in any language. The challenge would be to take over this functionality and improve it: make it use proper grammar and inflection in any language. With those improvements underway, we can extend the functionality and use it in combination with Special:MediaSearch [2], a search front end for Commons that enables searching for images in any language.
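
To make that concrete, here is a minimal sketch of the underlying idea, assuming a Wikidata-like data shape. All names and the data layout below are invented for illustration; this is not code from any existing tool:

```python
# Minimal sketch: compose a short item description from structured
# data, with a per-language rule supplying word order and function
# words. Everything here is illustrative, not Wikilambda code.

ITEM = {  # hypothetical, Wikidata-like labels per language
    "occupation": {"en": "painter", "nl": "schilder"},
    "country": {"en": "the Netherlands", "nl": "Nederland"},
}

RULES = {  # grammar differs per language, so each gets its own rule
    "en": lambda occupation, country: f"{occupation} from {country}",
    "nl": lambda occupation, country: f"{occupation} uit {country}",
}

def describe(item, lang):
    """Render a one-line description in the requested language."""
    rule = RULES[lang]
    return rule(item["occupation"][lang], item["country"][lang])

print(describe(ITEM, "en"))  # painter from the Netherlands
print(describe(ITEM, "nl"))  # schilder uit Nederland
```

The hard part, of course, is that real languages need inflection and agreement rather than simple templates, which is exactly where shared, community-maintained functions would earn their keep.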

By seeking early results that make a marked difference in our projects, we get a community interested in adding labels and properties, and in considering the extended functionality.

==Cebuano Wikipedia==
The Cebuano Wikipedia is quite a contentious project, but Wikilambda aims at exactly the same objectives. There are two things we can do readily. We can use Wikidata as the source: when the underlying data changes, it is easy to regenerate the text. The same data is used in several Wikipedias, so this is not only about Cebuano but also about Waray-Waray and Swedish. The other objective is to replace the underlying technology and make it Wikilambda native, which will make it possible to generate articles in other languages as well.

Once the approach to creating articles is reshaped into something appropriate for Wikilambda, it will allow all kinds of experiments: among them, texts that are cached rather than saved as articles, and other categories of articles.

The main thing is that this project aims to make a practical difference.
Thanks,
      GerardM

[1] https://reasonator.toolforge.org/?find=%D8%AF%D8%A7%D9%82%D9%84%D8%A7%D8%B3+%D8%A2%D8%AF%D8%A7%D9%85%D8%B2
[2] https://commons.wikimedia.org/wiki/Special:MediaSearch?type=bitmap&q=%D8%AF%D8%A7%D9%82%D9%84%D8%A7%D8%B3+%D8%A2%D8%AF%D8%A7%D9%85%D8%B2





On Thu, 16 Jul 2020 at 19:50, Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
No need to apologize, and please be stubborn. I hope that such discussions frequently boil down to "I haven't explained that well enough" instead of "we're gonna do what we're gonna do because I said so", so asking these questions ensures it's the former and dispels notions of the latter.

And I am sure there are possible elements of Maslow's hammer [1] and potentially a bit of hubris on my side in play, so I really want to make sure I listen, respond, and explain myself better.

One minor point: "it is entirely unproven that doing this as a wiki would bring more contributors." That's true. That's true for everything new you try. I am not sure how to respond to that, besides saying: yes, that's one of the things we're figuring out. We have a small team and are exploring that space.

But let's take a step back. Whether we use git or C++ or MediaWiki or Eclipse is, in the end, a question of the implementation. Let's leave the question of *how* aside for a moment, and see whether we have agreement on *what*.
What are we trying to achieve with the first part of the project? And by the first part I mean Wikilambda. The goal is to create a community project to develop a comprehensive repository of functions, a Wikipedia but for functions, so this first part is detached from any questions about natural language generation and knowledge representation.

So my first question is, do we agree on that goal?

(And, maybe the real question is, do we even have a reasonably shared understanding of what that goal means? I am unsure about that - having worked on that for such a long time, I probably make too many assumptions that this is actually clear, but it could be that the first task is to clear that up and create a more shared understanding of that goal - please, let me know!)

Thanks!
Denny

On Thu, Jul 16, 2020 at 1:53 AM Gilles Dubuc <gilles@wikimedia.org> wrote:
Sorry if I sound stubborn on this topic, but all I'm getting from your proposal and this response is "we're going to do X", not a valid comparison to the alternative of more traditional open source projects. There should be a methodical and objective comparison, not subjective statements such as the claim that it may lower the barrier to entry. It's entirely unproven that doing this as a wiki would bring more contributors. In fact, the folks who are most knowledgeable about writing code might find it odd and unpleasant to work with, with all the function definitions scattered across web pages. You would have to open many browser tabs just to comprehend that codebase, which is a clearly inferior experience compared to traditional open source, where contributors have a vast choice of tools they like for viewing, editing and contributing to a large codebase. And once this codebase, because that's what it is, reaches enough complexity, how you commit changes isn't going to be what stops people from contributing. The challenge will be in the actual code contributed and in how complex the codebase is to navigate and understand. If the only way to navigate it is the website and a small set of custom tools built for it, that's a far inferior experience compared to the vibrant open source programming ecosystem that has existed for decades.

Furthermore, there's nothing stopping this from being implemented as a registration-free, web-based code editing platform backed by git or another proven source code version control system. Developers who are knowledgeable about version control systems could check out the entire codebase and commit directly to the git repo. Folks who aren't familiar with that could edit functions/files directly on the website. What GitHub has done in that respect with on-site editing can be made even simpler if you settle on a simpler git workflow than pull requests. The experience can be greatly streamlined if the common case for casual contributors is single-function editing. In previous responses it was said that a git version of this would require registration. It doesn't; it's entirely up to you how open or closed a code repository is. And any fear you might have about managing a completely open code repository should be the same as for a completely open wiki. As long as you don't require registration, you're dealing with exactly the same vandalism challenges.

A lot of the innovation this project would bring would have a much bigger impact if done as a git-backed project. Tools developed for it would be more likely to be picked up by others who want to do wiki-style open source programming. Something completely custom built on top of MediaWiki would leave us in our existing MediaWiki bubble. We know how limited the open source community around MediaWiki is, and this would likely only attract a subset of it.

Overall I get the sense that the reason this is proposed to be a wiki isn't that a wiki brings real benefits to the problem being solved compared to the alternatives; it's that the people involved have been working on wikis, and especially MediaWiki wikis, for a long time. It's a collective hammer, and as a result everything looks like nails. Building things on top of MediaWiki isn't the only way to make a successful collaborative project, particularly one in the programming space.

That being said, the two aren't mutually exclusive. It can be both a MediaWiki wiki and a git repo if bridged correctly. But even that would have to be pitted against, say, a single-project, registration-free custom version of GitLab, because code collaboration tools already exist. Building this on top of MediaWiki is going to mean a ton of reinventing the wheel around code display, code review, CI, etc., things that other projects have already done well. It could be a lot cheaper to build on top of an existing code collaboration platform.

On Thu, Jul 16, 2020 at 5:23 AM Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
Thanks for the great discussion!

Renaming the topic as it deviated a bit from Lucas' original announcement, and was questioning whether we should do Wikilambda at all. So, this is mostly to answer Gilles' comment as to why a collection of functions and tests needs to be editable on a wiki - that's the premise of the first part of the project.

So yes, we could maybe create the whole code for the renderers in a classical way, as an Open Source project, going through git and gerrit, and assume that we will have hundreds of new coders working on this, neatly covering all supported languages, to keep up with the creation of renderers when the community creates new constructors, etc. I have doubts. I want to drastically reduce the barrier to participate in this task, and I think that this is, in fact, necessary. And this is why we are going for Wikilambda, a wiki of functions first, as said in my first email to the list.

Amir, when I say function I mean a computation that accepts an input and returns an output. That differs, for example, from Lua modules, which are basically small libraries of functions and helpers. We're also starting with no side effects and with referential transparency, because we're boring, and we'll leave that for a bit later.

So, there will be wiki pages that represent such a function. On one page we would say: "there's this function, in English it is called concatenate, it takes two strings and returns one, and it does this". Admittedly not a very complicated function, but quite useful. Then we might have several implementations of this function, for example in different programming languages, and we might also have tests for it. All implementations should always behave exactly the same. An evaluation engine decides which implementations it wants to use (for the result it shouldn't matter, because they should all behave the same): some engines might prefer to run the JavaScript implementations, others might run Python, and others might decide "oh, wait, I am composing two different functions together to get to a result; maybe it is smarter to pick two implementations in the same language, as I can then compile them into a single call beforehand", etc. There are plenty of possibilities to improve and optimize evaluation engines, and if you asked me to dream, I'd say I see a vibrant ecosystem of evaluation engines from different providers, running in vastly different contexts, from the local browser to an open source peer-to-peer distributed computing project.
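
As a rough sketch of what such a page could hold, and of an engine picking one of its implementations, something like the following. The field names and the dict shape are my own invention; the real ZObject model differs:

```python
# Sketch of a function page with several implementations and tests.
# Field names are invented for illustration; real ZObjects differ.

concatenate = {
    "labels": {"en": "concatenate"},
    "signature": {"inputs": ["string", "string"], "output": "string"},
    "implementations": [
        {"language": "python", "code": "lambda a, b: a + b"},
        {"language": "javascript", "code": "(a, b) => a + b"},
    ],
    "tests": [{"inputs": ["foo", "bar"], "expected": "foobar"}],
}

def evaluate(function, inputs, preferred="python"):
    """Toy evaluation engine: pick an implementation by preferred
    language and run it (only the Python one is runnable here)."""
    for impl in function["implementations"]:
        if impl["language"] == preferred:
            # demo only; never eval untrusted code in real systems
            return eval(impl["code"])(*inputs)
    raise NotImplementedError("no implementation in a runnable language")

# Since all implementations must behave identically, the tests can be
# run against whichever implementation the engine selects.
for test in concatenate["tests"]:
    assert evaluate(concatenate, test["inputs"]) == test["expected"]
```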

What looks like a programming language of its own in the existing ZObjects (but isn't) is the ability to compose functions. Say you now want to introduce a double_string function which takes one string and returns one: you could either implement it in any of the individual programming languages, or you can implement it as a composition, i.e. double_string(arg1) := concatenate(arg1, arg1). Since you can have several implementations, you can still go ahead and write a native implementation in JavaScript or Python or WebAssembly, but because we know how this function is composed from existing functions, the evaluation engine can do that for you as well. (That's an idea I want to steal for the natural language generation engine later too, but this time for natural languages.)
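
Continuing the sketch above (same caveats: invented field names, a deliberately naive engine), a composition could be stored as data rather than as code in any particular programming language:

```python
# double_string carries no per-language code at all: it is defined
# purely as a composition of the existing concatenate function.

double_string = {
    "labels": {"en": "double_string"},
    "signature": {"inputs": ["string"], "output": "string"},
    "composition": {
        "call": "concatenate",          # reference to the function above
        "arguments": ["arg1", "arg1"],  # both slots bind the single input
    },
}

def evaluate_composition(function, inputs, registry):
    """Resolve a composition: look up the called function and bind the
    caller's named arguments to the actual input values."""
    params = {"arg1": inputs[0]}  # simplified: one named parameter
    comp = function["composition"]
    callee = registry[comp["call"]]
    bound = [params[name] for name in comp["arguments"]]
    return evaluate(callee, bound)  # reuses the toy engine from above

registry = {"concatenate": concatenate}
print(evaluate_composition(double_string, ["ab"], registry))  # abab
```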

By exposing these functions, the goal is to provide a library of such functions that anyone can extend, maintain, browse, execute, and use for any purpose. Contributors and readers can come in and compare different implementations of the same function. They can call functions and evaluate them in a local evaluation engine, or in the browser (if supported), or on the server. That's the goal of the wiki of functions, what we call Wikilambda in the proposal - a new Wikimedia project that puts functions front and center, and makes them a new form of knowledge asset that the Wikimedia movement aims to democratize and make accessible.

One goal is to vastly lower the barrier to contribute the renderers that we will need for the second part of the project. But to also make it clear: supporting the rendering of content and the creation of content for the Wikipedias is our first priority, but it does not end there. Just as Wikidata and Commons have the goal to support the Wikipedias (and within these communities, the prioritization of this goal compared to other goals may differ substantially from one contributor to the next), Wikilambda will have that as a primary goal. But we would be selling ourselves short if we stopped there. There are a number of interesting use cases that will be enabled by this project, many of which are aligned with our vision of a world in which everyone can share in the sum of all knowledge - and a number of the 2030 strategic goals.

You may think that this is tangential, unnecessary, overly complicated - but when discussing the overall goal of providing content in many languages with a number of senior researchers in this area, it was exactly this tangent that frequently made them switch from "this is impossible" to "this might work, perhaps".

There have been a number of other topics raised in the thread that I also want to address.

Regarding atomicity of updates, raised by Gilles: Arthur answered that if you're changing the functionality, you should create a new function (which is similar in spirit to the content-addressable code mentioned by Amirouche), and that's where we'll start (because I am a simple person, and I like to start with simple things for the first iteration). We might identify the need for a more powerful versioning scheme later, and we'll get to it then. I hope to avoid that.

Will the Lua be the same as in the Scribunto modules, as asked by Amir? We'll get to that when we get to implementations in Lua. I think it would be desirable, and I also think it might be possible, but we'll get to it later. We won't start with Lua.

As Arthur points out, the AbstractText prototype is a bit more complex than necessary. It is a great prototype to show that it can work - having functions implemented in JavaScript and Python and called to evaluate a composed function. It is not a great prototype to show how it should work, though, and indeed, as an implementation it is a bit more complex than what we will initially need.

Regarding comments as asked by Louis, every implementation will come with the ability to have documentation. Also, the implementations written in programming languages will have the usual ability to have comments.

Regarding not doing it on MediaWiki, as suggested by Amirouche - well, I had the same discussions regarding Wikidata and Semantic MediaWiki, and I still think it was the best possible decision to build them on top of MediaWiki. The amount of functionality we would otherwise need to reimplement is so incredibly large that a much larger team would be needed, without actually devoting more resources to solving our novel problems.

(I am soooo sorry for writing such long emails, but I think it is important in this early stage to overshare.)


On Wed, Jul 15, 2020 at 10:42 AM Amirouche Boubekki <amirouche.boubekki@gmail.com> wrote:
On Wed, 15 Jul 2020 at 17:00, Gilles Dubuc <gilles@wikimedia.org> wrote:
>
> The part I don't get is why a collection of functions and tests needs to be editable on a wiki. This poses severe limitations in terms of versioning if you start having functions depend on each other and editors can only edit one at a time. This will inevitably bring breaking intermediary edits. It seems like reinventing the wheel of code version control on top of a system that wasn't designed for it. MediaWiki isn't designed with the ability to have atomic edits that span multiple pages/items, which is a pretty common requirement for any large codebase, which this set of functions sounds like it's poised to become.

I disagree that it must be done with the existing MediaWiki codebase. That said, I believe it is possible to build an integrated development environment that fuses a programming language with a version control system. That is what Unison is doing with content-addressable code. Unison has different goals, but it has i18n identifiers/docstrings on its roadmap.

> For what reasons does this benefit from being done as wiki-editable code compared to a software project (or series of libraries/services) hosted on a git repository? Especially if we're talking about extensive programming and tests, whoever is computer-literate enough to master these concepts is likely to know how to contribute to git projects as well, as they would very likely have encountered them on their path to learning functional programming.

My understanding is that Wikilambda wants to be easier than acquiring all the knowledge necessary to contribute to a software project. That is, Wikilambda could be something better than git/GitHub.

> I get why such an architecture might be a choice for a prototype, for fast iteration and because the people involved are familiar with wikis. But what core contributors are familiar with and what can be used for a fast prototype is rarely a suitable and scalable architecture for the final product.

My understanding is that the prototype wanted to demonstrate that it is possible to internationalize the identifiers, docstrings, etc. of a programming language - something that nobody has tried or succeeded at so far.

> There is also a notion of how dynamic the content is to take into account. There is a difference between the needs of ever-changing encyclopedic information, which a wiki is obviously good at handling, and an iterative software project where the definition of a function to pluralize some text in a language is highly unlikely to be edited on a daily basis, but where interdependencies are much more likely to be an important feature. Something that code version control is designed for.

The problem with git is that if you change the definition of a function in a file, you need to update the code in every other place where that function is called. That is different from content-addressable code, where you reference the content of the function: if you change the function, it creates a new function (possibly with the same name), but old code keeps pointing to the old definition. That is much more scalable than the current git/GitHub approach (even if you might keep calling old code).
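
To illustrate what content-addressing means here, a minimal sketch of the core idea (Unison's actual scheme is considerably more sophisticated; all names below are mine):

```python
import hashlib

# Content-addressable store: a definition is identified by the hash of
# its source, so "changing" a function really creates a new entry, and
# anything holding the old hash keeps resolving to the old definition.

store = {}  # hash -> source text
names = {}  # human-readable name -> hash of the current definition

def define(name, source):
    digest = hashlib.sha256(source.encode()).hexdigest()
    store[digest] = source
    names[name] = digest  # the name moves; the old hash stays valid
    return digest

old_ref = define("double", "lambda x: x + x")
new_ref = define("double", "lambda x: 2 * x")  # same name, new content

assert old_ref != new_ref
assert store[old_ref] == "lambda x: x + x"  # old callers are unaffected
```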

> On Wed, Jul 15, 2020 at 4:42 PM Arthur Smith <arthurpsmith@gmail.com> wrote:
>>
>> On Wed, Jul 15, 2020 at 8:15 AM Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:
>>>
>>> I keep being confused about this point: What are the "functions" on AW/Wikilambda, and in which language will they be written?
>>
>>
>> I think everybody has a slightly different perspective on this. I've been working closely with Denny's prototype project (which included the JavaScript version of 'eneyj' that Lucas was inspired by) at https://github.com/google/abstracttext/ (I assume this will move somewhere under wikimedia now?). The prototype does to an extent define its own "language" - specifically, it defines (human) language-independent "ZObjects" (implemented as a subset of JSON) which encapsulate all kinds of computing objects: functions, tests, numeric values, strings, languages (human and programming), etc.
>>
>> I think right now this may be a little more complex than is actually necessary (are strict types necessary? maybe a 'validator' and a 'test' should be the same thing?); on the other hand something like this is needed to be able to have a wiki-style community editable library of functions, renderers, etc. and I really like the underlying approach (similar to Wikidata) to take all the human-language components off into label structures that are not the fundamental identifiers.
>>
>> ZObjects for "implementations" of functions contain the actual code, along with a key indicating the programming language of the code. Another ZObject can call the function in some context which chooses an implementation to run it. At a basic level this is working, but there's a lot more to do to get where we want and be actually useful...
>>
>>    Arthur



--
Amirouche ~ https://hyper.dev

_______________________________________________
Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia