Re: [Abstract-wikipedia] Wiki of functions (was Re: Runtime considerations: Introducing GraalEneyj)

17 Jul 2020


      No need to apologize, and please be stubborn. I hope that such discussions
frequently boil down to "I haven't explained that well enough" instead of
"we're gonna do what we gonna do because I said so", so asking these
questions ensures it's the former and dispels notions of the latter.
And I am sure there are possible elements of Maslow's hammer [1] and
potentially a bit of hubris on my side in play, so I really want to make
sure I listen, respond, and explain myself better.
One minor point: "it is entirely unproven that doing this as a wiki would
bring more contributors." That's true. That's true for everything new you
try. I am not sure how to respond to that, besides saying, yes, that's one
of the things we're figuring out. We're having a small team and are
exploring that space.
But let's take a step back. Whether we use git or C++ or MediaWiki or
Eclipse is, in the end, a question of the implementation. Let's leave the
question of *how* aside for a moment, and see whether we have agreement on
*what*.
What are we trying to achieve with the first part of the project? And by
the first part I mean Wikilambda. The goal is to create a community project
to develop a comprehensive repository of functions, a Wikipedia but for
functions, so this first part is detached from any questions about natural
language generation and knowledge representation.
So my first question is, do we agree on that goal?
(And, maybe the real question is, do we even have a reasonably shared
understanding of what that goal means? I am unsure about that - having
worked on that for such a long time, I probably make too many assumptions
that this is actually clear, but it could be that the first task is to
clear that up and create a more shared understanding of that goal - please,
let me know!)
Thanks!
Denny
On Thu, Jul 16, 2020 at 1:53 AM Gilles Dubuc gilles@wikimedia.org wrote:
...
Sorry if I sound stubborn on this topic, but all I'm getting from your
proposal and this response is "we're going to do X", not a valid comparison
to the alternative of more traditional open source projects. There should
be a methodic and objective comparison, not subjective statements such as
the fact that it may lower the barrier to entry. It's entirely unproven
that doing this as a wiki would bring more contributors. In fact folks who
are the most knowledgeable about writing code might find it odd and
unpleasant to work with, with all the function definitions scattered across
web pages. You would have to open many browser tabs to even comprehend that
codebase, which is clearly an inferior experience compared to traditional
open source, where contributors would have access to a vast amount of
software of their liking to view, edit and contribute to a large codebase.
And once this codebase, because that's what it is, reaches enough
complexity, how you commit changes isn't going to be what stops people from
contributing. The challenge will be in the actual code contributed and how
complex the codebase is to navigate and understand. If the only way to
navigate it is the website and a small amount of custom tools built for it,
it's a far inferior experience compared to the vibrant open source
programming ecosystem that has existed for decades.
Furthermore, there's nothing stopping this from being implemented as a
registration-free web-based code editing platform backed by git or another
proven source code version control system. Developers that are
knowledgeable of version control systems could check out the entire
codebase and commit directly to the git repo. Folks who aren't familiar
with that could edit functions/files directly on the website. What github
has done in that respect with on-site editing can be made even simpler if
you settle on a simpler git workflow than pull requests. The experience can
be greatly streamlined if the common case for casual contributors is single
function editing. In previous responses it was said that a git version of
this would require registration. It doesn't, it's entirely up to you how
open or closed a code repository is. And any fear you might have about how
you would manage a completely open code repository should be the same as a
completely open wiki.  As long as you don't require registration, you're
dealing with exactly the same vandalism challenges.
A lot of the innovation this project would bring would have a much bigger
impact if done as a git-backed project. Tools developed for it would be
more likely to be picked up by others who want to do wiki-style open source
programming. Something completely custom built on top of MediaWiki would
leave us in our existing MediaWiki bubble. We know how limited the open
source community around MediaWiki is, and this would likely only attract a
subset of it.
Overall I get the sense that the reason this is proposed to be a wiki
isn't that a wiki brings real benefits to the problem being solved compared
to alternatives, it's that the people involved have been working on wikis,
and especially MediaWiki wikis, for a long time. It's a collective hammer
and as a result everything looks like nails. Building things on top of
MediaWiki isn't the only way to make a successful collaborative project,
particularly one that is in the programming space.
That being said, the two aren't mutually exclusive. It can be both a
MediaWiki wiki and a git repo if bridged correctly. But even that would
have to be pitted against, say, a single-project registration-free custom
version of Gitlab, for example. Because code collaboration tools already
exist. Building this on top of MediaWiki is going to introduce a ton of
reinventing the wheel around code display, code review, CI, etc. Things
that other projects have done well already. It could be a lot cheaper to
build on top of an existing code collaboration platform.
On Thu, Jul 16, 2020 at 5:23 AM Denny Vrandečić dvrandecic@wikimedia.org
wrote:
...
Thanks for the great discussion!
Renaming the topic as it deviated a bit from Lucas' original
announcement, and was questioning whether we should do Wikilambda at all.
So, this is mostly to answer Gilles' comment as to why a collection of
functions and tests needs to be editable on a wiki - that's the premise of
the first part of the project.
So yes, we could maybe create the whole code for the renderers in a
classical way, as an Open Source project, going through git and gerrit, and
assume that we will have hundreds of new coders working on this, neatly
covering all supported languages, to keep up with the creation of renderers
when the community creates new constructors, etc. I have doubts. I want to
drastically reduce the barrier to participate in this task, and I think
that this is, in fact, necessary. And this is why we are going for
Wikilambda, a wiki of functions first, as said in my first email to the
list.
Amir, when I say function I mean a computation that accepts an input and
returns an output. That differs, for example, from Lua modules, which are
basically a small library of functions and helpers. We're also starting
with no side-effects and referential transparency, because we're boring,
and leave that for a bit later.
So, there will be wikipages that represent such a function. So on one
page we would say, "so, there's this function, in English it is called
concatenate, it takes two strings and returns one, and it does this".
Admittedly, not a very complicated function, but quite useful. Then we
might have several implementations for this function, for example in
different programming languages, we also might have tests for this
function. All implementations should always behave exactly the same. An
evaluation engine decides which implementations it wants to use (and, for
the result it shouldn't matter because they all should behave the same),
but some engines might prefer to run the JavaScript implementations, others
might run Python, others might decide "oh, wait, I am composing two
different functions together in order to get to a result, maybe it is
smarter if I pick two implementations with the same language, as I actually
can compile that into a single call beforehand", etc. There are plenty of
possibilities to improve and optimize evaluation engines, and if you were
asking me to dream I'd say I am seeing a vibrant ecosystem of evaluation
engines from different providers, running in vastly different contexts,
from the local browser to an open source peer-to-peer distributed computing
project.
What looks like a programming language of its own in the existing
ZObjects (but isn't) is the ability to compose functions. So, say, you want
to now introduce a double_string function which takes one strings and
returns one, you could either implement it in any of the individual
programming languages, or you can implement it as a composition, i.e.
double_string(arg1) := concatenate(arg1, arg1). Since you can have more
implementations, you can still go ahead and write a native implementation
in JavaScript or Python or WebAssembler, but because we know how this
function is composed from existing functions, the evaluation engine can do
that for you as well. (That's an idea I want to steal for the natural
language generation engine later too, but this time for natural languages)
By exposing these functions, the goal is to provide a library of such
functions that anyone can extend, maintain, browse, execute, and use for
any purpose. Contributors and readers can come in and compare different
implementations of the same function. They can call functions and evaluate
them in a local evaluation engine. Or in the browser (iv supported). Or on
the server. That's the goal of the wiki of functions, what we call
Wikilambda in the proposal - a new Wikimedia project that puts functions
front and center, and makes it a new form of knowledge asset that the
Wikimedia movement aims to democratize and make accessible.
One goal is to vastly lower the barrier to contribute the renderers that
we will need for the second part of the project. But to also make it clear:
supporting the rendering of content and the creation of content for the
Wikipedias is our first priority, but it does not end there. Just as
Wikidata and Commons have the goal to support the Wikipedias (and within
these communities, the prioritization of this goal compared to other goals
may differ substantially from one contributor to the next), Wikilambda will
have that as a primary goal. But we would be selling ourselves short if we
stopped there. There are a number of interesting use cases that will be
enabled by this project, many of which are aligned with our vision of a
world in which everyone can share in the sum of all knowledge - and a
number of the 2030 strategic goals.
You may think, that is tangential, unnecessary, overly complicated - but
when discussing the overall goal of providing content in many languages
with a number of senior researchers in this area, it was exactly this
tangent that, frequently, made them switch from "this is impossible" to
"this might work, perhaps".
There have been a number of other topics, raised in the thread, that I
also want to addess.
Regarding atomicity of updates by Gilles, which was answered by Arthur
that, if you're changing the functionality, you should create a new
function (that's similar in spirit to the content-addressable code
mentioned by Amirouche), and that's where we'll start (because I am a
simple person, and I like to start with simple things for the first
iteration). We might identify the need for a more powerful versioning
scheme later, and we'll get to it then. I hope to avoid that.
Will the Lua be the same as in the Scribunto modules?, as asked by Amir.
We'll get to that when we get to implementations in Lua. I think it would
be desirable, and I also think it might be possible, but we'll get to it
later. We won't start with Lua.
As Arthur points out, the AbstractText prototype is a bit more complex
than necessary. It is a great prototype to show that it can work - having
functions implemented in JavaScript and Python and be called to evaluate a
composed function. It is not a great prototype to show how it should work
though, and indeed, as an implementation it is a bit more complex than what
we will initially need.
Regarding comments as asked by Louis, every implementation will come with
the ability to have documentation. Also, the implementations written in
programming languages will have the usual ability to have comments.
Regarding not doing it on MediaWiki, as suggested by Amirouche - well, I
had the same discussions regarding Wikidata and Semantic MediaWiki, and I
still think it was the best possible decision to build them on top of
MediaWiki. The amount of functionality we would need to reimplement
otherwise is so incredibly large, that a much larger team would be needed,
without devoting actually more resources towards solving our novel problems.
(I am soooo sorry for writing such long emails, but I think it is
important in this early stage to overshare.)
On Wed, Jul 15, 2020 at 10:42 AM Amirouche Boubekki <
amirouche.boubekki@gmail.com> wrote:
...
Le mer. 15 juil. 2020 à 17:00, Gilles Dubuc gilles@wikimedia.org a
écrit :
...
The part I don't get is why a collection of functions and tests needs
to be editable on a wiki. This poses severe limitations in terms of
versioning if you start having functions depend on each other and editors
can only edit one at a time. This will inevitably bring breaking
intermediary edits. It seems like reinventing the wheel of code version
control on top of a system that wasn't designed for it. MediaWiki isn't
designed with the ability to have atomic edits that span multiple
pages/items. Which is a pretty common requirement for any large codebase,
which this set of functions sounds like it's posed to become.
I disagree with the fact that it must be done with the existing
wikimedia codebase. Tho, I believe it is possible to build an
integrated development environment that fuses programming language
with a version control system. That is what unison is doing with
content-addressable code. Unison has different goals, but they have
i18n identifiers/docstrings on their roadmap.
...
What reasons does this benefit from being done as wiki editable code
compared to a software project (or series of libraries/services) hosted on
a git repository? Especially if we're talking about extensive programming
and tests, whoever is computer literate enough to master these concepts is
likely to know how to contribute to git projects as well, as they would
have very likely encountered that on their path to learning functional
programming.
My understanding is that wikilambda wants to be easier than going
through all the necessary knowledge to contribute to a software
project. That is, wikilambda could be something better than
git/github.
...
I get why such an architecture might be a choice for a prototype, for
fast iteration and because the people involved are familiar with wikis. But
what core contributors are familiar with and what can be used for a fast
prototype is rarely a suitable and scalable architecture for the final
product.
My understanding is that the prototype wanted to demonstrate that it
is possible to i18n the identifiers, docstrings, etc... of a
programming language. Something that nobody tried or succeed at doing
so far.
...
There is also a notion of how dynamic the content is to take into
account. There is a difference between the needs of ever-changing
encyclopedic information, which a wiki is obviously good at handling, and
an iterative software project where the definition of a function to
pluralize some text in a language is highly unlikely to be edited on a
daily basis, but where interdependencies are much more likely to be an
important feature. Something that code version control is designed for.
The problem with git is that if you change the definition of function
in a file, every other place where you call that function, you need to
update the code. That is different from a content-addressable code
where you reference the content of the function, if you change the
function, it creates a new function (possibly with the same name) but
old code keeps pointing to the old definition. That much more scalable
than the current git/github approach (even if you might keep calling
old code).
...
On Wed, Jul 15, 2020 at 4:42 PM Arthur Smith arthurpsmith@gmail.com
wrote:
...
...
On Wed, Jul 15, 2020 at 8:15 AM Amir E. Aharoni <
amir.aharoni@mail.huji.ac.il> wrote:
...
...
...
I keep being confused about this point: What are the "functions" on
AW/Wikilambda, and in which language will they be written?
...
...
I think everybody has a slightly different perspective on this. I've
been working closely with Denny's prototype project (which included the
javascript version of 'eneyj' that Lucas was inspired by) at
https://github.com/google/abstracttext/ (I assume this will move
somewhere under wikimedia now?). The prototype does to an extent define its
own "language" - specifically it defines (human) language-independent "Z
Objects" (implemented as a subset of json) which encapsulate all kinds of
computing objects: functions, tests, numeric values, strings, languages
(human and programming), etc.
...
...
I think right now this may be a little more complex than is actually
necessary (are strict types necessary? maybe a 'validator' and a 'test'
should be the same thing?); on the other hand something like this is needed
to be able to have a wiki-style community editable library of functions,
renderers, etc. and I really like the underlying approach (similar to
Wikidata) to take all the human-language components off into label
structures that are not the fundamental identifiers.
...
...
ZObjects for "implementations" of functions contain the actual code,
along with a key indicating the programming language of the code. Another
ZObject can call the function in some context which chooses an
implementation to run it. At a basic level this is working, but there's a
lot more to do to get where we want and be actually useful...
...
...
Arthur
_______________________________________________
Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia

Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
--
Amirouche ~ https://hyper.dev

Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia

Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia

Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia

2024

2023

2022

2021

2020

Re: [Abstract-wikipedia] Wiki of functions (was Re: Runtime considerations: Introducing GraalEneyj)