The on-wiki version of this newsletter is here:
When we started the development effort towards the Wikifunctions site, we
sub-divided the work leading up to the launch of Wikifunctions into eleven
phases <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases>, named
after the first eleven letters of the Greek alphabet.
- With Phase α (alpha) completed, it became possible to create instances
of the system-provided Types in the wiki.
- With Phase β (beta), it became possible to create new Types on-wiki
and to create instances of these Types.
- With Phase γ (gamma), all the main Types of the pre-generic function
model were available.
- With Phase δ (delta), it became possible to evaluate built-in
implementations.
- With Phase ε (epsilon), it became possible to evaluate
contributor-written implementations in any of our supported programming
languages.
- This week, we completed Phase ζ (zeta).
The goal of Phase ζ has been to provide the capability to evaluate
implementations composed of other functions.
What does this mean? Every Function in Wikifunctions can have several
Implementations. There are three different ways to express an
Implementation:
1. As a built-in Function, written in the code of Wikilambda: this means
that the Implementation is handled by the evaluator natively using code
written by the team.
2. As code in a programming language, created by the contributors of
Wikifunctions: the Implementation of a Function can be given in any
programming language that Wikifunctions supports. Eventually we aim to
support a large number of programming languages; for now we support
3. As a composition of other Functions: this means that contributors can
use existing Functions as building blocks in order to implement new
Functions.
With Phase ζ we close the trilogy of Phases dealing with the different ways
to create Implementations.
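To make the third kind of Implementation more concrete, here is a minimal sketch in Python. It is only an illustration of the idea of composition, not how Wikifunctions stores or evaluates compositions. The function names `reverse_string`, `string_equality`, and `is_palindrome` are hypothetical stand-ins chosen for this example.

```python
# Two hypothetical building blocks that we assume already exist.
def reverse_string(text: str) -> str:
    """Return the characters of text in reverse order."""
    return text[::-1]


def string_equality(left: str, right: str) -> bool:
    """Return True if both strings are the same."""
    return left == right


# A new function implemented only by composing the two functions above.
# It contains no logic of its own beyond wiring the calls together.
def is_palindrome(text: str) -> bool:
    return string_equality(text, reverse_string(text))


if __name__ == "__main__":
    print(is_palindrome("racecar"))
    print(is_palindrome("banana"))
```

The point of the sketch is that `is_palindrome` needs no code in a specific programming language of its own: it is fully described by which existing functions are called with which arguments.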
Besides making composition work, we also spent some time on other areas.
We worked to reduce technical debt that we accumulated in development
during the last two phases, which we rushed in order to be ready for the
security and performance reviews. We improved how the error system works,
re-worked the data model for Testers and Errors, refactored the common
library to be more extensible, moved the content of the wiki to the main
namespace, and changed Python function definitions to align with the style
We started with some work to make the current bare-bones user experience
better. This included displaying Testers' results and meta-data on their
own page as well as related Function and Implementation pages. Functions
and Implementations can be easily called right from their page. We made it
much easier to create and connect Implementations and Testers with their
functions, started on the designs for Function definition and
implementation, and implemented aliases that sit alongside labels, much
like in Wikidata. Plenty done!
We are now moving on to Phase η (eta). The three main goals of phase η are
to finish the re-work of the Error system, to revisit user-defined types
and integrate them better with validators, and to allow for generic types.
What are generic types?
We have a type for a list of elements. But instead of saying “this is a
list of elements”, we can often be more specific, and for example say “this
is a list of strings”. Why is that useful? Because now, if, for example, we
have a function to get the first element of a list, we know that this
function will return a string when given this kind of list. This allows us
to then offer a better user experience by making more specific suggestions,
because now the system knows that it can suggest functions that work with
strings. We can also check whether an implementation makes sense by
ensuring that the types fit. We won’t be able to do that in all cases, but
having generics will allow us to increase the number of cases where we can
do that by a lot. For more background you can refer to the Wikipedia
article on generic programming
In this example case, instead of a special type representing a list of
strings, we will have a function that takes a type and returns a typed
list. If you then call this function with the string type as the argument,
the result of the function will be the concept of a list of strings. And
you can easily use that for any other type, including user-defined types.
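As a rough illustration of "a function that takes a type and returns a typed list", here is a small Python sketch. It is an analogy only and says nothing about how generic types will be represented in Wikifunctions. The names `typed_list` and `first` are made up for this example, and only `append` is type-checked to keep the sketch short.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def typed_list(element_type: type) -> type:
    """Take a type and return a list type restricted to that element type."""

    class TypedList(list):
        def __init__(self, items=()):
            super().__init__()
            for item in items:
                self.append(item)

        def append(self, item):
            # Reject elements that do not fit the declared element type.
            if not isinstance(item, element_type):
                raise TypeError(
                    f"expected {element_type.__name__}, got {type(item).__name__}"
                )
            super().append(item)

    TypedList.__name__ = f"TypedList[{element_type.__name__}]"
    TypedList.element_type = element_type
    return TypedList


def first(items: list):
    """Return the first element; for a typed list its type is known up front."""
    return items[0]


# Calling the function with the string type yields "a list of strings".
StringList = typed_list(str)
words = StringList(["alpha", "beta"])

if __name__ == "__main__":
    print(StringList.__name__, first(words))
```

Because the element type is attached to the list type, a system can know that `first` applied to a `StringList` yields a string without running it, which is what enables better suggestions and type checks.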
My thanks to the team! My thanks to the volunteers! Some of us are starting
to have fun using the prototype, playing with implementations across
different programming languages interacting with each other in non-trivial
ways, and starting to build a small basic library of functions. This will
also be the phase where we move from the pre-generic data model to
the full function model. To
give due warning: this probably means that almost everything will need to
be re-written by the end of this phase, in order to take advantage of the
generic system that we are introducing.
Thank you for accompanying us on our journey!
It is very hard to build a large, or even medium-sized, corpus of sentences in
which each word is manually annotated with its sense.
Abstract Wikipedia not only allows generating text in many languages from one
source, but can also serve as a WSD (word-sense disambiguation) corpus, and in
many languages at that.
This allows understanding natural text and operations like:
1) translating from any natural language to a disambiguated form
2) translating from this form to another natural language
After step 1, this form will be very useful, and not only for translation
I was interested in this Abstract Wikipedia project one year ago. Now I'm
not up to date on the topic
At the Arctic Knot conference, will there be a look at the project as a database of
The on-wiki version with the embedded video can be found here:
The team has been busy developing features and designing interfaces for
Wikifunctions, and we are moving towards closing the current phase of the
Aliases are now available in the data model, the data model for testers has
been updated, error objects have been considerably reworked, the evaluation
model can now deal with recursive calls and lazy evaluation, new built-in
functions for string and boolean equality have landed, and more. It is
exciting to see the pieces coming together.
In today’s weekly, we want to take a look at Testers and their current
implementation. Lindsay has created a screencast that you can watch (it is
without sound), and here we will describe what is happening in the video.
We start with creating a new function definition, *“reverse string”*, which
takes a single string as an input and returns a string. On saving (0:19),
the function is created and assigned Z10000. Now we edit the newly created
function, and we create a first tester inline. We give it the name *“test
-> tset”*, and set the argument to “test” and then use the *“String
equality”* function to compare it to the expected result, “tset” (0:45).
*“String equality”* is a built-in function (Z866 on a fresh Wikilambda
installation) that takes two strings as the arguments and returns True if
they are the same, and False otherwise.
Note that even though we have created the Tester inline, in the background
a new page was created (entirely behind the scenes it was assigned ZID
Z10001) that holds the test.
Next, we create a test for the input “racecar”, which is a palindrome,
using the same built-in function (1:00), and a test reversing “banana” and
getting the output “wrong” (which is an example of a bad test) (1:19). At
the bottom of the page we already see our three testers working, showing
that they all fail initially (1:30). Now we start implementing the
function, and we enter “return Z10000K1” - and without even saving, the
testers are run against our implementation and we can see that the “racecar”
test passes! (It passes because it is a palindrome, and returning the input
unchanged happens to be a correct implementation for palindromes). The
other two tests keep failing, though (1:41).
We complete the implementation by taking the input, splitting it into an
array of strings, reversing that array, and then joining the strings of the
array again into a single string. Now the first test, *“test -> tset”* also
passes, but the “banana” test (due to being actually a faulty test)
continues to fail (1:54).
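The split, reverse, and join approach described above can be sketched as follows. This is a Python rendition for illustration only, the implementation in the screencast may be written differently, and the helper names and the `testers` dictionary are assumptions made for this example.

```python
def string_equality(left: str, right: str) -> bool:
    """Stand-in for the built-in "String equality" function."""
    return left == right


def reverse_string(text: str) -> str:
    characters = list(text)      # split into an array of one-character strings
    characters.reverse()         # reverse that array in place
    return "".join(characters)   # join the strings into a single string again


# The three testers from the walkthrough: (argument, expected result).
testers = {
    "test -> tset": ("test", "tset"),
    "racecar": ("racecar", "racecar"),
    "banana": ("banana", "wrong"),  # deliberately faulty expectation
}

# A tester passes when the call result equals the expected value.
results = {
    name: string_equality(reverse_string(argument), expected)
    for name, (argument, expected) in testers.items()
}

if __name__ == "__main__":
    for name, passed in results.items():
        print(name, "passes" if passed else "fails")
```

As in the video, the first two testers pass and the "banana" tester keeps failing, because its expected value is wrong rather than the implementation.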
We save the implementation, go to the function page, and add the
implementation to the function. On the function page, just like on the
implementation page, we see the status of all the testers for the
Next we create a second implementation, this time in Python. Again, we
start with an implementation that simply returns the input, and again it
passes for “racecar”. We go back to the function page, and connect the new
Python implementation with the function. On the bottom of the page we now
see, in a table, the implementations against all the testers, and whether
the individual testers pass or fail for each implementation (2:28).
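The table of implementations against testers can be thought of as a simple matrix. The following Python sketch shows the idea under our own assumptions. The implementation labels are invented for this example, and the real page computes and displays this differently.

```python
# Two implementations at this point of the walkthrough (labels are made up):
# the finished one, and the Python one that still just returns its input.
implementations = {
    "split/reverse/join": lambda text: "".join(reversed(list(text))),
    "Python (returns input)": lambda text: text,
}

# Testers as (argument, expected result) pairs.
testers = {
    "test -> tset": ("test", "tset"),
    "racecar": ("racecar", "racecar"),
    "banana": ("banana", "wrong"),  # the faulty test
}

# One row per implementation, one cell per tester.
table = {
    impl_name: {
        tester_name: implementation(argument) == expected
        for tester_name, (argument, expected) in testers.items()
    }
    for impl_name, implementation in implementations.items()
}

if __name__ == "__main__":
    for impl_name, row in table.items():
        cells = ", ".join(
            f"{tester_name}: {'pass' if passed else 'fail'}"
            for tester_name, passed in row.items()
        )
        print(f"{impl_name} | {cells}")
```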
We create another two tests, *“another -> rehtona”* and *“final test ->
tset lanif”*, again inline. The tests become immediately visible upon
creation. We still need to save the whole page in order to store the
association with the function page. We can see how both tests pass for the
Let’s go fix the Python implementation. We go to the implementation page
and edit it by adding “[::-1]” to the string. That’s some Python magic -
feel free to skip this paragraph explaining this syntax: Python has a few
very convenient short-hand syntaxes for specific operations which, in many
other languages, require functions or more complex constructs. What is
happening here is that by appending the square brackets to a string
variable, we treat the string implicitly as a list. Inside the square
brackets we have three arguments, separated by colons (:). The first
argument says at which element to start, the second argument at which
element to stop, and the third argument gives the step size (say, you only
want every second element of the list, you would state the step size as 2).
Here, the step is -1, which means you want to walk backwards through the
list. And since the first and second argument are omitted, default values
are used - and the default for a negative step size is from the end to the
beginning. In short, you can read this as *“go through the string,
backwards one by one, from the end to the beginning, and return the new
resulting string”*. You can find a more detailed explanation of Python’s
slice notation on StackOverflow
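For readers who want to try the slice syntax themselves, here is a short, self-contained Python example. The variable names are ours and chosen only for illustration.

```python
text = "banana"

# start:stop:step, with start and stop omitted and a step of -1,
# walks from the last character back to the first.
reversed_text = text[::-1]

# A step of 2 keeps every second character, starting at the first one.
every_second = text[::2]

# Start and stop can be given explicitly; stop is exclusive.
middle = text[1:4]

# The same reversal written with an explicit slice object.
also_reversed = text[slice(None, None, -1)]

if __name__ == "__main__":
    print(reversed_text, every_second, middle)
```

Here `reversed_text` is "ananab", `every_second` is "bnn", and `middle` is "ana". A palindrome such as "racecar" is unchanged by `[::-1]`, which is why the "racecar" tester passes even for an implementation that returns its input.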
Once we fixed our Python code (4:07), all but one of the tests satisfyingly
switched to green. We confidently store the new improved version. When we
go to the function page of *“reverse string”*, we can see that now both
implementations pass all testers except the banana tester!
We go to the page for the banana tester and change the expected value from
“wrong” to “ananab”. Again, before even saving, the testers are re-run
against both implementations and switch from messaging failure to letting
you know they passed (4:26). Going back to the function page, we can now
see that all testers pass all implementations.
Finally, we see a feature added (and recorded, which explains the slightly
different format) a bit later, where a new test is being created inline
(4:39). While we are creating the new tester inline, the result of the test
runs for all implementations is already shown - before the tester is even
stored. Once we can see both implementations pass, the new tester is
saved (and thus created, 5:28), and then we save the function page itself,
associating the function with the new tester (5:35).
I hope you enjoyed this whirlwind tour through our new tester features, and
that it gives you a small glimpse of how Wikifunctions will be working. Feedback
and ideas are welcome, as always.
We are all excited about the weekend: Wikimania 2021 has started! Wikifunctions
and Abstract Wikipedia will host a session on
Saturday, 14 August, at 17:00 UTC
<https://iw.toolforge.org/zonestamp/1628960400>, where we will have a panel
to present our work and talk with you and the Wikimedia communities. Please
join us, bring your questions, and we are very much looking forward to a
The on-wiki version of this newsletter is available here:
Our goal with Abstract Wikipedia is to enable everyone to write content in
any language that can be read in any language. Ultimately, the main form of
content we aim for is Wikipedia articles, in order to allow everyone to
equitably have and contribute to unbiased, up-to-date, comprehensive
In the coming months, we will reach major milestones towards that goal.
Today, I want to sketch one possible milestone on our way: abstract
descriptions for Wikidata.
Every Item <https://www.wikidata.org/wiki/Help:Items> in Wikidata has a
label <https://www.wikidata.org/wiki/Help:Label>, a short description
<https://www.wikidata.org/wiki/Help:Description>, and aliases
<https://www.wikidata.org/wiki/Help:Aliases> in each language. Let’s say
you take a look at Item Q836805 <https://www.wikidata.org/wiki/Q836805>. In
English, that Item has the label *“Chalmers University of Technology”* and
the description *“university in Gothenburg, Sweden”*. In Swedish it is *“Chalmers
tekniska högskola”* and *“universitet i Göteborg, Sverige”*. The goal of
the label is to be a common name for the Item, and together with the
description it should uniquely identify the Item in the world. That’s why,
although multiple Items can have the same label, as things in the world can
be called the same but be different, no two Items should have both the same
label and the same description in a given language. The aliases are used to
help with improving the search experience.
The meaning of the descriptions across languages is often the same, and
when it is not, the difference is sometimes intentional but usually
accidental. Given there are more than 94 million Items in Wikidata, and
Wikidata supports more than 430 languages, that would mean that if we had
perfect coverage, we would have more than 40 billion labels and as many
descriptions. And not only would the creation of all these labels and
descriptions be a huge amount of work, they would also need to be
maintained. If there are not enough contributors checking on the quality of
these, it would be unfortunately easy to sneak in vandalism.
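The arithmetic behind the "more than 40 billion" figure is simple enough to check directly. The sketch below uses the lower bounds quoted above (94 million Items, 430 languages), so the result is itself a lower bound.

```python
# Lower bounds quoted in the text above.
items = 94_000_000
languages = 430

# Perfect coverage means one label (and one description) per Item per language.
labels_at_full_coverage = items * languages

if __name__ == "__main__":
    print(f"{labels_at_full_coverage:,} labels, and as many descriptions")
```

This gives 40,420,000,000, that is, a little over 40 billion labels and the same number of descriptions.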
The Wikidata community has known about this issue for a long time, and made
great efforts to correct it. Tools such as AutoDesc
<https://autodesc.toolforge.org/> by Magnus Manske
<https://meta.wikimedia.org/wiki/User:Magnus_Manske> and bots such as
Edoderoobot <https://www.wikidata.org/wiki/User:Edoderoobot>, Mr.Ibrahembot
<https://www.wikidata.org/wiki/User:MatSuBot> (these were selected by
clicking “Random Item” and looking at the history) and many others have
worked on increasing the coverage. And it shows: these bots often target
descriptions, and so, even though only six languages have *labels* for more
than 10% of Wikidata Items, a whopping 64 languages have a coverage over
10% for *descriptions*! Today, we have well over two billion descriptions
These bots create descriptions, usually based on the existing statements of
the Item. And that is great. But there is no easy way to fix an error
across languages, nor is there an easy way to ensure that no vandalism has
snuck in. Also, bots give an oversized responsibility to a comparably small
group of bot operators. Our goal is to democratize that responsibility
again and allow more people to contribute.
Descriptions in Wikidata are usually noun phrases, which we will need to be
able to generate for Abstract Wikipedia anyway. We want to
start thinking about how to implement this feature, and then derive from
there what will need to happen in Wikifunctions and in Wikidata. This work
will need to happen in close coöperation with the Wikidata team, and the
communities of both Wikidata and Wikifunctions. It will represent a way to
ramp-up our capabilities towards the wider vision of Abstract Wikipedia.
Timewise, we hope to achieve that in 2022.
We don’t know yet how exactly this will work. Here are a few thoughts, but
really I invite you all to work together with us on the design for abstract
descriptions:
- It must be possible to overwrite a description for a given language
- It must be possible to retract a local overwrite for a given language
- The pair of label and description still must remain unique
- It would be great if implementing this would not be a large effort
- The goal is not to create automatic descriptions
The last point is subtle: an automatic description is a description
generated automatically from the given statements of an Item. That’s a
valuable and very difficult task. The above-mentioned AutoDesc, for example,
starts the English description for Douglas Adams as
follows: *“British playwright, screenwriter, novelist, children's writer,
science fiction writer, comedian, and writer (1952–2001) ♂; member of
Footlights and Groucho Club; child of Christopher Douglas Adams and Janet
Adams; spouse of Jane Belson”*. The Item <https://www.wikidata.org/wiki/Q42>'s
current manual English description is the much more succinct *“English
writer and humorist”*. There can be many subtle decisions and editorial
judgements to be made in order to create the description for a given Item,
and I think we should be working on this — but later.
Instead, we want to support abstract descriptions: a description, manually
created, but instead of being written in a specific natural language, it is
encoded in the abstract notation of Wikifunctions and then we use the
renderers to generate the natural language text. This allows the community
to retain direct control over the content of a description.
Here are a few ideas to kick off the conversation:
- We introduce a new language code, qqz. That code is in the range
reserved for local use, and is similar to the other dummy language codes in
MediaWiki, qqq and qqx. Wikidata is to support the qqz language code for
- The content of the qqz description is abstract content. Technically
we could store it in some string notation such as “Z12367(Q3918
<https://www.wikidata.org/wiki/Q3918>, Q25287
<https://www.wikidata.org/wiki/Q25287>, Q34
<https://www.wikidata.org/wiki/Q34>)”. Or we could store the JSON
- The abstract description would be edited using the same Vue components
we develop for Wikifunctions for editing abstract content.
- The abstract description is a fallback for languages without a
description. It can be overwritten by providing a description in that
- Every time the renderer function or the underlying lexicographic data
changes, we also need to retrigger the relevant generations.
- One question is whether we should store the generated description in
the Item, and if so, how to change the data model in order to mark the
description as generated from the abstract description.
- We also need to figure out how to report changes to everyone who is
interested in tracking them. If we store the generated description as
proposed above, we can piggyback on the current system.
All of these are just ideas for discussion. Some of the major questions are
whether to store all the generated descriptions in the Item or not, how to
represent that in the edit history of the Item, how to design the caching
and retriggering of the generated descriptions, etc.
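To make the fallback and overwrite ideas easier to discuss, here is a minimal Python sketch. Everything in it is an assumption for the sake of discussion: the function `description_for`, the stand-in renderer `toy_render`, and the representation of an Item's descriptions as a plain dictionary keyed by language code. It takes no position on whether generated descriptions are stored in the Item.

```python
from typing import Callable, Dict, Optional

ABSTRACT_CODE = "qqz"  # the proposed dummy language code


def description_for(
    language: str,
    descriptions: Dict[str, str],
    render: Callable[[str, str], str],
) -> Optional[str]:
    """Return the description to show for a language.

    A manually written description always wins (local overwrite).
    Otherwise the abstract description, if any, is rendered as a fallback.
    """
    if language in descriptions:
        return descriptions[language]
    abstract = descriptions.get(ABSTRACT_CODE)
    if abstract is None:
        return None
    return render(abstract, language)


def toy_render(abstract: str, language: str) -> str:
    """Placeholder renderer; a real one would produce natural language."""
    return f"[{language} rendering of {abstract}]"


item_descriptions = {
    "qqz": "Z12367(Q3918, Q25287, Q34)",
    "en": "university in Gothenburg, Sweden",
}

if __name__ == "__main__":
    print(description_for("en", item_descriptions, toy_render))
    print(description_for("hr", item_descriptions, toy_render))
```

In this sketch, overwriting a description for a language means adding an entry for that language, and retracting the overwrite means removing the entry again, after which the abstract description applies once more.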
What would that look like?
Let’s take a look at an oversimplified example. The description for
Chalmers is *“university in Gothenburg, Sweden”*. That seems like a
reasonably simple case that could easily be templated into abstract content,
say of the form “Z12367(Q3918 <https://www.wikidata.org/wiki/Q3918>, Q25287
<https://www.wikidata.org/wiki/Q25287>, Q34
<https://www.wikidata.org/wiki/Q34>)”, where Z12367 (that ZID is made-up)
represents the abstract content saying in English *“(institution) in
(city), (country)”*, Q3918 <https://www.wikidata.org/wiki/Q3918> the QID
for university, Q25287 <https://www.wikidata.org/wiki/Q25287> the QID for
Gothenburg, and Q34 <https://www.wikidata.org/wiki/Q34> the QID for Sweden.
(In reality, this template is nowhere near as simple as it looks - we
will discuss this more in an upcoming weekly newsletter. For now,
let’s assume it is this simple.)
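Under that simplifying assumption, rendering the made-up template could look roughly like the Python sketch below. All names here (`LEXICON`, `parse`, `render`, the per-language renderer functions, and the "base" and "locative" form labels) are hypothetical. The word forms are written by hand for this one example, whereas a real renderer would draw on lexicographic data.

```python
import re

# Hand-written word forms for this single example (an assumption, not real data).
LEXICON = {
    "en": {
        "Q3918": {"base": "university"},
        "Q25287": {"base": "Gothenburg"},
        "Q34": {"base": "Sweden"},
    },
    "hr": {
        "Q3918": {"base": "sveučilište"},
        "Q25287": {"locative": "Göteborgu"},
        "Q34": {"locative": "Švedskoj"},
    },
}


def parse(content: str):
    """Split a string like 'Z12367(Q3918, Q25287, Q34)' into ZID and arguments."""
    match = re.fullmatch(r"(Z\d+)\((.*)\)", content)
    if match is None:
        raise ValueError(f"not in the expected notation: {content!r}")
    zid, arguments = match.groups()
    return zid, [argument.strip() for argument in arguments.split(",")]


def render_en(institution: str, city: str, country: str) -> str:
    words = LEXICON["en"]
    return f'{words[institution]["base"]} in {words[city]["base"]}, {words[country]["base"]}'


def render_hr(institution: str, city: str, country: str) -> str:
    # Croatian needs inflected forms after the preposition "u".
    words = LEXICON["hr"]
    return f'{words[institution]["base"]} u {words[city]["locative"]} u {words[country]["locative"]}'


# One renderer per language for the made-up template Z12367.
RENDERERS = {"Z12367": {"en": render_en, "hr": render_hr}}


def render(content: str, language: str) -> str:
    zid, arguments = parse(content)
    return RENDERERS[zid][language](*arguments)


if __name__ == "__main__":
    print(render("Z12367(Q3918, Q25287, Q34)", "en"))
    print(render("Z12367(Q3918, Q25287, Q34)", "hr"))
```

Even this toy version hints at why the template is not as simple as it looks: the Croatian renderer needs different word forms than the English one, and those forms have to come from somewhere.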
Renderers would then take this abstract content and for each language
generate the description, in this case *“university in Gothenburg, Sweden”* for
English, or *“sveučilište u Göteborgu u Švedskoj”* in Croatian. Since there
is already an English description, we would neither store nor actually generate
the text, but in Croatian we would generate it, store it, and mark it as a
We think of this as a good milestone on our path to Abstract Wikipedia,
with a directly useful outcome. What are your thoughts? Join us in
discussing this idea on the following talk page:
In other news, Lindsay has created a video of a new feature: how Testers
and Implementations work together to show whether the tests pass. The video
is available here:
The video shows how she is changing the implementation and re-running the
testers several times. Testers will be a main component in ensuring the
quality of Wikifunctions.
The next opportunity to meet us and ask us questions will be at Wikimania.
On 14 August, at 17:00 UTC, we will host a 1.5-hour session on
Wikifunctions and Abstract Wikipedia. This year, Wikimania will be an
entirely virtual event and registration is free. Bring your questions and
discussions to Wikimania 2021.
Next week, we are skipping the weekly update.