Newsletter #29: Lexicographical coverage, a first function call evaluated, and more - Abstract-Wikipedia

29 Apr 2021

The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-04-29

This week, I want to start with a shoutout to our phenomenal volunteers.

Lexicographical coverage

My thanks to Nikki <https://www.wikidata.org/wiki/User:Nikki> for their
updates to the dashboards about lexicographical coverage
<https://www.wikidata.org/wiki/Wikidata:Lexicographical_coverage>. Since
the first publication of the dashboard, Nikki has kept the dashboards up to
date, re-running them from time to time and updating the page on Wikidata.
They and others have also fixed numerous issues, created more actionable
lists, and added more languages based on other corpora than Wikipedia (most
notably from the Leipzig Corpora Collection
<https://wortschatz.uni-leipzig.de/en>). Thanks also to Mahir
<https://www.wikidata.org/wiki/User:Mahir256>, who also contributed to the
dashboard, particularly covering Bengali, one of our focus languages.

In fact, thanks to Nikki and Mahir, the four main focus languages are now
all covered: we have numbers for Bengali, Malayalam, Hausa, and Igbo. We
are still missing our stretch focus language, Dagbani, because we have not
yet found a corpus. We have reached out to a researcher who has compiled a
Dagbani corpus
<https://www.aflat.org/content/corpus-building-predominantly-oral-culture-notes-development-multi-genre-tagged-corpus-dagba>,
and we are also exploring how we could use the Dagbani Wikipedia
<https://incubator.wikimedia.org/wiki/Wp/dag> on Incubator
<https://incubator.wikimedia.org/wiki/Incubator:Main_Page>. In the
meantime, we are pleased to see that the Dagbani community has put in a request
for a new Wikipedia edition
<https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Dagbani>
and that they feel that they are ready to graduate from Incubator!
Congratulations!

Highlighting the dashboard, and particularly the list of most frequent
missing lexemes, has led to some very promising results: coverage in a
number of languages has increased considerably. To just list a few
examples: Polish went from 16% to 32% coverage, German from 53% to 67%,
Czech from 44% to 57% — and Hindi went from a mere 1% to 15%, and Malay
from 15% to an astonishing 53%! Congratulations to those communities and
others for such visible progress.

With an eye on our focus languages, Bengali went from 18% to 28%, Malayalam
is at 21%, whereas Hausa and Igbo both have coverage below 1%.
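
To make these percentages a bit more concrete, here is a minimal sketch of
how such a coverage number and a list of most frequent missing words could
be computed. It assumes that coverage means the share of word tokens in a
corpus that match a known Lexeme form; the function name, the toy corpus,
and the set of known forms are all hypothetical, and the actual dashboards
may well work differently.

```python
from collections import Counter

def coverage_report(tokens, known_forms, top_n=3):
    """Return (coverage ratio, most frequent tokens without a known form).

    Assumption: a token counts as covered when its lowercased spelling
    matches a known form. This is an illustration, not the dashboard code.
    """
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    known = {f.lower() for f in known_forms}
    covered = sum(n for tok, n in counts.items() if tok in known)
    missing = Counter({tok: n for tok, n in counts.items() if tok not in known})
    ratio = covered / total if total else 0.0
    return ratio, missing.most_common(top_n)

# Toy data standing in for a corpus and for the forms known in Wikidata.
tokens = "the water is cold and the water is deep".split()
known_forms = {"the", "water", "is"}

ratio, missing = coverage_report(tokens, known_forms)
print(f"coverage: {ratio:.0%}")
print("most frequent missing:", missing)
```

Counting tokens rather than distinct words means that adding a handful of
very frequent Lexemes can move the number a lot, which may be part of why
some languages improved so quickly.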

Another great tool to see the progress in lexicographical knowledge
coverage in Wikidata is Ordia <https://ordia.toolforge.org/>, developed by Finn
Årup Nielsen <https://meta.wikimedia.org/wiki/User:Fnielsen>. Ordia is a
holistic user experience that allows users to browse and slice and dice the
lexicographic data in Wikidata in real time. We can take a look at the 11,400
Malayalam Lexemes <https://ordia.toolforge.org/language/Q36236>, the 8,724
Bengali Lexemes <https://ordia.toolforge.org/language/Q9610>, 53 Dagbani
Lexemes <https://ordia.toolforge.org/language/Q32238>, 15 Hausa Lexemes
<https://ordia.toolforge.org/language/Q56475>, and the single Lexeme in Igbo
<https://ordia.toolforge.org/language/Q33578>, mmiri, the Igbo word for
water. Thanks to Finn for Ordia!

Making the state of the lexicographical coverage visible shows us that
there is still a lot to do — but also that we are already achieving
noticeable progress! Thanks to everyone contributing.

By the way, the annotation wiki <https://annotation.wmcloud.org/> is
currently having issues. If you would like to help us with running it and
have experience with Vagrant and Cloud VPS based wikis, please drop me a
line on my talk page <https://meta.wikimedia.org/wiki/User_talk:Denny>.

A first running function call!

Lucas Werkmeister <https://meta.wikimedia.org/wiki/User:Lucas_Werkmeister>
consistently keeps being amazing. He is working on GraalEneyj
<https://github.com/lucaswerkmeister/graaleneyj>, a GraalVM
<https://www.graalvm.org/>-based evaluation engine for Wikifunctions,
written in Java. Lucas re-wrote GraalEneyj to be able to call a function
directly from the notwikilambda test-wiki — the very first time that
one of our functions is being evaluated! You can watch that moment in a
Twitch video <https://www.twitch.tv/videos/975239172>.

We are still working on replicating that feat in what will be our
production codebase, and hope to soon connect our backend evaluating
functions with the wiki — this is our goal for the ongoing Phase δ
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases#Phase_%CE%B4_(delta):_built_ins>
(delta). Congratulations to Lucas for achieving this step!

Delay on logo

There will be a delay on the logo finalization. Please expect another month
or two before we will have news to share about the logo. Due to the legal
nature of some of the involved issues, we have decided to not share details
in public. Sorry for the delay, and I am looking forward to sharing the
next steps in this process.

New documents

We have been working for a while with the Wikimedia Architecture Team on a
number of artefacts around Abstract Wikipedia and Wikifunctions. We have
now published and shared these documents in the Architecture repository
<https://www.mediawiki.org/wiki/Architecture_Repository/Strategy/Goals_and_initiatives/Abstract_Wikipedia_Architecture>.
We are aiming to keep publishing our design documents and related
development artefacts, and are happy to invite you to this set of documents.

Based on requests from the community, we also worked on a new example of an
article in abstract content
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples/Jupiter>. The
example is not complete, and is open to being edited and discussed. Note
that this is not meant to be prescriptive of what abstract content should
look like, but merely a more concrete hypothetical example of what it could
look like. I am confident that the community as a whole will come up with
better abstractions than I did. Please do edit or fork that page.

There will be three approaches towards creating an implementation for a
function in Wikifunctions, and the current and following two phases of
development are each dedicated to one of those approaches: (1) calling a
built-in implementation in the evaluator engine, (2) calling native code
in a programming language, and (3) composing other functions to
implement a new function. In preparation for the upcoming Phase ζ
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases#Phase_%CE%B6_(zeta):_composition>
(zeta), we have created a few examples of function composition
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples/Function_composition>.
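
For readers who prefer code, here is a rough sketch of the three kinds of
implementation in a toy evaluator. Everything in it is hypothetical: the
function names, the tuple-based composition format, and the "$0"/"$1"
argument references are made up for illustration and are not the
Wikifunctions data model or the API of any existing evaluator.

```python
from typing import Callable, Dict

# (1) Built-in: the implementation lives inside the evaluator engine itself.
BUILTINS: Dict[str, Callable[..., int]] = {
    "add": lambda a, b: a + b,
}

# (2) Native code: the implementation is stored as source text in a
# programming language and executed by the evaluator.
NATIVE_SOURCE: Dict[str, str] = {
    "double": "def double(x):\n    return x * 2\n",
}

# (3) Composition: the implementation consists only of calls to other
# functions, written here as nested (function name, arguments...) tuples.
# "$0" and "$1" refer to the first and second argument of the call.
COMPOSITIONS: Dict[str, tuple] = {
    "add_then_double": ("double", ("add", "$0", "$1")),
}

def evaluate(expr, args):
    """Evaluate a composition expression against the call's arguments."""
    if isinstance(expr, str) and expr.startswith("$"):
        return args[int(expr[1:])]
    if isinstance(expr, tuple):
        name, *sub = expr
        return call(name, *(evaluate(s, args) for s in sub))
    return expr  # a literal value

def call(name, *args):
    """Dispatch a function call to whichever kind of implementation exists."""
    if name in BUILTINS:
        return BUILTINS[name](*args)
    if name in NATIVE_SOURCE:
        namespace = {}
        # Illustration only: a real evaluator would sandbox native code.
        exec(NATIVE_SOURCE[name], namespace)
        return namespace[name](*args)
    if name in COMPOSITIONS:
        return evaluate(COMPOSITIONS[name], args)
    raise KeyError(f"no implementation for {name}")

print(call("add_then_double", 2, 3))  # prints 10
```

The point of the sketch is that a composed function needs no code of its
own: it can be evaluated as long as the functions it refers to have some
implementation, whether built-in, native, or composed in turn.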