The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-03-23
Abstract Wikipedia and Grammatical Framework
Aarne Ranta <https://www.cse.chalmers.se/~aarne/>, a member of the Abstract
Wikipedia NLG Special Interest Group, and professor at the University of
Gothenburg, Sweden <https://en.wikipedia.org/wiki/University_of_Gothenburg>,
has written an article titled “Multilingual Text Generation for Abstract
Wikipedia in Grammatical Framework: Prospects and Challenges”. Here is the
abstract of the article:
Abstract Wikipedia is an initiative to produce Wikipedia articles from
abstract knowledge representations with multilingual natural language
generation (NLG) algorithms. Its goal is to make encyclopaedic content
available with equal coverage in the languages of the world. This paper
discusses the issues related to the project in terms of an experimental
implementation in Grammatical Framework (GF). It shows how multilingual NLG
can be organized into different abstraction levels that enable the sharing
of code across languages and the division of labour between programmers and
authors with different skill requirements. The plan is to start with a
simple but functional multilingual NLG system and to proceed towards more
and more sophisticated language and wider coverage of topics, also allowing
a human in the loop to create content via a Controlled Natural Language
(CNL).
The paper can be read here:
https://link.springer.com/chapter/10.1007/978-3-031-21780-7_6
It can also be accessed as a preprint free of charge from the following
link: https://www.grammaticalframework.org/~aarne/preprint-AAM-textgen.pdf
The online version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-03-15
Quo vadis, Abstract Wikipedia?
Abstract Wikipedia will allow more people to contribute their voices to a
baseline of knowledge, whilst working in an interface in their language.
This shared baseline of knowledge will then be made available in many
languages. This will be achieved by creating, storing, and maintaining the
baseline of knowledge, per individual article, in a notation that is
independent of natural language. We call content written in that notation
"abstract content". This abstract content will then be turned into text in
a specific natural language, with the help of the library of functions from
Wikifunctions. Thus, Abstract Wikipedia will allow a speaker of any
language to contribute content for readers of many different languages.
This will allow more people to read more content in their language.
Example
Assume that we want to create a new Wikipedia article for the planet
Jupiter, and the first version of this article shall be the following
(these are the first two sentences of the Simple English Wikipedia
<https://simple.wikipedia.org/wiki/Jupiter> article):
*Jupiter is the largest planet in the Solar System. It is the fifth planet
from the Sun.*
The abstract content representing this natural text could look like this:
Article(
  text: [
    Superlative(
      subject: Jupiter,
      quality: large,
      class: planet,
      location constraint: Solar System),
    Definition(
      subject: Jupiter,
      definition: Rank(
        rank: Positive integer(
          value: 5),
        object: planet,
        by: Relational noun(
          noun: distance,
          to: Sun)))],
  categories: [Jupiter, planet, Solar System])
Wikifunctions would have types for Article, Superlative, Definition, Rank,
*etc.*, which are used here as the abstract notation for the content of
these two sentences. This abstract content is shown using labels in English
here for our convenience, whereas in fact they would all be ZIDs (from
Wikifunctions) and QIDs (from Wikidata). There will be software components
to provide for the viewing, creation, and editing of abstract content.
Wikifunctions will then also provide functions that take this object as an
argument and generate natural language text such as the above.
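To make this a little more tangible, the abstract content above can be thought of as plain structured data plus one rendering function per language. The following Python sketch is purely illustrative and not how Wikifunctions is implemented: the class names mirror the English labels used above rather than real ZIDs, the subjects are plain strings rather than QIDs, and the `render_en` function with its tiny lexicon is an assumption made up for this example.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for Wikifunctions types. Real ones would be ZIDs,
# and the string arguments would be Wikidata QIDs, not English labels.

@dataclass
class Superlative:
    subject: str
    quality: str
    class_: str
    location_constraint: str

@dataclass
class RelationalNoun:
    noun: str
    to: str

@dataclass
class Rank:
    rank: int  # simplification of "Positive integer(value: 5)"
    object: str
    by: RelationalNoun

@dataclass
class Definition:
    subject: str
    definition: Rank

# Toy English lexicon, just large enough for this one example.
SUPERLATIVE_EN = {"large": "largest"}
ORDINAL_EN = {5: "fifth"}
RANK_BY_EN = {"distance": "from"}


def render_en(constructor, pronoun=False):
    """Render a single constructor as one English sentence."""
    subject = "It" if pronoun else constructor.subject
    if isinstance(constructor, Superlative):
        return (f"{subject} is the {SUPERLATIVE_EN[constructor.quality]} "
                f"{constructor.class_} in the {constructor.location_constraint}.")
    if isinstance(constructor, Definition):
        rank = constructor.definition
        return (f"{subject} is the {ORDINAL_EN[rank.rank]} {rank.object} "
                f"{RANK_BY_EN[rank.by.noun]} the {rank.by.to}.")
    raise TypeError(f"no English renderer for {type(constructor).__name__}")


text = [
    Superlative("Jupiter", "large", "planet", "Solar System"),
    Definition("Jupiter", Rank(5, "planet", RelationalNoun("distance", "Sun"))),
]

# Refer to the subject by pronoun in every sentence after the first.
sentences = [render_en(c, pronoun=i > 0) for i, c in enumerate(text)]
print(" ".join(sentences))
```

Running the sketch prints the two Simple English sentences quoted above. A renderer for another language would consume the same `text` list with its own lexicon and grammar rules, which is the point of keeping the content independent of any natural language.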
One question that needs to be answered is where these objects would be
stored, and how to associate the above object with the Wikidata item for
Jupiter, Q319 <https://www.wikidata.org/wiki/Q319>. We were originally
planning to have this conversation and decision before the launch of
Wikifunctions, but looking at the complexity of the system and the fact
that it is very difficult to imagine given that so little of it is tangible
so far, we decided not to open this question for discussion now, but to
wait until after the launch of Wikifunctions, when we will all have a
better understanding of how that part of the ecosystem works.
Below, we outline a few options that came up in the discussion between
folks on the Abstract Wikipedia team and the Wikidata team at Wikimedia
Deutschland. It also ties to some of the questions Lydia Pintscher and I
were answering in an interview on the Wikimove podcast episode
<https://meta.wikimedia.org/wiki/WIKIMOVE/Podcast> that was released today
and that we invite you to listen to. Thanks to Nicole Ebber and Nikki
Zeuner for the interview!
We are genuinely undecided about the best answer, and we would benefit from
a wider discussion of the options, and potentially other options as well.
Please also ask questions - these can often clarify and shine light on
points that are muddy to us as well. We currently are focusing on the
following five options:
1. A new tab on items in Wikidata
2. Create a new data type for objects on Wikidata
3. Objects on Wikifunctions
4. Objects on a new Wikipedia language edition
5. Unattached namespace on an existing project
Let’s discuss these five options in the following.
Option 1: A new tab on items in Wikidata
We could add a tab, leading to a new namespace on Wikidata with a new
content type, where the abstract content would live. This namespace would
be attached to the item namespace in Wikidata. This way, every item would
have a natural place to store its associated abstract content.
Given the little use of the item talk pages on Wikidata, it seems an
additional talk page may not be valuable, so we might want to redirect
people to the item's main talk page for any discussions.
One big question would be where to store content for Wikivoyage and other
projects about the same items (*e.g.* the abstract content we might want to
write about Q90/Paris from the perspective of Wikipedia would be different
from that for Wikivoyage). Would that need yet another namespace? Adding a
new associated namespace would be a technical challenge; adding several
would be a complex task, and we hope that would not need to happen often.
Option 2: Create a new data type for Wikifunctions objects
We could create a new data type on Wikidata for Wikifunctions objects. Then
the community could create a property on Wikidata that stores the abstract
content on a given item as a literal. This would have the added flexibility
that the community could add more abstract contents to an item for specific
use cases, e.g. to represent content from Wikivoyage, or to represent the
history of an item, the etymology of a word, *etc.*
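As a rough sketch of what this option implies, the Python below models an item that carries several abstract contents inline, one per use case. Everything here is hypothetical: the property ID `P_abstract_content`, the `purpose` field, and the shape of the stored value are invented for illustration and do not reflect the actual Wikidata data model.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    property_id: str  # made-up property ID, not a real Wikidata property
    value: dict       # the abstract content, stored inline as a literal
    purpose: str      # made-up qualifier naming the use case

@dataclass
class Item:
    qid: str
    statements: list = field(default_factory=list)

    def abstract_content(self, purpose):
        """Return every abstract content on this item for one use case."""
        return [s.value for s in self.statements
                if s.property_id == "P_abstract_content"
                and s.purpose == purpose]

paris = Item("Q90")
paris.statements.append(
    Statement("P_abstract_content", {"type": "Article", "text": []}, "Wikipedia"))
paris.statements.append(
    Statement("P_abstract_content", {"type": "Article", "text": []}, "Wikivoyage"))

print(len(paris.abstract_content("Wikivoyage")))
```

Because each value lives inside the item itself, every additional use case makes the item larger.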
We will need such a datatype and the UX for it anyway, given our planned
support for abstract descriptions
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-07-29>. In
fact, this might be the simplest way to support abstract descriptions.
However, the UX of Wikidata doesn't lend itself to this easily, and
adapting to this model would be challenging given the constraints of an
item page. Existing properties are edited as one or a small number of
simple, short text boxes, often with auto-complete; abstract content would
instead be a larger text area, with helpers and possibly a toolbar, a
preview control, *etc.* One option could in principle be a modal dialog
for editing, but these come with their own inherent UX downsides and are
usually more complex to implement than the same functionality in its own
dedicated environment. Also, this would break the current design patterns
of an item page, and may not be aligned with the patterns that might be
planned for its future.
Further, while abstract content is somewhat structured and
machine-readable, it is less (or differently) so than an Item, and its
structure would probably not be queryable with SPARQL.
This option comes with two additional challenges. First, some Wikidata items
that are already nearing the maximum size would grow still further, and we
would need a solution to allow that. Second, we would need to deal with the
visualization, editing, and diffing of potentially very large and complex
values.
Option 3: Objects on Wikifunctions
Instead of having objects live in Wikidata as the values of statements or
on an additional tab, Wikidata could merely store a pointer to an object on
Wikifunctions. We still would create a new data type, but that data type
would be just the ZID of an object stored in Wikifunctions.
This would solve all the challenges of the previous option, and retain many
of its advantages.
It would have the additional advantage that several items could refer to
the same object on Wikifunctions. Whereas this sounds rarely useful for the
abstract content of Wikipedia articles, this might prove very useful for
abstract descriptions of Wikidata items and the abstract glosses of
lexemes. With the creation of abstract content, types, and natural language
generation functions on the same platform, collaboration between people who
focus on one of these areas would be more direct.
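The pointer idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a design: the ZID and QIDs are made-up placeholders, and the two dictionaries merely stand in for the two projects.

```python
# Hypothetical object store on Wikifunctions, keyed by a made-up ZID.
wikifunctions_objects = {
    "Z_desc_disambiguation": {
        "type": "Description",
        "class": "Wikimedia disambiguation page",
    },
}

# Hypothetical Wikidata side: each item stores only a ZID pointer.
wikidata_pointers = {
    "Q_item_a": "Z_desc_disambiguation",
    "Q_item_b": "Z_desc_disambiguation",
}

def resolve(qid):
    """Follow an item's pointer to the object stored on Wikifunctions."""
    return wikifunctions_objects[wikidata_pointers[qid]]

# Both items resolve to the very same object, so one edit reaches both.
assert resolve("Q_item_a") is resolve("Q_item_b")
wikifunctions_objects["Z_desc_disambiguation"]["class"] = "disambiguation page"
print(resolve("Q_item_b")["class"])
```

The item stays small because it holds only the pointer, and a shared object such as a common description is edited in exactly one place.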
This option could be a challenge for the Wikifunctions community. The scope
would expand to cover content as well as functions. This could be difficult
for the smaller community, which would need more community patrollers like
those already active on Wikidata.
Option 4: Objects on a new Wikipedia language edition
We could launch a new Wikipedia language edition in which the main
namespace is abstract content. This could be called e.g. the multi-lingual
Wikipedia (mul.wikipedia.org) language edition, or the abstract Wikipedia
language edition (abstract.wikipedia.org). Like all other language
editions, the pages are connected to the items via sitelinks in Wikidata.
If Wikivoyage wanted to use the same approach, they would need to copy the
setup and create a multi-lingual Wikivoyage edition (or, as Option 4B,
perhaps these could be a different namespace on a single shared ‘abstract’
content wiki?).
This would give a very clear distinction of what is Wikipedia content and
what is not, and give the abstract content a distinct visibility, which
would otherwise be somewhat hidden between Wikifunctions and Wikidata. On
the other hand, it would splinter the communities further, and mix in a
"new" concept of a wiki that isn't really a Wikipedia but is labelled as
such.
Option 5: Unattached namespace on an existing project
We could introduce a new namespace to an existing project where the
abstract content for Wikipedia would effectively live. Here are the most
likely options:
1. Wikidata (as a separate namespace, not attached to the items)
2. Commons
3. Meta
4. English Wikipedia (not attached to the articles)
Whereas technically all these options would be the same, they would be
extremely different from a social and community perspective. We will
refrain here from discussing these differences for now, unless this starts
becoming a more likely option.
Rant: One Wikimedia movement, or many projects?
Conway’s law <https://en.wikipedia.org/wiki/Conway%27s_law> states that
software mirrors organization, or, as it was put, if you put three teams to
work on a compiler, you get a three-pass compiler. The opposite is also
true, as previously observed
<https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org…>
by Guillaume Paumier <https://meta.wikimedia.org/wiki/User:Guillom>: a
software system determines the community structure that evolves around the
system. Within the Wikimedia projects, we often see this effect in the
dynamics between the different Wikipedia language communities, the Wikidata
community, the Commons community, Meta, *etc.* The stories of local
protection of media files on Commons, of wikis opting-out from global
anti-abuse tools, or of short descriptions in English Wikipedia should be
warning enough.
The question we ask today is not only hard because there are genuine
technical challenges that we have to overcome, and we have to make a
tradeoff. It is additionally so much harder because we can anticipate that
there will be fracture lines between the different projects, and maybe even
anticipate how these fracture lines would take shape. The whole story would
be so much easier if we, in general, regarded ourselves as being part
of one common movement, as one large community. But I doubt that we will
see much progress on this trajectory before we have to resolve the above
question.
Until then, we can only remain mindful of the possible solutions and their
effects, and be careful to design for the world as it is and as it likely
will be, not for the world we wish it to be.
Public NLG workstream on Tuesday
On Tuesday, March 21, there will be the third public NLG workstream meeting
on Jitsi. Feel free to reach out to me and suggest presentations beforehand
if you want. We have a bit already planned, but there is still space. The
meeting is 16:30-17:30 UTC <https://zonestamp.toolforge.org/1679416242>,
which is an hour off for US friends.
An on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-03-02
----
Decolonizing Functions
In the newsletter two weeks ago
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-02-17>, we
discussed the history of the idea of a compendium of functions, tracing it
back to al-Khwarizmi
<https://en.wikipedia.org/wiki/Muhammad_ibn_Musa_al-Khwarizmi> and his
influential book on Algebra
<https://en.wikipedia.org/wiki/The_Compendious_Book_on_Calculation_by_Comple…>.
I noticed an uptick in reactions to that piece from people of cultural
backgrounds that relate to al-Khwarizmi. And that should come as no
surprise.
If one identifies, even roughly, with white, Western, Christian,
heterosexual, cis-gendered men, it is easy to miss how many of the
narratives we encounter are tuned to that demographic. If one doesn’t
identify with that demographic, having a figure or narrative that
reverberates more with one's own identity or heritage can feel empowering
and inspiring. It can offer an example of, “look, there’s someone like me,
and look at that great thing they did!”
For a few decades now, we have seen a refreshing trend of diversifying the
protagonists in the stories being told in books, movies, and in newspapers
in the Western world. Unfortunately, these 'novel' protagonists are often
met with pushback and resistance, as if the world of stories and narratives
was a limited space, as if by having more narratives that are tuned to
under-represented demographics we thus reduce the narratives that are tuned
to the most prominent demographic. But the cultural space is not a zero sum
game; the space of narratives is infinite.
Similarly, Wikipedia, by being online, does not have to think about page
limitations in the way a printed encyclopedia does. Writing two paragraphs
more on al-Khwarizmi in a Wikipedia version does not mean I have to write
two paragraphs fewer on Pascal. I do not have to balance the space I
dedicate to Ada Lovelace with the space dedicated to Charles Baggage.
Writing more about the history of the Dagomba people does not mean I need
to cut down on the history of Rome. Each Wikipedia has to (and does)
struggle with the effect of its policies and guidelines on how we are
biasing the encyclopedia towards certain narratives, but that is a
different story, to be told by someone else in a different place.
A few weeks ago, *Nature <https://en.wikipedia.org/wiki/Nature_(journal)>*,
one of the leading science journals in the world, published two articles in
tandem: one on making mathematics truly universal
<https://www.nature.com/articles/d41586-023-00223-w> through the program of
decolonization, and the other on why the idea of decolonizing mathematics
is no cause for alarm <https://www.nature.com/articles/d41586-023-00240-9>.
These articles are part of a much-needed series on decolonizing science
<https://www.nature.com/collections/giaahdbacj> *Nature* is running.
Decolonizing mathematics is neither a novel concept nor a novel phrase; the
term goes back at least a good quarter century
<https://www.jstor.org/stable/41674951>, and this newsletter has also
written about it previously
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-12-09>.
Predictably, the articles in *Nature* have caused alarm and have been
misunderstood, pretty much in the way the articles themselves predicted.
The way I understand the program of decolonizing mathematics is to follow
two principles:
- first, to recognize the contributions of people with diverse
backgrounds, in order to offer more protagonists who can inspire and with
whom more people can identify
- second, to provide examples and motivations that are relatable to
under-represented backgrounds and identities, in order to reach and be more
immediately helpful to more people
<https://meta.wikimedia.org/wiki/File:Yanghui_triangle.PNG>
Yanghui's triangle, published 1303
The first principle relates more to Wikipedia than to Wikifunctions, and
even though there is room for improvement, Wikipedia is already pretty good
at reflecting a comprehensive and multi-faceted history (see, for example,
the history of Pascal’s triangle
<https://en.wikipedia.org/wiki/Pascal%27s_triangle#History> on English
Wikipedia), especially across different language editions (compare to Yang
Hui’s triangle
<https://zh.wikipedia.org/wiki/%E6%9D%A8%E8%BE%89%E4%B8%89%E8%A7%92%E5%BD%A2>
on Chinese Wikipedia). I hope that with Abstract Wikipedia we will see an even
tighter integration of different narratives, and see their wider
distribution in many languages.
The second principle in particular can and should also be applied to
Wikifunctions. We should make space for examples which are rooted in the
individual backgrounds of under-represented users of Wikifunctions, to
highlight how many different people can benefit from Wikifunctions. This
was exemplified by al-Khwarizmi’s book and its focus on Muslim inheritance
law, and also by how relatable examples in university courses lead to much
better results, as described by Jessica Nordell
<https://en.wikipedia.org/wiki/Jessica_Nordell> in her book *The End of
Bias: A Beginning*
<https://en.wikipedia.org/wiki/Special:BookSources/978-1-250-18618-8>. I
very much hope that Wikifunctions will consciously provide the space for
relatable and diverse examples from many different areas.
Recordings
The recording of Maria Keet’s presentation on abstract representations is
now available on Wikimedia Commons. Maria Keet talks about the design of
the "abstract content" language for writing "constructors", which are those
pieces of structured information that are positioned between Wikidata and
Wikifunctions as a source on the one side of the pipeline and the machinery
for rendering that content into natural language sentences or paragraphs of
text on the other side in the pipeline. The recording can now be watched on
Commons here:
commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Natural_language_generat…
The regular *Conversation with Trustees* is an opportunity for community
members to speak directly with the Wikimedia Foundation's Trustees about
their work. The Board of Trustees is a volunteer body of movement leaders
and external experts in charge of guiding the Wikimedia Foundation and
ensuring its accountability. The 23 February 2023 conversation included a
short update on Abstract Wikipedia and Wikifunctions, and answered some
community questions. A recording of the conversation is
available on YouTube for now, and will also be available on Wikimedia
Commons eventually
<https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Community_Affairs_Comm…>
:
www.youtube.com/watch?v=cGqHUrpU2Rc
Volunteers’ corner on 6 March 2023
This upcoming Monday will see our monthly Volunteers’ corner. The meeting
will be on Monday, 6 March 2023, at 18:30-19:00 UTC
<https://zonestamp.toolforge.org/1678127424> and you can join on Jitsi at
the following link: https://meet.jit.si/AWVolunteersCorner
Bring your questions, your ideas, or even just your curiosity, and we will
find and help with places you can contribute.
Development update
- The large patchsets we have been working on are landing or close to
landing
- *Goal 2* (efficient and correct evaluation) has seen the patch land
that splits the one big evaluator into individual language evaluators. We
are working now on propagating these changes to the beta.
- *Goal 3* (meta-data) has seen the patch land that reorders
implementations. We are now working on enabling that on the beta.
- *Goal 5* (meta-data) the work on typed lists is ongoing, and work on
function calls has been picked up and a first version has landed
- *Goal 6* (stable and secure system) has seen the rights system land,
and now requires deployment and testing
- *QTE* has presented the work on e2e testing that will lead to
integration testing becoming part of CI
- *Design* has reached a state where we have caught up with the state of
the implementation, have prepared most of *Goal 9*, and are now starting
to prepare for documentation in the next phase