The on-wiki version of this newsletter can be found here:
Communities will create (at least) two different types of articles using
Abstract Wikipedia: on the one hand, we will have highly-standardised
articles based entirely on Wikidata, called model articles; and on the
other hand, we will have bespoke, hand-crafted content, assembled sentence
by sentence. Today we will discuss the second type, having discussed the
first type, model articles, in a previous newsletter.
Both types, by the way, can be implemented by the "templatic renderers"
concept that is part of Ariel Gutman’s proposal. We
will also dedicate a future newsletter to a comparison of the two types.
For manually-assembled articles, we have to make many more assumptions
about what will eventually be available in Wikifunctions than we do for
model-based articles. The following description is not meant to prescribe
to the community how things should work, but provides just the sketch of a
possibility. It is based on a "Wizard of Oz experiment"
<https://en.wikipedia.org/wiki/Wizard_of_Oz_experiment> we did during our
recent Abstract Wikipedia team offsite.
We took the first sentence from a semi-randomly chosen article, with the
aim of handcrafting a representation of that sentence in Abstract Wikipedia.
It's often harder to see how to translate articles about ideas than articles
about more concrete things like people, places, and objects. The sentence came from
the English Wikipedia article Profit (economics)
<https://en.wikipedia.org/wiki/Profit_(economics)>, which we picked as a
common example of a concept:
An economic profit is the difference between the revenue a commercial
entity has received from its outputs and the opportunity costs of its inputs.
Note that we do not expect that English Wikipedia will be the source for
all articles for Abstract Wikipedia, but it is certainly a convenient
source of inspiration for the team, given that all of us speak English. As
a baseline, we each manually translated that text into the languages we speak.
One powerful tool in our arsenal, if not the most powerful, for turning
this sentence into abstract content is that we can rewrite and simplify it.
In Abstract Wikipedia the goal is not to translate as faithfully as
possible the wording of any existing Wikipedia articles, but to capture as
much as possible of the meaning of the articles. So we took the freedom to
rewrite the sentence as follows:
In economics, the profit of a commercial entity is defined as the
difference between its outputs’ revenue and its inputs’ opportunity cost.
Due to time constraints, we further reduced the sentence to simply:
In economics, profit is defined as the difference between revenue and cost.
From this, we then assembled the following abstract content:
*Context*
- *context*: economics <https://www.wikidata.org/wiki/Q8134>
- *content*: *Definition*
   - *subject*: profit <https://www.wikidata.org/wiki/Q26911>
   - *definition*: *Difference*
      - *first*: income <https://www.wikidata.org/wiki/Q1527264>
      - *second*: operating cost <https://www.wikidata.org/wiki/Q831940>
Here, the bold text is the label of a constructor, the italic text is the
label of a key of the given constructor, and the link points to a Wikidata
item. This follows the notation used in previous examples. Just as with
previous examples, we assume the availability of the used constructors. To
be explicit, in this case we assume the constructors listed below with
their respective keys. How the keys or constructors would be named, and in
fact which constructors and keys would even exist, might very well be very different.
*Context* returns a full clause in which a subordinate clause is placed
in a context
- *context* takes a noun phrase, describing the context in which the
- *content* takes a clause that is being put in the context
*Definition* returns a full clause that defines a subject
- *subject* takes a noun phrase that is being defined
- *definition* takes a noun phrase that represents the definition
*Difference* returns a noun phrase that means the quantitative difference
between two given noun phrases
- *first* takes a noun phrase that represents the first part
- *second* takes a noun phrase that represents the second part
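To make the shape of these constructors more concrete, here is a minimal sketch in Python. It is only an illustration and not how Wikifunctions will represent abstract content: the class names mirror the constructor labels above, while the `Item` class, the `NounPhrase` alias, and the `collect_items` helper are assumptions made up for this sketch.

```python
from dataclasses import dataclass, fields
from typing import Union


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for a Wikidata item used as a noun phrase."""
    qid: str
    label: str  # English label, kept only for readability


@dataclass(frozen=True)
class Difference:
    """Noun phrase: the quantitative difference between two noun phrases."""
    first: "NounPhrase"
    second: "NounPhrase"


# Anything that a renderer could realize as a noun phrase (assumption).
NounPhrase = Union[Item, Difference]


@dataclass(frozen=True)
class Definition:
    """Clause: defines the subject by the given noun phrase."""
    subject: NounPhrase
    definition: NounPhrase


@dataclass(frozen=True)
class Context:
    """Clause: puts another clause into a context."""
    context: NounPhrase
    content: Definition


abstract_content = Context(
    context=Item("Q8134", "economics"),
    content=Definition(
        subject=Item("Q26911", "profit"),
        definition=Difference(
            first=Item("Q1527264", "income"),
            second=Item("Q831940", "operating cost"),
        ),
    ),
)


def collect_items(node):
    """Walk the nested constructors and list every Wikidata item used."""
    if isinstance(node, Item):
        return [node]
    found = []
    for f in fields(node):
        found.extend(collect_items(getattr(node, f.name)))
    return found


print([item.qid for item in collect_items(abstract_content)])
```

Listing the items used by a piece of abstract content is one way to see which labels or lexemes a renderer would need for a given language.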
Where we have mentioned "noun phrase" above, we actually mean "concept that
can be realized as a noun phrase by a renderer". Also, we have glossed over
the considerable challenge of having a mechanism through which a renderer
could just take in a Wikidata item and turn it into a noun phrase. That is
a challenge that Mahir has tackled admirably with Ninai and Udiron.
Another challenge was to find the right Wikidata items for each of the
involved noun phrases. For example, for the second key of the Difference
constructor, we chose operating cost <https://www.wikidata.org/wiki/Q831940>.
Other candidates could have been cost
<https://www.wikidata.org/wiki/Q240673> or opportunity cost
<https://www.wikidata.org/wiki/Q185715>. Again, this is not necessarily the
best choice, but just the one we came up with, given our time constraints
and the way we approached the task.
The final step of the exercise was to take that abstract content, and to
render (by hand) a natural language text in the languages that we speak, as
mechanically as possible, using the labels of the selected Wikidata items
(ideally we would use the lexemes connected to the items, but that data was too sparse).
This step is why we called the whole exercise a “Wizard of Oz” exercise, as
we simulate here what renderers in Wikifunctions would do.
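What we simulated by hand can be sketched in a few lines of Python. This is a toy under strong assumptions, not a proposal for how renderers in Wikifunctions would work: the tuple encoding of the abstract content, the `TEMPLATES` table, and the `render` function are invented for this sketch, and the labels are the ones that appear in the English and German results in this newsletter.

```python
# Labels per language, keyed by Wikidata item ID.
LABELS = {
    "en": {"Q8134": "economics", "Q26911": "economic profit",
           "Q1527264": "income", "Q831940": "operating cost"},
    "de": {"Q8134": "Wirtschaftswissenschaft", "Q26911": "Gewinn",
           "Q1527264": "Einkommen", "Q831940": "Betriebskosten"},
}

# Hypothetical per-language templates, one per constructor.
# The German Definition template starts with the verb because the context
# phrase takes the first position; it only works when nested in Context.
# A real renderer would need grammar instead of this shortcut.
TEMPLATES = {
    "en": {
        "Context": "In {context}, {content}.",
        "Definition": "{subject} is defined as {definition}",
        "Difference": "the difference between {first} and {second}",
    },
    "de": {
        "Context": "In {context} {content}.",
        "Definition": "ist {subject} definiert als {definition}",
        "Difference": "der Unterschied zwischen {first} und {second}",
    },
}

# The abstract content: (constructor label, {key: value}) pairs,
# with item IDs as leaves.
CONTENT = ("Context", {
    "context": "Q8134",
    "content": ("Definition", {
        "subject": "Q26911",
        "definition": ("Difference", {
            "first": "Q1527264",
            "second": "Q831940",
        }),
    }),
})


def render(node, lang):
    """Recursively turn abstract content into text for one language."""
    if isinstance(node, str):  # a leaf: look up the item's label
        return LABELS[lang][node]
    constructor, arguments = node
    rendered = {key: render(value, lang) for key, value in arguments.items()}
    return TEMPLATES[lang][constructor].format(**rendered)


for lang in ("en", "de"):
    print(render(CONTENT, lang))
```

The sketch ignores everything that makes real renderers hard, such as articles, agreement, case, and word order beyond a single fixed pattern.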
Here are some results (unfortunately, we didn’t record the results we came
up with during the offsite, so we re-created them for this newsletter):
*English*: In economics, economic profit is defined as the difference
between income and operating cost.
*German*: In Wirtschaftswissenschaft ist Gewinn definiert als der
Unterschied zwischen Einkommen und Betriebskosten.
*Croatian*: U ekonomiji, dobit je definiran kao razlika između dohodka i
*Russian*: В экономике, экономическая прибыль определяется как разница
между доходом и операционными затратами.
*French*: En économie, le profit est défini comme la différence entre les
revenus et les dépenses d'exploitation.
*Spanish*: En economía, ganancia económica se define como la diferencia
entre ingresos y costes*.
*Kannada*: ಅರ್ಥಶಾಸ್ತ್ರದಲ್ಲಿ, ಆರ್ಥಿಕ ಲಾಭವನ್ನು ಆದಾಯ ಮತ್ತು ನಿರ್ವಹಣಾ ವೆಚ್ಚದ
ನಡುವಿನ ಅಂತರವೆಂದು ವ್ಯಾಖ್ಯಾನಿಸಲಾಗಿದೆ.
*Hebrew*: בכלכלה, רווח מוגדר כהפרש בין הכנסה להוצאות תפעוליות.
*Swedish*: I nationalekonomi definieras vinst som skillnaden mellan inkomst
*Italian*: In economia, il profitto è definito come la differenza fra il
reddito e i costi operativi*.
*Arabic*: في الاقتصاد*، يتم تعريف الربح على أنه الفرق بين الدخل المالي والمصروفات
Words marked with an asterisk were translated manually by us, as
they did not have a label in Wikidata at the time, or the label did not fit.
During the offsite, we evaluated the results and found them not
only readable (although not perfect), but also easier to understand than
our initial translations. This is likely an effect of the simplification
process the text underwent. The whole exercise left us filled with optimism
about the approach.
*This newsletter was late due to the amount of discussion it generated
internally. Don’t expect everyone on the team to agree on everything being
said here. We think these discussions should be in the open, for everyone
to join in. Expect more to follow.*
We are getting additional support from ThisDot technical writers: two
ThisDot technical writers will be joining the team for the remainder of
June to figure out how to onboard users to the concept of functions, and
how to communicate to users what functions are and how they work, in an
Below is the brief weekly summary highlighting the status of each workstream
- Drafted the Performance Metrics document
- Started research on reported slowness in function evaluation
- Added logging and dashboarding to Beta Cluster and wrote documentation
for Beta Cluster
- Wrote a Proof of Concept of support for new Wikifunctions features to
support proposed NLG pipelines
- Altered MediaWiki PHP and Vue layers to handle either format
- Ensured that no function-orchestrator test code/cases employ the old
- WikiLambda PHP and Function-schemata finished and merged
- Design: continue working on typed list view
- Front-end: made ISO codes mobile friendly and started table component
Last year, we welcomed Aishwarya Vardhana to
the team. This week is her last week on the Abstract Wikipedia project, as
the Design group at the Wikimedia Foundation is rotating some of their
designers between teams.
During her work on Wikifunctions, Aishwarya has brought novel perspectives
and points of view to the project, which will shape Wikifunctions for years
to come. Her design work, which was frequently featured in this newsletter,
and her guidance in diligently testing crucial components of Wikifunctions,
will lead to an immensely improved product. Her voice and her work towards
anchoring the values of Wikifunctions in diversity, equity, and inclusion
will have a lasting impact on the whole project. It is a pleasure working
with Aishwarya, and the whole team is sad to see her go. Fortunately,
Aishwarya is staying with the Wikimedia Foundation and will take over as
designer on the Trust and Safety Tools team; we'll introduce our new design
colleague in a future newsletter.
Aishwarya just recently summarized many of the UX research results
a newsletter of its own.
We are very thankful for her contributions, and we congratulate her on her
new role. Here are some words from Aishwarya.
Namaste Abstract Wikipedia community! This project is near and dear to me
and will always hold a special place in my heart. Thanks to all of you for
your commitment to this important, decolonial
<https://en.wikipedia.org/wiki/Decoloniality> effort. A question that I
wrestled with throughout my time designing Wikifunctions has been, who is
Wikifunctions for? Will it truly be a diverse and equitable community? Will
it be an ecosystem based in mutuality and trust? Might this audience be
different from all the other Wikimedia projects by embracing diversity,
equity, and inclusion from day one? The answers to these questions must
come from all of us. Each one of us that engages with the project, who
writes feedback or an implementation, asks questions, or approves testers,
is a steward for these values. As I depart the team and transition into a
volunteer, I have faith in us as a collective. See you on the internet!
You can follow Aishwarya’s writing on Medium
<https://aishwaryavardhana.medium.com/> or on thewildword.com.
*We are hiring!*
The Wikimedia Foundation is hiring for a Staff Software Engineer as a
Quality and Test Engineer
<https://boards.greenhouse.io/wikimedia/jobs/4321901>! Wikifunctions and
Abstract Wikipedia are complex systems, and we need help in order to
improve the reliability and the development velocity of our system. We are
looking for someone to develop and set up an environment that will allow
our engineers to write tests, from effective unit tests to integration
tests to end-to-end tests; to ensure that our tests are run during
continuous integration; and to allow for high-quality rollouts of new
features to Wikifunctions.
If you are interested, please apply, or if you know someone with the
relevant experience and interest, please let them know.
*Workstream updates (as of June 17)*
- Shared the Performance Metrics document with SRE for approval
- Aligned on scope for Metadata and Performance workstreams
- Progressed migration of the tester pipeline from orchestrator into
- Work in progress: finalizing the set of deliverables and goals for
- Prepared for Wikidata Quality Days presentation
- Altered function-orchestrator and MediaWiki PHP API to respond with a
- Aishwarya presented the Wikifunctions workflow at the Product
Department monthly meeting
- Cleanup tasks for function-schemata and wikilambda
- Finished table component implementation
- Completed basic implementation of tester and implementation tables
- Handed off designs for typed list view
Communities will create (at least) two different types of articles using
Abstract Wikipedia: on the one hand, we will have highly-standardised
articles based entirely on Wikidata; and on the other hand, we will have
bespoke, hand-crafted content, assembled sentence by sentence. Today we
will discuss the first type, and we will discuss the second type in an
Articles of the first type can be created very quickly and will likely
constitute the vast majority of articles for a long time to come. For that
we can use models, *i.e.* texts with variables. Put differently, a model is a text
with gaps that get filled from a different source, such as a list, along
the lines of the Mad Libs <https://en.wikipedia.org/wiki/Mad_Libs> game. A
model can be created once for a specific type of item and then used for
every single item of this type that has enough data in Wikidata. The
resulting articles are similar to many bot-created articles that already
exist in various Wikipedias.
For example, in many languages, bots were used to create or maintain the
articles about individual years (such as the articles about 1313
<https://www.wikidata.org/wiki/Q6315>, or 1697
<https://www.wikidata.org/wiki/Q7702>, each of which is available in more
than a hundred languages). In English Wikipedia, many articles for US
cities were created by a bot
on the US census, and later updated after the 2010 census. Lsjbot
<https://en.wikipedia.org/wiki/Lsjbot> by Sverker Johansson is a well known
example of a bot that has created millions of articles about locations or
species across a few languages such as Swedish, Waray Waray, or Cebuano.
Comparable activities, although not as prolific, have been going on in
quite a few other languages.
How do these approaches work? Assume you have a dataset such as the
following list of countries:
Country      Continent        Capital    Population
Jordan       Asia             Amman      10428241
Nicaragua    Central America  Managua    5142098
Kyrgyzstan   Asia             Bishkek    6201500
Laos         Asia             Vientiane  6858160
Lebanon      Asia             Beirut     6100075
Now we can create a model that can generate a complete text from this data:
“*<Country>* is a country in *<Continent>* with a population of
*<Population>*. The capital of *<Country>* is *<Capital>*.”
With this text and the above dataset, we would have created the following
five proto-articles (references not shown for simplicity):
*Jordan* is a country in Asia with a population of 10,428,241. The capital
of Jordan is Amman.
*Nicaragua* is a country in Central America with a population of 5,142,098.
The capital of Nicaragua is Managua.
*Kyrgyzstan* is a country in Asia with a population of 6,201,500. The
capital of Kyrgyzstan is Bishkek.
*Laos* is a country in Asia with a population of 6,858,160. The capital of
Laos is Vientiane.
*Lebanon* is a country in Asia with a population of 6,100,075. The capital
of Lebanon is Beirut.
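The same fill-in-the-gaps step can be sketched in Python. This is a minimal illustration using the country data from above; the names `MODEL` and `COUNTRIES` and the lower-case field names are assumptions chosen for this sketch.

```python
# The model: a text with named gaps. "{population:,}" adds thousands separators.
MODEL = ("{country} is a country in {continent} with a population of "
         "{population:,}. The capital of {country} is {capital}.")

# The dataset, one record per country.
COUNTRIES = [
    {"country": "Jordan", "continent": "Asia",
     "capital": "Amman", "population": 10428241},
    {"country": "Nicaragua", "continent": "Central America",
     "capital": "Managua", "population": 5142098},
    {"country": "Kyrgyzstan", "continent": "Asia",
     "capital": "Bishkek", "population": 6201500},
    {"country": "Laos", "continent": "Asia",
     "capital": "Vientiane", "population": 6858160},
    {"country": "Lebanon", "continent": "Asia",
     "capital": "Beirut", "population": 6100075},
]

# Combine the model with every record to get one proto-article each.
articles = [MODEL.format(**record) for record in COUNTRIES]

for article in articles:
    print(article)
```

One model and five records yield five texts; adding a sixth country only requires a sixth record.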
Classical textbooks on that topic such as *“Building natural language
this method *“mail merge”* (even though it is used for more than mail). A
model is combined with a dataset, often from a spreadsheet or a database.
This has been used for decades to create bulk mailings
<https://en.wikipedia.org/wiki/Mail_merge> and other bulk content, and is a
form of mass customisation
<https://en.wikipedia.org/wiki/Mass_customization>. The methods have become
increasingly complex over time and are able to answer more questions: How
to deal with missing or optional information? How to adapt part of the text
to the data, *e.g.* use plurals or grammatical gender or noun classes where
appropriate, *etc.*? The bots that were mentioned above, which created
millions of articles in various languages on Wikipedia, have mostly worked
along these lines.
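As a rough sketch of what handling those questions can look like, the following Python function skips gaps for which no data is available and picks singular or plural forms depending on the data. The `official_languages` field and its placeholder values are hypothetical and used only to show number agreement; the function is an illustration for English, not a description of how any of the mentioned bots work.

```python
def join_list(items):
    """Join items as 'A', 'A and B', or 'A, B and C'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe(record):
    """Build a short text, leaving out clauses whose data is missing."""
    text = f"{record['country']} is a country in {record['continent']}"
    population = record.get("population")
    if population is not None:  # optional clause
        text += f" with a population of {population:,}"
    text += "."

    capital = record.get("capital")
    if capital:  # optional sentence
        text += f" The capital of {record['country']} is {capital}."

    languages = record.get("official_languages", [])  # hypothetical field
    if languages:
        # Adapt noun and verb to the number of values.
        noun = "language" if len(languages) == 1 else "languages"
        verb = "is" if len(languages) == 1 else "are"
        text += f" Its official {noun} {verb} {join_list(languages)}."
    return text


print(describe({"country": "Laos", "continent": "Asia"}))
print(describe({"country": "Laos", "continent": "Asia",
                "capital": "Vientiane", "population": 6858160,
                "official_languages": ["Language A", "Language B"]}))
```

Every such condition is specific to one language, which is part of why these models grow complex quickly.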
For a great example of how far the model approach can be pushed, consider
Magnus Manske’s Reasonator <https://meta.wikimedia.org/wiki/Reasonator>,
which, based on the data in Wikidata, creates the following automatic
description for Douglas Adams <https://reasonator.toolforge.org/?q=Q42>:
*Douglas Adams* was a British playwright, screenwriter, novelist,
children's writer, science fiction writer, comedian, and writer. He was
born on March 11, 1952 in Cambridge to Christopher Douglas Adams and Janet
Adams. He studied at St John's College from 1971 until 1974 and Brentwood
School from 1959 until 1970. His field of work included science fiction,
comedy, satire, and science fiction. He was a member of Groucho Club and
Footlights. He worked for The Digital Village from 1996 and for BBC. He
married Jane Belson on November 25, 1991 (married until on May 11, 2001 ),
Jane Belson on November 25, 1991 (married until on May 11, 2001 ), and Jane
Belson on November 25, 1991 (married until on May 11, 2001 ). His children
include Polly Adams, Polly Adams, and Polly Adams. He died of myocardial
infarction on May 11, 2001 in Santa Barbara. He was buried at Highgate
If we were to say that this is merely better than nothing, I think we would
undersell the achievement of Reasonator. The above text, together with the
appealing display of the structured data in Reasonator, leads to a more
comprehensive access to knowledge than many of the individual language
Wikipedias provide for Douglas Adams. For comparison, check out the
articles in Azerbaijani <https://az.wikipedia.org/wiki/Duqlas_Adams>, Urdu
or Danish <https://da.wikipedia.org/wiki/Douglas_Adams>. At the same time,
it shows errors that most contributors wouldn’t know how to fix (such as
the repetition of the names of the children, or the spaces inside the
The Article placeholder
<https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder> project has
partially fulfilled the role of filling content gaps, but the developers
have intentionally shied away from the results looking too much like an
article. They display structured data from Wikidata within the context of a
language Wikipedia. For example, here is the generated page about
*triceratops* in Haitian Creole
One large disadvantage of using bots to create articles in Wikipedia has
been that this content was mostly controlled by a very small subset of the
community — often a single person. Many of the bots and datasets have not
been open sourced in a way that someone else could easily come in, make a
change, and re-run the bot. (Reasonator avoids this issue, because the text
is generated dynamically and is not incorporated into the actual Wikipedia
With Wikifunctions and Wikidata, we will be able to give control over all
these steps to the wider community. Both the models and the data will be
edited on wiki, with all the usual advantages of having a wiki: there is a
clear history, everyone can edit through the Web, people can discuss, *etc.*
The data used to populate the models will be maintained in Wikidata, and
the models themselves in Wikifunctions. This will allow us to collaborate
on the texts, unleash the creativity of the community, spot and correct
errors and edge cases together, and slowly extend the types of items and
the coverage per type.
In a follow-up essay, we will discuss a different approach to creating
abstract content, where the content is not the result of a model based on
the type of the described item, but rather a manually constructed article,
built up sentence by sentence.
*Development update from the week of May 27:*
- The team had a session at Hackathon, which was well attended (about 30
people). Thanks to everyone for being there and your questions and comments!
- We also had follow-up meetings with User:Mahir256, to improve
alignment on the NLG stream
- Below is the brief weekly summary highlighting the status of each
- Observability document drafted.
- Updated Helm charts for getting function-* services in staging.
- Completed performance metrics design and shared for review
- Scoped out necessary changes to Wikifunctions post-launch
- Started recording and passing up some function-evaluator timing
metrics to the orchestrator
- WikiLambda (PHP) layer has been migrated to the new format of
- Improved the mobile experience of the function view page
- Transitioned the Tabs component to use Codex's, thanks to the
Design Systems Team.
- Design: Carried out end-to-end user flow testing in Bangla.
*(Apologies for this update being late. We plan to send out another update
Design researcher Jeff Howard did another round of research in
order to prioritize issues in the run-up to launch, and beyond. The full
report of the user research has been published on
Meta. Aishwarya, the designer on the Abstract Wikipedia team, has read and
analysed the research results, and summarized them in a slide deck.
This week, Aishwarya presented the deck to the team, and we are offering
here a short summary of the presentation.
The goal for designing the function page is two-fold: to be understandable
and usable to technical people of all backgrounds, and welcoming to people
with low levels of programming expertise. Technical contributors should
understand the function creation workflow and the Wikifunctions mental
model. Seven technical participants were interviewed using Aishwarya’s
designs in Figma
(click anywhere on the screen to progress through the slides and remember to
expand your window).
The interviewees raised many great questions, validated a lot of our design
work, and identified several areas for improvement. Overall, the report
validated that we have met the stated design goals of the user interface
being understandable and usable for technical people, but the report also
highlighted that the contributors did not really understand the function
creation workflow and the general Wikifunctions mental model. In short,
they could get everything done, but were often confused about what they
were doing and why it was presented in that way.
I will not go into the many things that worked out well. You can read about
them in the full report and also in the slides. I do want to call out the
praise for the work summary diagram, which is consistent with many other
reactions we also got in the chat and in other interactions with Wikimedia
community members. I also want to use the chance to congratulate Aishwarya
on her design work, and seeing it validated so positively. We are all very
much looking forward to getting the implemented design out there for you to
play with, and learning more about how we can improve it.
Two points were called out by the interviewees in particular as causes of
surprise or confusion: the split between function definitions and their
implementations, and the multi-lingual nature of Wikifunctions.
In Wikifunctions, we allow each function to have several implementations. We
achieve this by having implementations be their own pages on Wikifunctions.
Such a separation is not a novel concept: programming languages such as C++
or Ada have had header and implementation files for decades, and object-oriented
languages <https://en.wikipedia.org/wiki/Object-oriented_programming> have
interfaces that can be implemented by different classes. But interviewees
have repeatedly wanted to jump right into providing the implementation.
They were confused that they could publish a function's definition even
before having provided an implementation. This was also a request we have
seen in previous user tests.
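The split between a function's definition and its implementations can be illustrated with a small Python sketch. It is a toy model and not the WikiLambda data model: the `FunctionDefinition` class, its methods, and the `is even` example function are all made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionDefinition:
    """A definition: a label, argument labels, testers, and implementations."""
    label: str
    argument_labels: tuple
    testers: list = field(default_factory=list)          # (arguments, expected)
    implementations: dict = field(default_factory=dict)  # label -> callable

    def add_tester(self, arguments, expected):
        self.testers.append((arguments, expected))

    def add_implementation(self, label, implementation):
        self.implementations[label] = implementation

    def run_testers(self):
        """Report, per implementation, whether it passes every tester."""
        return {
            label: all(implementation(*arguments) == expected
                       for arguments, expected in self.testers)
            for label, implementation in self.implementations.items()
        }


# The definition exists before any implementation has been provided.
is_even = FunctionDefinition("is even", ("number",))
is_even.add_tester((4,), True)
is_even.add_tester((7,), False)
print(is_even.run_testers())  # nothing to check yet: {}

# Implementations are added later, possibly by different contributors.
is_even.add_implementation("modulo", lambda n: n % 2 == 0)
is_even.add_implementation("bitwise", lambda n: (n & 1) == 0)
is_even.add_implementation("buggy", lambda n: True)
print(is_even.run_testers())
```

In this sketch, a definition without implementations is still useful: it fixes what the function is supposed to do, and the testers say how any future implementation will be judged.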
As a side remark, the little word *“publish”* really did a lot of heavy
lifting here. A long time ago, Wikipedia used to use the word *“save”* for
the button that let the contributor store an edit, and this was changed to
*“publish”* in 2017, based on user research that found wiki users surprised
and alarmed that merely 'saving' an edit would put it online, in public,
for everyone to see, forever. This user study reiterated the point that the
word *“publish”* makes it clear that the contribution will indeed go live
to the whole world. But at the same time, several interviewees felt that
just a function definition, without any implementations yet, didn’t seem to
be useful to be published. The word *“publish”* really brought out that
contrast, and helped us identify this discrepancy in the user’s mental model.
The second point that raised quite strong reactions was the multi-lingual
nature of Wikifunctions. That is one of the points that is often questioned
in the design of Wikifunctions, often unprompted: why does it have to be
multi-lingual? Why labels in different languages? Doesn’t everyone who
wants to code just learn English? To quote one of the interviewees, *“usually
people who speak other languages are just expected to learn English to
Because the world of coding is indeed so English-centered, it is very
difficult to find people with coding experience who don’t speak basic
English, and indeed all interviewed contributors spoke English.
There have been a number of research studies showing that the
English-centricity of programming is a major barrier
<https://dl.acm.org/doi/abs/10.1145/3173574.3173970> for many people.
People who can use their own language to code achieve results faster
<https://dl.acm.org/doi/abs/10.1145/3051457.3051464>. For parents that
don’t speak English, it is more difficult to help their children
<https://dl.acm.org/doi/abs/10.1145/3173574.3174196> to learn programming.
Based on these and other research results, we choose to intentionally
deviate from the recommendations of our own user research, as we believe
that this aligns better with the Wikimedia 2030 movement strategy,
particularly towards knowledge equity.
There were many smaller, but very good points raised. The contributors
asked for a space to describe the functions in more detail (that’s planned
for Phase ι
which is up next in our development plan). The term *“aliases”* confused
users. The list of types was too simple. The example table was identified
as a place that probably won’t scale for complex entries. The difference
between the words *“available”* and *“proposed”* and *“verified”* in the
tables showing implementations and testers was confusing. And there were
quite a few more.
We also identified a number of larger areas that could be improved: making
the use of language more consistent throughout, displaying more meta-data
immediately, and improving the text to make the distinction between
definitions and implementations clearer. We are going to work on these
We are relieved and pleased to see that the designs allowed all the
contributors to fulfill their tasks. We are more than excited to implement
these designs, and get them to you. We would love to hear from you, if you
have ideas or suggestions around the issues discussed here, or in the full report.
Thanks to all the contributors who were interviewed, thanks to Jeff for
performing the research, and thanks to Aishwarya for summarizing the
Updates as of June 3: Fix-it week
- May 30 – June 3 was a ‘Fix-it’ week for the Abstract Wikipedia team.
During this week, the team paused the development of new features and
focused on tasks related to technical debt.
- Design update: This week, the team kicked off the design work for the
Our Google.org fellow, Ariel Gutman
<https://meta.wikimedia.org/wiki/User:AGutman-WMF>, has recently authored a
proposal of an architecture for the NLG system
The proposed architecture is driven by 4 main tenets:
1. *Modularity*: the system should be modular, in that various aspects
of NLG (e.g. morphosyntactic and phonotactic rules) can be modified
2. *Lexicality*: the system should be able to both fetch lexical data
(separate from code), and rely on productive language rules to generate
such data on the fly (e.g. inflecting English plurals with an -s).
3. *Recursivity*: due to the compositional and recursive nature of most
languages, an effective NLG system would need to be recursive itself.
4. *Extensibility*: the system should be receptive to extension both by
linguistic experts and technical contributors, as well as by non-technical
and non-expert contributors, working on different parts of the system.
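The Lexicality tenet can be illustrated with a short Python sketch: look up stored lexical data first, and fall back to a productive rule only when nothing is stored. The `IRREGULAR_PLURALS` dictionary is a hypothetical stand-in for lexical data that would live outside the code, and the function is an assumption for illustration, not part of the proposal.

```python
# Hypothetical stand-in for lexical data kept separate from code.
IRREGULAR_PLURALS = {
    "child": "children",
    "person": "people",
}


def plural(lemma, lexicon=IRREGULAR_PLURALS):
    """Return the English plural: stored form first, productive rule second."""
    if lemma in lexicon:
        return lexicon[lemma]  # fetched lexical data
    return lemma + "s"         # productive rule: add -s


print(plural("child"))    # stored form
print(plural("article"))  # generated on the fly
```

A fuller rule would also cover cases such as "box" or "city"; the point here is only the order of lookup and fallback.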
These considerations lead to a proposal of a "pipeline" system, in which an
input Constructor is being processed by different modules (corresponding to
various aspects of natural language) until the final output text is
[image: A proposal of an NLG architecture for Abstract Wikipedia.svg]
In this pipeline, dark blue shapes are elements that would be created by
contributors to Wikifunctions (rectangles) or Wikidata (rounded
rectangles), while the light blue elements represent functions or data
living within the Wikifunctions orchestrator.
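The general idea of a pipeline of independent modules can be sketched in Python as follows. The three stages, their names, and the input format are assumptions invented for this illustration and do not reflect the actual modules of the proposal; the vowel check is a crude spelling-based approximation of a phonotactic rule.

```python
from functools import reduce


def fill_template(constructor):
    """Hypothetical template stage: turn the input into a list of tokens."""
    return ["a", constructor["adjective"], constructor["noun"]]


def apply_phonotactics(tokens):
    """Hypothetical phonotactic stage: English 'a' becomes 'an' before a vowel."""
    result = []
    for index, token in enumerate(tokens):
        following = tokens[index + 1] if index + 1 < len(tokens) else ""
        if token == "a" and following and following[0].lower() in "aeiou":
            token = "an"
        result.append(token)
    return result


def linearize(tokens):
    """Final stage: join the tokens into the output text."""
    return " ".join(tokens)


# Each stage can be replaced or modified without touching the others.
PIPELINE = [fill_template, apply_phonotactics, linearize]


def run(constructor, stages=PIPELINE):
    """Pass the input through every stage in order."""
    return reduce(lambda value, stage: stage(value), stages, constructor)


print(run({"adjective": "economic", "noun": "profit"}))
print(run({"adjective": "commercial", "noun": "entity"}))
```

Because every stage only sees the output of the previous one, a contributor could improve the phonotactic rule without needing to understand the template stage.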
Key components of the system are the "templatic renderers". Wikifunctions
will provide a specialized *templating language*, developed in-house, which
should enable even non-technical contributors to write renderers for their
language. These renderers will be supported by lexical data from Wikidata
and Universal Dependency-style grammatical relations, which would be
defined within Wikifunctions by linguistically-interested contributors.
We will be glad to hear any feedback from you on the proposal's talk page,
in particular about the idea to develop an in-house templating system.
Further updates for last week:
- This week, the team held its first Deep Dive session. We presented our
project OKRs and received feedback from leadership
- The team spent time this week preparing for last weekend's Hackathon:
- There was a presentation and Q&A about Wikifunctions
- A few Phabricator backlog tasks were identified and tagged for
Below is the brief weekly summary highlighting the status of each
- Made progress on Beta cluster setup: orchestrator and evaluator
services now update automatically to the latest image
- Completed the initial draft of the NLG system architecture design
- Partially completed the front-end code to accommodate both forwards
and backwards compatibility for the old & new metadata formats
- Made more progress for function view and editor implementations for
- Completed function-schemata migration to Benjamin arrays
- Handed off designs for 'Text with fallback'