The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-08-19
--
Our Google.org fellow, Ariel Gutman
<https://meta.wikimedia.org/wiki/User:AGutman-WMF>, together with Prof. Maria
Keet <http://www.meteck.org/>, who is devoting part of her sabbatical year
to working with Abstract Wikipedia's Natural Language Generation
workstream, has recently authored a detailed specification of a template
language
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Template_Language_for_Wikifunctions>.
This aims to allow Wikifunctions contributors to easily create renderers
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Glossary#renderer>
of abstract
content
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Glossary#Content>. For
instance, Wikidata asserts that entity Q7259 <https://www.wikidata.org/wiki/Q7259>
has property P106 <https://www.wikidata.org/wiki/Property:P106> with the value
Q5482740 <https://www.wikidata.org/wiki/Q5482740>. With all the machinery in
place, this may render as, e.g., *“Ada Lovelace was a programmer.”* The template
language helps specify how sentences are structured, so that the structured
content can be displayed as text in a natural language of one’s choice.
You may recall from the architecture proposal
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/NLG_system_architecture_proposal>
that
every constructor
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Glossary#Constructor>
(which
typically aims to capture the meaning of a single phrase or sentence
structure) will be matched with a specific template to render that
constructor as text. The templates will reside in Wikifunctions, and will
be parsed into Composition
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Glossary#composition>
syntax,
so that they can act as Renderers. An initial version of this parser has
already been implemented
<https://gerrit.wikimedia.org/r/c/mediawiki/tools/wikilambda-cli/%2B/814724> as
part of the Wikifunctions CLI tool
<https://www.mediawiki.org/wiki/Extension:WikiLambda/CLI>, which you can
toy around with.
What do these templates look like? A template is a combination of text and
slots, where slots can refer to other templates or functions from
Wikifunctions, allowing for dynamic content. The specification of
grammatical constraints is done through dependency relations (using, for
instance, the UD <https://universaldependencies.org/> formalism for grammar
annotations) specified as labels within the slots. As for the text, it may
represent static text, which will be kept untouched throughout the
rendering, or it may represent lexemes that can assume different forms
according to the neighboring syntactic and phonological constraints.
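To make the "text plus slots" idea concrete, here is a minimal Python sketch that splits a template string into static text and slots. It is not the parser from the Wikifunctions CLI: the slot shape {label:Function(argument)}, the regular expression, and the class names Text and Slot are assumptions made for illustration only. The sample input is the English age template that this newsletter walks through.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Text:
    """Static text that is copied to the output unchanged."""
    value: str


@dataclass
class Slot:
    """A slot: an optional dependency label plus a function call."""
    label: Optional[str]  # e.g. "nummod" or "root"; None if unlabelled
    function: str         # e.g. "Cardinal"
    argument: str         # e.g. "years"


# Assumed slot shape: {label:Function(argument)}, with the label optional.
SLOT = re.compile(r"\{(?:(\w+):)?(\w+)\((\w*)\)\}")


def parse_template(template: str) -> List[Union[Text, Slot]]:
    """Split a template string into static text and slot elements."""
    elements: List[Union[Text, Slot]] = []
    position = 0
    for match in SLOT.finditer(template):
        # Anything between two slots is static text.
        if match.start() > position:
            elements.append(Text(template[position:match.start()]))
        label, function, argument = match.groups()
        elements.append(Slot(label, function, argument))
        position = match.end()
    # Trailing static text after the last slot.
    if position < len(template):
        elements.append(Text(template[position:]))
    return elements


if __name__ == "__main__":
    sample = "{Person(entity)} is {nummod:Cardinal(years)} {root:Lexeme(L2505)} old."
    for element in parse_template(sample):
        print(element)
```

The sketch only handles a single flat argument per slot; the specification linked above describes the full syntax.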
For starters, let's look at an example template to generate a sentence
describing the age of a person, e.g. *"Dan is 20 years old."*, given a
constructor with two fields: entity (the Q-id of the person) and years (the
age). In English, this template may look like this:
{Person(entity)} is {nummod:Cardinal(years)} {root:Lexeme(L2505)} old.
There are three slots, which are delimited by curly brackets:
1. {Person(entity)} resolves to the name of the person.
2. {nummod:Cardinal(years)} resolves to the number of years. It is
marked as the "*num*eral *mod*ifier" of the third slot.
3. {root:Lexeme(L2505)} fetches from Wikidata Lexeme L2505
<https://www.wikidata.org/wiki/Lexeme:L2505>, which refers to the lemma
"year". Since the slot is marked as root, it will be linked to the
previous slot, allowing for the selection of the right form of the lexeme:
"year" or "years".
The remaining text in the template ("is" and "old") is in this case
static text. In other cases, we might need to specify that the verb "is" can
inflect as well, or the number may need some additional processing to render
properly. We would then use similar dependency labels to mark
subject-verb agreement and other types of agreement across the sentence’s
constituents.
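The age template discussed above can be sketched in a few lines of Python to show how the nummod label lets the root lexeme pick its form. This is a toy illustration, not how Wikifunctions renders templates: the dictionaries stand in for Wikidata lookups, the entity id "Q_EXAMPLE" is made up, and the rule "1 is singular, everything else is plural" is a simplifying assumption for English only.

```python
import re

# Hypothetical stand-ins for Wikidata lookups.
LABELS = {"Q_EXAMPLE": "Dan"}  # made-up entity id for the person
LEXEME_FORMS = {"L2505": {"singular": "year", "plural": "years"}}

# Assumed slot shape: {label:Function(argument)}, with the label optional.
SLOT = re.compile(r"\{(?:(\w+):)?(\w+)\((\w*)\)\}")

TEMPLATE = "{Person(entity)} is {nummod:Cardinal(years)} {root:Lexeme(L2505)} old."


def render(template: str, fields: dict) -> str:
    """Fill the slots of a template, making the root agree with its nummod."""
    # First pass: find the number carried by the nummod slot, if there is one.
    number = None
    for label, _function, argument in SLOT.findall(template):
        if label == "nummod":
            number = fields[argument]

    # Second pass: replace every slot with its rendered text.
    def fill(match: re.Match) -> str:
        label, function, argument = match.groups()
        if function == "Person":
            return LABELS[fields[argument]]
        if function == "Cardinal":
            return str(fields[argument])
        if function == "Lexeme":
            forms = LEXEME_FORMS[argument]
            # The root slot agrees in number with its numeral modifier.
            if label == "root" and number is not None and number != 1:
                return forms["plural"]
            return forms["singular"]
        raise ValueError(f"unknown function: {function}")

    return SLOT.sub(fill, template)


if __name__ == "__main__":
    print(render(TEMPLATE, {"entity": "Q_EXAMPLE", "years": 20}))
    print(render(TEMPLATE, {"entity": "Q_EXAMPLE", "years": 1}))
```

Static text passes through untouched because only the slots are substituted. A language with richer agreement would need more than a singular/plural switch, which is where the dependency labels in the real template language come in.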
In the document, similar examples
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Template_Language_for_Wikifunctions#Example_templates>,
though more complex, are given for four other languages (Swedish, French,
Hebrew and isiZulu). Each presents its own peculiarities and challenges, yet
all can be captured successfully with the proposed template
language. We invite you to read the document, provide feedback, and try to
come up with challenging examples in other languages that may prove
difficult to render using this formalism, so that we can improve it and
achieve the broadest possible applicability to, ideally, all natural
languages.
Wikimania video
Last week was Wikimania 2022
<https://meta.wikimedia.org/wiki/Wikimania_2022>, the annual event for
Wikimedians from all over the world to meet and discuss. There were two
sessions related to Wikifunctions: one on Wikifunctions
<https://wikimania.wikimedia.org/wiki/2022:Submissions/Wikifunctions_-_A_new_Wikimedia_project>
led by the team, and one on Ninai and Udiron led by Mahir Morshed
<https://wikimania.wikimedia.org/wiki/2022:Submissions/Ninai-Udiron:_Using_Wikidata_Items_and_Lexemes_for_Abstract_Wikipedia-Like_Text_Generation>.
Our session consisted of a short introduction to Wikifunctions by Denny,
followed by a pre-recorded section, where several team members had short
deep dives into different topics.
We had:
- James Forrester on the technical architecture
- Amin Al Hazwani on the design language
- Genoveva Galarza Heredero on the content model
- Julia Kieserman on Codex
- Cory Massaro on knowledge equity
- Ariel Gutman on natural language generation, an intro to the first
part of the newsletter above
- Ali Assaf on formalizing the function model
You can watch this pre-recorded segment on Commons
<https://commons.wikimedia.org/wiki/File:Wikimania_2022_Wikifunctions_HD.webm>
.
As with all Wikimania sessions, collaborative note taking was enabled.
The notes
on the session
<https://wikimania.wikimedia.org/wiki/2022:Submissions/Wikifunctions_-_A_new_Wikimedia_project#Session_notes_from_Etherpad>
also
contain all questions that were asked and answered in the closing part
of the session, following the video. A full video of the session is
available on YouTube <https://www.youtube.com/watch?v=Zasie41p1-U?t=35278s>,
but note that playback of the pre-recorded video ran into a number of
technical issues, so you might want to watch the Commons video instead.
Uploads of individual sessions are expected to become available later.
Mahir Morshed had a Wikimania session about Ninai and Udiron
<https://wikimania.wikimedia.org/wiki/2022:Submissions/Ninai-Udiron:_Using_Wikidata_Items_and_Lexemes_for_Abstract_Wikipedia-Like_Text_Generation>
and
the recording starts here
<https://www.youtube.com/watch?v=BCRi1VRtQXE&t=3431s>. Ninai and Udiron are
tools for natural language generation, which we introduced in an earlier
newsletter
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-03>.
Workstream updates as of August 12, 2022
Performance
- Started performance analysis methodology documentation
- Set up health-check API endpoint for WikiLambda
Natural language generation
- Not too much progress due to team members' vacation time. Started
adding noun class information for isiZulu, Mboshi, and Kiswahili
Metadata
- Finished display of metadata dialog on tester page
- Created some new PHP utilities for ZMaps
Experience
- Fixed and merged Beta launch blockers
- Made great progress on fixing various bugs
- Began researching diffing options