The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-08-19
--
Our Google.org fellow, Ariel Gutman https://meta.wikimedia.org/wiki/User:AGutman-WMF, and Prof. Maria Keet http://www.meteck.org/, who is devoting part of her sabbatical year to working with Abstract Wikipedia's Natural Language Generation workstream, have recently authored a detailed specification of a template language https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Template_Language_for_Wikifunctions. The specification aims to allow Wikifunctions contributors to easily create renderers https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Glossary#renderer of abstract content https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Glossary#Content. For instance, Wikidata asserts that entity Q7259 https://www.wikidata.org/wiki/Q7259 has property P106 https://www.wikidata.org/wiki/Property:P106 pointing to Q5482740 https://www.wikidata.org/wiki/Q5482740; with all the machinery in place, this may render as, e.g., *“Ada Lovelace was a programmer.”* The template language helps specify the structure for generating sentences, so that the structured content can be displayed as text in a natural language of one’s choice.
You may recall from the architecture proposal https://meta.wikimedia.org/wiki/Abstract_Wikipedia/NLG_system_architecture_proposal that every constructor https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Glossary#Constructor (which typically aims to capture the meaning of a single phrase or sentence structure) will be matched with a specific template to render that constructor as text. The templates will reside in Wikifunctions, and will be parsed into Composition https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Glossary#composition syntax, so that they can act as Renderers. An initial version of this parser has already been implemented https://gerrit.wikimedia.org/r/c/mediawiki/tools/wikilambda-cli/%2B/814724 as part of the Wikifunctions CLI tool https://www.mediawiki.org/wiki/Extension:WikiLambda/CLI, which you can toy around with.
What do these templates look like? A template is a combination of text and slots, where slots can refer to other templates or functions from Wikifunctions, allowing for dynamic content. The specification of grammatical constraints is done through dependency relations (using, for instance, the UD https://universaldependencies.org/ formalism for grammar annotations) specified as labels within the slots. As for the text, it may represent static text, which will be kept untouched throughout the rendering, or it may represent lexemes that can assume different forms according to the neighboring syntactic and phonological constraints.
For starters, let's look at an example template to generate a sentence describing the age of a person, e.g. *"Dan is 20 years old."*, given a constructor with two fields: entity (the Q-id of the person) and years (the age). In English, this template may look like this:
{Person(entity)} is {nummod:Cardinal(years)} {root:Lexeme(L2505)} old.
There are three slots, which are delimited by curly brackets:
1. {Person(entity)} resolves to the name of the person.
2. {nummod:Cardinal(years)} resolves to the number of years. It is marked as the "*num*eral *mod*ifier" of the third slot.
3. {root:Lexeme(L2505)} fetches from Wikidata Lexeme L2505 https://www.wikidata.org/wiki/Lexeme:L2505, which refers to the lemma "year". Since the slot is marked as root, it will be linked to the previous slot, allowing for the selection of the right form of the lexeme: "year" or "years".
The remaining text in the template – "is" and "old" – is in this case static text. In other cases, we might need to specify that the verb "is" can inflect as well, or the number may need some additional processing to render it properly. We would then use similar dependency labels to mark subject-verb agreement and other types of agreement across the sentence’s constituents.
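To make the mechanics concrete, here is a minimal Python sketch of how the example template could be split into static text and slots, and how the nummod/root link could pick the right form of the lexeme. It is an illustration only, not the parser from the Wikifunctions CLI: the regular expression, the `Slot` class, the `parse` and `render` functions, the `NAMES` and `LEXEME_FORMS` tables, and the placeholder id `Q_DAN` are all assumptions made up for this example, and the English singular/plural rule is hard-coded rather than derived from Wikidata.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One slot: {label:Function(argument)}; the "label:" part is optional.
SLOT_RE = re.compile(r"\{(?:(?P<label>\w+):)?(?P<func>\w+)\((?P<arg>\w+)\)\}")


@dataclass
class Slot:
    label: Optional[str]  # dependency label such as "nummod" or "root"
    func: str             # function or template invoked by the slot
    arg: str              # constructor field name or lexeme id


def parse(template: str) -> list:
    """Split a template into static text pieces (str) and Slot objects."""
    parts, pos = [], 0
    for m in SLOT_RE.finditer(template):
        if m.start() > pos:
            parts.append(template[pos:m.start()])
        parts.append(Slot(m.group("label"), m.group("func"), m.group("arg")))
        pos = m.end()
    if pos < len(template):
        parts.append(template[pos:])
    return parts


# Hypothetical stand-ins for data that would really be fetched from Wikidata.
NAMES = {"Q_DAN": "Dan"}  # placeholder id, not a real Q-id
LEXEME_FORMS = {"L2505": {"singular": "year", "plural": "years"}}


def render(template: str, fields: dict) -> str:
    """Render a template for English, given the constructor's fields."""
    parts = parse(template)
    # The nummod slot decides the grammatical number of the root lexeme.
    number = "plural"
    for p in parts:
        if isinstance(p, Slot) and p.label == "nummod":
            number = "singular" if fields[p.arg] == 1 else "plural"
    out = []
    for p in parts:
        if isinstance(p, str):
            out.append(p)  # static text is kept untouched
        elif p.func == "Person":
            out.append(NAMES[fields[p.arg]])
        elif p.func == "Cardinal":
            out.append(str(fields[p.arg]))
        elif p.func == "Lexeme":
            out.append(LEXEME_FORMS[p.arg][number])
        else:
            raise ValueError(f"unknown slot function: {p.func}")
    return "".join(out)


TEMPLATE = "{Person(entity)} is {nummod:Cardinal(years)} {root:Lexeme(L2505)} old."
print(render(TEMPLATE, {"entity": "Q_DAN", "years": 20}))  # Dan is 20 years old.
print(render(TEMPLATE, {"entity": "Q_DAN", "years": 1}))   # Dan is 1 year old.
```

The sketch resolves agreement with a single special case. The proposed template language instead expresses such links through dependency labels, so that the same mechanism can cover agreement patterns in other languages.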
In the document, similar examples https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Template_Language_for_Wikifunctions#Example_templates – though more complex – are given for four other languages (Swedish, French, Hebrew and isiZulu). Each presents its own peculiarities and challenges, which can nonetheless be captured with the proposed template language. We invite you to read the document, provide feedback, and try to come up with challenging examples in other languages that may prove difficult to render using this formalism, so we can improve on it and achieve the broadest possible applicability to, ideally, all natural languages in use.
Wikimania video
Last week was Wikimania 2022 https://meta.wikimedia.org/wiki/Wikimania_2022, the annual event for Wikimedians from all over the world to meet and discuss. There were two sessions related to Wikifunctions: one on Wikifunctions https://wikimania.wikimedia.org/wiki/2022:Submissions/Wikifunctions_-_A_new_Wikimedia_project led by the team, and one on Ninai and Udiron led by Mahir Morshed https://wikimania.wikimedia.org/wiki/2022:Submissions/Ninai-Udiron:_Using_Wikidata_Items_and_Lexemes_for_Abstract_Wikipedia-Like_Text_Generation .
Our session consisted of a short introduction to Wikifunctions by Denny, followed by a pre-recorded section, where several team members had short deep dives into different topics.
We had:
- James Forrester on the technical architecture
- Amin Al Hazwani on the design language
- Genoveva Galarza Heredero on the content model
- Julia Kieserman on Codex
- Cory Massaro on knowledge equity
- Ariel Gutman on natural language generation, an intro to the first part of the newsletter above
- Ali Assaf on formalizing the function model
You can watch this pre-recorded segment on Commons https://commons.wikimedia.org/wiki/File:Wikimania_2022_Wikifunctions_HD.webm .
As with all Wikimania sessions, collaborative note taking was enabled. The notes on the session https://wikimania.wikimedia.org/wiki/2022:Submissions/Wikifunctions_-_A_new_Wikimedia_project#Session_notes_from_Etherpad also contain all questions that have been asked and answered in the closing part of the session, following the video. A full video of the session is available on YouTube https://www.youtube.com/watch?v=Zasie41p1-U?t=35278s, but note that playing the pre-recorded video faced a number of technical issues. You might want to skip to the Commons video instead. Uploads of individual sessions are expected to become available later.
Mahir Morshed had a Wikimania session about Ninai and Udiron https://wikimania.wikimedia.org/wiki/2022:Submissions/Ninai-Udiron:_Using_Wikidata_Items_and_Lexemes_for_Abstract_Wikipedia-Like_Text_Generation and the recording starts here https://www.youtube.com/watch?v=BCRi1VRtQXE&t=3431s. Ninai and Udiron are tools for natural language generation, and we have introduced them in earlier newsletters https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-03.
Workstream updates as of August 12, 2022
Performance
- Started performance analysis methodology documentation
- Set up health-check API endpoint for Wikilambda
Natural language generation
- Not too much progress due to team members' vacation time. Started adding noun class information for isiZulu, Mboshi, Kiswahili
Meta-data
- Finished display of metadata dialog on tester page
- Created some new PHP utilities for ZMaps
Experience
- Fixed and merged Beta launch blockers
- Made great progress on fixing various bugs
- Began researching diffing options