The on-wiki version is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-05-27
--
Our Google.org fellow, Ariel Gutman
<https://meta.wikimedia.org/wiki/User:AGutman-WMF>, has recently authored a
proposal for the architecture of the natural language generation (NLG)
system
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/NLG_system_architecture_proposal>
of Abstract Wikipedia.
The proposed architecture is driven by four main tenets:
1. *Modularity*: the system should be modular, in that various aspects
of NLG (e.g. morphosyntactic and phonotactic rules) can be modified
independently.
2. *Lexicality*: the system should be able to both fetch lexical data
(separate from code), and rely on productive language rules to generate
such data on the fly (e.g. inflecting English plurals with an -s).
3. *Recursivity*: due to the compositional and recursive nature of most
languages, an effective NLG system would need to be recursive itself.
4. *Extensibility*: the system should be receptive to extension both by
linguistic experts and technical contributors, as well as by non-technical
and non-expert contributors, working on different parts of the system.
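The Lexicality tenet can be illustrated with a minimal Python sketch. This
is not code from the proposal: the dictionary below is a hypothetical
stand-in for lexical data stored outside the code (for example, in
Wikidata), and the "append -s" rule is deliberately simplified.

```python
# Hypothetical stand-in for lexical data kept separate from code.
# Only irregular forms need to be stored.
IRREGULAR_PLURALS = {
    "child": "children",
    "mouse": "mice",
    "person": "people",
}


def pluralize(noun: str) -> str:
    """Return an English plural: the stored form if there is one,
    otherwise a form generated on the fly by a productive rule."""
    stored = IRREGULAR_PLURALS.get(noun)
    if stored is not None:
        return stored
    # Simplified productive rule (assumption): just append -s.
    return noun + "s"
```

With this sketch, pluralize("child") comes from the stored data, while
pluralize("book") is generated by the rule.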
These considerations lead to a proposal for a "pipeline" system, in which an
input Constructor is processed by a series of modules (each corresponding to
a different aspect of natural language) until the final output text is
rendered.
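The pipeline idea can be sketched in Python as a chain of independent
stages. Everything here is an assumption made for illustration: the stage
names, the dict used as a constructor, and the toy English rules are
hypothetical and are not taken from the proposal or from Wikifunctions.

```python
from typing import Callable, Dict, List

Tokens = List[str]
Stage = Callable[[Tokens], Tokens]


def template_stage(constructor: Dict[str, str]) -> Tokens:
    """Turn a (hypothetical) constructor into a raw token sequence."""
    return [constructor["subject"], "is", "a", constructor["class"]]


def phonotactic_stage(tokens: Tokens) -> Tokens:
    """Toy phonotactic rule: 'a' becomes 'an' before a vowel letter."""
    out = list(tokens)
    for i in range(len(out) - 1):
        next_word = out[i + 1].lower()
        if out[i] == "a" and next_word.startswith(("a", "e", "i", "o", "u")):
            out[i] = "an"
    return out


def surface_stage(tokens: Tokens) -> Tokens:
    """Capitalize the first token and add a final full stop."""
    out = list(tokens)
    if out:
        out[0] = out[0][:1].upper() + out[0][1:]
        out[-1] = out[-1] + "."
    return out


def run_pipeline(constructor: Dict[str, str], stages: List[Stage]) -> str:
    """Pass the tokens through each stage in order, then join them."""
    tokens = template_stage(constructor)
    for stage in stages:
        tokens = stage(tokens)
    return " ".join(tokens)
```

Because each stage only maps tokens to tokens, one stage can be replaced or
modified without touching the others, which is the point of the Modularity
tenet. For example, run_pipeline({"subject": "Marie Curie", "class":
"astronomer"}, [phonotactic_stage, surface_stage]) yields "Marie Curie is
an astronomer."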
[image: A proposal of an NLG architecture for Abstract Wikipedia.svg]
<https://meta.wikimedia.org/wiki/File:A_proposal_of_an_NLG_architecture_for_Abstract_Wikipedia.svg>
In this pipeline, dark blue shapes are elements which would be created by
contributors to Wikifunctions (rectangles) or Wikidata (rounded
rectangles), while the light blue elements represent functions or data
living within the Wikifunctions orchestrator.
A key aspect of the system is the use of "templatic renderers".
Wikifunctions will provide a specialized *templating language*, developed
in-house, which should enable even non-technical contributors to write
renderers for their language. These renderers will be supported by lexical
data from Wikidata and Universal Dependency-style grammatical relations,
which would be defined within Wikifunctions by linguistically interested
contributors.
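To make the interplay of slots, lexical data and grammatical relations more
concrete, here is a rough Python sketch. It does not show the proposed
templating language, whose syntax is not described in this update. The Slot
type, the small lexicon and the render function are all hypothetical; the
only idea borrowed from Universal Dependencies is that a subject relation
links a verb to its subject, so the verb can agree with it.

```python
from typing import Dict, List, NamedTuple, Optional


class Slot(NamedTuple):
    lemma: str
    # None means "to be filled in by agreement with another slot".
    number: Optional[str] = None


# Hypothetical lexical data: lemma -> grammatical number -> form.
# For the verb, the key is the number of its subject.
LEXICON: Dict[str, Dict[str, str]] = {
    "planet": {"singular": "planet", "plural": "planets"},
    "orbit": {"singular": "orbits", "plural": "orbit"},
}


def render(slots: List[Slot], nsubj: Dict[int, int]) -> str:
    """Render slots to text.

    nsubj maps the index of a verb slot to the index of its subject
    slot; the verb copies the subject's grammatical number.
    A slot whose number is still None afterwards raises KeyError.
    """
    resolved = list(slots)
    for verb_index, subject_index in nsubj.items():
        subject_number = resolved[subject_index].number
        resolved[verb_index] = resolved[verb_index]._replace(number=subject_number)
    return " ".join(LEXICON[slot.lemma][slot.number] for slot in resolved)
```

In this sketch the template author only states that slot 1 has slot 0 as
its subject; the inflected forms themselves come from the lexical data.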
We would be glad to hear your feedback on the proposal's talk page
<https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/NLG_system_architecture_proposal>,
in particular about the idea of developing an in-house templating system.
Further updates for last week:
- This week, the team held its first Deep Dive session. We presented our
  project OKRs and received feedback from leadership
- The team spent time this week preparing for last weekend's Hackathon:
   - There was a presentation and Q&A about Wikifunctions
   - A few Phabricator backlog tasks were identified and tagged for
     Hackathon participants
Below is a brief weekly summary highlighting the status of each
workstream:
- Performance:
   - Made progress on Beta cluster setup: orchestrator and evaluator
     services now update automatically to the latest image
- NLG:
   - Completed the initial draft of the NLG system architecture design
     document
- Metadata:
   - Partially completed the front-end code to accommodate both forwards
     and backwards compatibility for the old and new metadata formats
- Experience:
   - Made more progress on the function view and editor implementations
     for mobile
   - Completed function-schemata migration to Benjamin arrays
   - Handed off designs for 'Text with fallback'