The on-wiki version is available here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-05-27

--

Our Google.org fellow, Ariel Gutman, has recently authored a proposal for the architecture of the natural language generation (NLG) system of Abstract Wikipedia.

The proposed architecture is driven by four main tenets:

  1. Modularity: the system should be modular, in that various aspects of NLG (e.g. morphosyntactic and phonotactic rules) can be modified independently.
  2. Lexicality: the system should be able to both fetch lexical data (separate from code), and rely on productive language rules to generate such data on the fly (e.g. inflecting English plurals with an -s).
  3. Recursivity: due to the compositional and recursive nature of most languages, an effective NLG system would need to be recursive itself.
  4. Extensibility: the system should be receptive to extension both by linguistic experts and technical contributors, as well as by non-technical and non-expert contributors, working on different parts of the system.
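
To make the Lexicality tenet concrete, here is a minimal sketch in Python. It is not taken from the proposal: the LEXICON table is a hypothetical stand-in for lexical data fetched from Wikidata, and the function names and the simplified plural rule are assumptions made for illustration only.

```python
# Hypothetical stand-in for lexical data that would be fetched from Wikidata.
LEXICON = {
    ("child", "plural"): "children",
    ("mouse", "plural"): "mice",
}


def regular_plural(lemma: str) -> str:
    """Productive fallback rule: a deliberately simplified English plural."""
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    return lemma + "s"


def plural(lemma: str) -> str:
    """Prefer stored lexical data; otherwise generate the form on the fly."""
    stored = LEXICON.get((lemma, "plural"))
    return stored if stored is not None else regular_plural(lemma)
```

Here plural("child") is answered from the stored data, while plural("book") falls through to the productive rule. Keeping the data separate from the rule means either can be changed without touching the other.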

These considerations lead to a proposal for a "pipeline" system, in which an input Constructor is processed by different modules (each corresponding to a different aspect of natural language) until the final output text is rendered.
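
The pipeline idea can be sketched roughly as follows. This is an illustrative toy in Python, not the proposed implementation: the stage names, the dictionary-based state, the "INDEF" placeholder token, and the single hard-coded sentence pattern are all assumptions chosen to keep the example short.

```python
from typing import Callable, List

# Each stage takes the current state and returns an updated copy.
Stage = Callable[[dict], dict]


def apply_template(state: dict) -> dict:
    """Templatic stage: turn the constructor into abstract tokens."""
    c = state["constructor"]
    tokens = [c["subject"], "be", "INDEF", c["class"]]
    return {**state, "tokens": tokens}


def apply_morphosyntax(state: dict) -> dict:
    """Morphosyntactic stage: inflect the copula (3rd person singular only)."""
    tokens = ["is" if t == "be" else t for t in state["tokens"]]
    return {**state, "tokens": tokens}


def apply_phonotactics(state: dict) -> dict:
    """Phonotactic stage: pick 'a' or 'an' based on the following word."""
    tokens = state["tokens"]
    out = []
    for i, token in enumerate(tokens):
        if token == "INDEF":
            following = tokens[i + 1]
            token = "an" if following[0].lower() in "aeiou" else "a"
        out.append(token)
    return {**state, "tokens": out}


def render_text(state: dict) -> dict:
    """Final stage: join the tokens into the output sentence."""
    return {**state, "text": " ".join(state["tokens"]) + "."}


PIPELINE: List[Stage] = [
    apply_template,
    apply_morphosyntax,
    apply_phonotactics,
    render_text,
]


def generate(constructor: dict) -> str:
    """Run the constructor through every stage in order."""
    state = {"constructor": constructor}
    for stage in PIPELINE:
        state = stage(state)
    return state["text"]
```

With these toy stages, generate({"subject": "Crete", "class": "island"}) yields "Crete is an island." Because each stage only reads and extends the shared state, one stage (say, the phonotactic rule) can be replaced without changing the others, which is the modularity the tenets call for.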

[Figure: A proposal of an NLG architecture for Abstract Wikipedia]

In this pipeline, the dark blue shapes are elements that would be created by contributors to Wikifunctions (rectangles) or Wikidata (rounded rectangles), while the light blue elements represent functions or data living within the Wikifunctions orchestrator.

The "templatic renderers" are a key aspect of the system. Wikifunctions will provide a specialized templating language, developed in-house, which should enable even non-technical contributors to write renderers for their language. These renderers will be supported by lexical data from Wikidata and by Universal Dependencies-style grammatical relations, which would be defined within Wikifunctions by linguistically-interested contributors.
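
As a rough illustration of how a template-based renderer can also be recursive, consider the Python sketch below. The syntax of the actual templating language is not described in this update, so the brace-delimited slots, the TEMPLATES table, and the constructor shapes used here are purely hypothetical.

```python
import re

# Hypothetical templates keyed by constructor type; slots are written {name}.
TEMPLATES = {
    "Instance": "{entity} is a {class}",
    "LocatedIn": "{thing} in {place}",
}


def render(node) -> str:
    """Render a constructor; slot values may themselves be constructors."""
    if isinstance(node, str):
        return node  # plain strings are emitted unchanged
    template = TEMPLATES[node["type"]]
    # Fill each slot by recursively rendering the corresponding argument.
    return re.sub(r"\{(\w+)\}", lambda m: render(node[m.group(1)]), template)


sentence = render({
    "type": "Instance",
    "entity": "Paris",
    "class": {"type": "LocatedIn", "thing": "city", "place": "France"},
})
```

Here the "class" slot of the outer constructor is filled by rendering a nested constructor, so sentence becomes "Paris is a city in France". A real renderer would additionally draw on lexical data and grammatical relations for agreement and inflection, which this sketch leaves out.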

We would be glad to hear your feedback on the proposal's talk page, in particular on the idea of developing an in-house templating system.

Further updates for last week:

Below is the brief weekly summary highlighting the status of each workstream: