The on-wiki version of this newsletter can be found here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-12-19
Evaluation of the project by the Google.org fellows
During the fellowship, the Google.org fellows gained detailed insight into the Wikifunctions and Abstract Wikipedia projects. With the goal of pointing out potential issues and discussing potential alternatives to some of the projects' approaches, they wrote a detailed evaluation of the Wikifunctions and Abstract Wikipedia projects.
The team read through the evaluation and wrote a detailed answer. We will take a lot of the suggestions of the fellows to heart and make sure to implement them. The evaluation and the answer also helped the team to gain a better shared understanding of the project.
We invite you to read both documents:
- Wikifunctions and Abstract Wikipedia: An Evaluation https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Google.org_Fellows_evaluation
- Answer to Wikifunctions and Abstract Wikipedia: An Evaluation https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Google.org_Fellows_evaluation_-_Answer
Ariel’s goodbye letter
*At the end of the month, Ariel Gutman https://meta.wikimedia.org/wiki/User:AGutman-WMF, who joined the Abstract Wikipedia project as one of the Google.org fellows, will be leaving. He has been contributing to the Natural Language Generation (NLG) workstream. We want to give him the opportunity to say goodbye in his own words. Thank you, Ariel!*
Over the last six months, I've been part of the Abstract Wikipedia team as a Google.Org fellow https://diff.wikimedia.org/2022/04/14/google-org-fellowship-with-abstract-wikipedia-and-wikifunctions/. At the Foundation, my aim was to leverage my expertise in Natural Language Generation, which I honed from working on NLG at Google for over six years, to advance the Abstract Wikipedia https://meta.wikimedia.org/wiki/Abstract_Wikipedia project.
The first half of the fellowship was mostly dedicated to writing design docs: The architecture of an NLG system https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-05-27 and a template language specification https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-08-19 (the latter co-authored with Maria Keet, to whom I’m grateful). At the same time I was involved in other discussions, be it the quality of lexical data on Wikidata https://commons.wikimedia.org/wiki/File:Using_Lexemes_in_Abstract_Wikipedia_-_Wikidata_Quality_Days_2022_Presentation.pdf, or the form Abstract Content https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Wikidata_Abstract_Representation should take (many thanks to Kutz Arrieta for leading the latter discussion).
At the midpoint of the fellowship, I felt the urge to create something more concrete. Unfortunately, the Wikifunctions platform was not ready to serve as a solid development platform, so, per the advice of the Google.Org Tech Lead Ori Livneh https://diff.wikimedia.org/2022/11/23/meet-ori-livneh-google-org-fellow-returning-wikipedian/, I set out to create a prototype NLG system on Wikipedia’s Scribunto platform, a Lua-based scripting environment embedded within Wikipedia.
To my great pleasure, the Scribunto platform, with its Wikidata API https://www.mediawiki.org/wiki/Extension:WikibaseLexeme, allowed me to rapidly create a functional NLG system https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Template_Language_for_Wikifunctions/Scribunto-based_implementation#Template_parsing_and_evaluation capable of transforming Abstract Content into text (see recorded demo https://meta.wikimedia.org/wiki/File:Demo_of_a_Scribunto-based_templatic_NLG_system.webm or example output https://meta.wikimedia.org/wiki/User:AGutman-WMF/Template_Examples#Rendering_of_curated_Abstract_Content). The system is not yet exhaustive, but it contains the necessary components outlined in the proposed architecture https://commons.wikimedia.org/wiki/File:A_proposal_of_an_NLG_architecture_for_Abstract_Wikipedia.svg:
- An Abstract Content https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/AbstractContent repository, allowing the specification of an article outline for individual Wikidata items.
- A Constructors https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/Constructors repository, containing logic for auto-creation of abstract content for Wikidata items, depending on their types (people, places, etc.).
- Templatic renderers https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/Renderers, which are templates specifying how each constructor should be verbalized in the different realization languages.
- Template functions https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/Functions, written in Lua or in the template language, to be used within template slots. In particular, these allow importing Wikidata lexemes and representing them in an internal format, using dedicated helper modules.
- Morphosyntactic dependency relations https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/Relations, written in Lua using a limited set of unification operators, which allow specifying the flow of grammatical features between template elements.
- Phonotactic functions https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/Phonotactics, written in Lua, which allow the specification of language-specific phonotactic rules (such as the *a/an* alternation in English).
- A text assembler https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/TextAssembler, which takes care of constructing the rendered text while adjusting punctuation, spacing and capitalization.
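As a rough illustration of the last two components in the list above, the following sketch shows what a phonotactic rule and a text assembler might do. It is written in Python for brevity rather than the Lua used by the actual modules, and every name in it (`apply_a_an`, `assemble`, the token-list representation) is a hypothetical choice made for this example, not taken from the prototype.

```python
# Illustrative sketch only: names and data shapes are assumptions,
# not the API of the Scribunto prototype.

VOWEL_LETTERS = set("aeiou")
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def apply_a_an(tokens):
    """Turn the article "a" into "an" before a vowel-initial word.

    Simplification: the real alternation depends on sound, not spelling
    ("an hour", "a university"); this sketch only checks the first letter.
    """
    result = list(tokens)
    for i in range(len(result) - 1):
        next_initial = result[i + 1][:1].lower()
        if result[i].lower() == "a" and next_initial in VOWEL_LETTERS:
            result[i] += "n"
    return result


def assemble(tokens):
    """Join tokens with spaces, attach punctuation to the preceding
    token, and capitalize the first letter of the result."""
    text = ""
    for token in tokens:
        if text and token not in PUNCTUATION:
            text += " "
        text += token
    return text[:1].upper() + text[1:]


tokens = ["she", "was", "a", "engineer", ",", "and", "a", "teacher", "."]
print(assemble(apply_a_an(tokens)))
```

Keeping the phonotactic step separate from the assembler mirrors the split between the two modules listed above: language-specific rules can change without touching the generic spacing and capitalization logic.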
On top of these there are modules with the necessary logic needed to parse https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/TemplateParser and evaluate https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/TemplateEvaluator templates, represent lexemes https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/Lexemes and unifiable features https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/UnifiableFeatures and interact with Wikidata https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia/Wikidata. The main module https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia controls the overall flow of the NLG pipeline.
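To make the idea of unifiable features and dependency relations more concrete, here is a minimal sketch of feature unification between two template elements. It is an assumption-laden illustration in Python, not the prototype's Lua code: the functions `unify` and `agree`, the dictionary representation of lexemes, and the feature names are all hypothetical.

```python
# Illustrative sketch only: a toy version of feature unification,
# not the implementation used in the UnifiableFeatures or Relations modules.

def unify(left, right):
    """Merge two feature dictionaries.

    Returns the merged dictionary, or None if the two sides assign
    different values to the same feature (a unification failure).
    """
    merged = dict(left)
    for feature, value in right.items():
        if feature in merged and merged[feature] != value:
            return None
        merged[feature] = value
    return merged


def agree(head, dependent, features=("gender", "number")):
    """Hypothetical agreement relation: make the listed grammatical
    features flow between a head and its dependent."""
    head_view = {f: head[f] for f in features if f in head}
    dependent_view = {f: dependent[f] for f in features if f in dependent}
    shared = unify(head_view, dependent_view)
    if shared is None:
        raise ValueError("feature clash between head and dependent")
    return {**head, **shared}, {**dependent, **shared}


noun = {"lemma": "maison", "gender": "feminine", "number": "singular"}
adjective = {"lemma": "grand"}
noun, adjective = agree(noun, adjective)
print(adjective)  # the adjective now carries the noun's gender and number
```

Once the features have been shared in this way, a later step could select the matching inflected form of each lexeme; that selection is left out of the sketch.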
My primary aim in developing this prototype was to substantiate the designs I've proposed, and provide example code for a similar implementation on Wikifunctions. In fact, if Wikifunctions supports Lua, the code can probably be reused as-is. The modules in the above bulleted list would become user-editable functions, while those mentioned thereafter could be integrated in the backend system of Wikifunctions, as they are expected to be relatively stable.
Yet, there is a second, more subtle aim. During my fellowship, I have grown skeptical of the premise that Wikifunctions is necessary to achieve the vision of Abstract Wikipedia. While user contributions (e.g., functions, renderers, or constructors) are necessary for its success, these should be NLG-oriented and they do not need a general functional platform such as Wikifunctions. By focusing on building an NLG-oriented system, the vision of Abstract Wikipedia can be attained more rapidly. (Being part of a fellowship, it maybe shouldn’t come as a surprise that I'm on the "One Ring https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-11-04" side…). Together with my colleagues Ori Livneh, Ali Assaf and Mary Yang, I've put my viewpoint in detailed writing https://docs.google.com/document/d/1qi59YMMan53syjDKqPdkFlmhyBYZ59mHRt_T6P-Hjho/edit#heading=h.8xxdlp39k478. I believe that the template-language proposal, implemented in this prototype, is a good foundation to build upon.
The Scribunto prototype shows that a platform more limited than Wikifunctions can already be used to generate articles from Abstract Content on real Wikipedias. It suffices to copy over the necessary modules to the target wiki, and define the language-specific renderers, functions and relations. Whether you agree with me or not, I invite you to play around with the system and edit the relevant modules to add functionality for your favorite language.
As my fellowship is ending https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-10-27, I would like to thank all my colleagues in Abstract Wikipedia's Natural Language Generation workstream for the passionate discussions and ideas. In particular, I am thankful to Cory Massaro, the Tech Lead of the workstream, for his guidance and confidence, and to Eunice Moon, my Google.Org colleague and Product Manager of the workstream, for her superb organizational skills.
End of year break
We wish everyone a happy holiday and a Happy New Year 2023! We will take a break from writing updates until the week of January 13, 2023.
Updates from Development (as of December 16, 2022)
December 5 – 9 was a *'Fix-it' week* for the Abstract Wikipedia team. During this week, the team paused the development of new features and focused on tasks related to technical debt.
The team also made a lot of progress in descoping work planned before the launch. A lot of items were removed from the scope of the MVP.
In the week of December 11 to 16, the Abstract Wikipedia team participated in a small internal hackathon/collaboration, in order to get to know more areas of the codebase and our colleagues, and to work on some assorted community wishlist entries. The team worked on projects including getting WhatLinksHere's lists in alphabetical order https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Miscellaneous/Get_WhatLinksHere%27s_lists_in_alphabetical_order, Wikisource User Research to inform the larger suggestion of the platform needing support, enabling negation for tag filters https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Search/Enable_negation_for_tag_filters, auto-suggesting linking Wikidata item after creating an article https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Wikidata/Autosuggest_linking_Wikidata_item_after_creating_an_article, and missing LaTeX capabilities for math rendering.
abstract-wikipedia@lists.wikimedia.org