The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-21
--
Communities will create (at least) two different types of articles using
Abstract Wikipedia: on the one hand, we will have highly-standardised
articles based entirely on Wikidata, called model articles; and on the
other hand, we will have bespoke, hand-crafted content, assembled sentence
by sentence. Today we will discuss the second type, after we discussed the
first type, model articles, in a previous newsletter
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-07>.
Both types, by the way, can be implemented by the "templatic renderers"
concept that is part of Ariel Gutman’s proposal
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-05-27>. We
will also dedicate a future newsletter to a comparison of the two types.
For manually-assembled articles, we have to make many more assumptions
about what will eventually be available in Wikifunctions than we do for
model-based articles. The following description is not meant to prescribe
to the community how things should work, but merely sketches one
possibility. It is based on a "Wizard of Oz experiment"
<https://en.wikipedia.org/wiki/Wizard_of_Oz_experiment> we did during our
recent Abstract Wikipedia team offsite
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-05-20>.
We took the first sentence from a semi-randomly chosen article, with the
aim of handcrafting the representation of that sentence in Abstract
Wikipedia. It is often harder to see how to translate articles about ideas
than articles about more concrete things such as people, places, and
objects. The sentence came from
the English Wikipedia article Profit (economics)
<https://en.wikipedia.org/wiki/Profit_(economics)>, which we picked as a
common example of a concept:
An economic profit is the difference between the revenue a commercial
entity has received from its outputs and the opportunity costs of its
inputs.
Note that we do not expect that English Wikipedia will be the source for
all articles for Abstract Wikipedia, but it is certainly a convenient
source of inspiration for the team, given that all of us speak English. As
a baseline, we each manually translated that text into the languages we
speak.
One powerful tool in our arsenal for turning this sentence into abstract
content, perhaps the most powerful, is that we can rewrite and simplify it.
In Abstract Wikipedia, the goal is not to translate the wording of existing
Wikipedia articles as faithfully as possible, but to capture as much of
their meaning as possible. So we took the liberty of rewriting the sentence
as follows:
In economics, the profit of a commercial entity is defined as the
difference between its outputs’ revenue and its inputs’ opportunity cost.
Due to time constraints, we further reduced the sentence to simply:
In economics, profit is defined as the difference between revenue and cost.
From this, we then assembled the following abstract content.
*Context*
   - *context*: economics <https://www.wikidata.org/wiki/Q8134>
   - *content*: *Definition*
      - *subject*: profit <https://www.wikidata.org/wiki/Q26911>
      - *definition*: *Difference*
         - *first*: income <https://www.wikidata.org/wiki/Q1527264>
         - *second*: operating cost <https://www.wikidata.org/wiki/Q831940>
Here, the bold text is the label of a constructor, the italic text is the
label of a key of that constructor, and the links point to Wikidata items,
following the notation used in previous examples. As before, we assume the
availability of the constructors used. To be explicit, in this case we
assume the constructors listed below with their respective keys. How the
keys and constructors would actually be named, and indeed which
constructors and keys would even exist, might well turn out quite
differently.
*Context* returns a full clause that places a subordinate clause in a
context
   - *context* takes a noun phrase describing the context in which the
   content holds
   - *content* takes a clause that is being put in the context
*Definition* returns a full clause that defines the given subject
- *subject* takes a noun phrase that is being defined
- *definition* takes a noun phrase that represents the definition
*Difference* returns a noun phrase that means the quantitative difference
between two given noun phrases
- *first* takes a noun phrase that represents the first part
- *second* takes a noun phrase that represents the second part
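To make this a little more concrete, here is a minimal sketch, in Python, of
how such nested constructor calls could be represented as plain data. This
is purely illustrative: the helper functions, the dictionary layout, and the
way the Wikidata items are referenced are assumptions made for this example,
not the actual Wikifunctions data model.

def constructor(name, **keys):
    # Represent a constructor call as a simple dictionary (illustrative only).
    return {"constructor": name, **keys}

def item(qid, label):
    # Stand-in for a reference to a Wikidata item, with its English label.
    return {"item": qid, "label": label}

abstract_content = constructor(
    "Context",
    context=item("Q8134", "economics"),
    content=constructor(
        "Definition",
        subject=item("Q26911", "profit"),
        definition=constructor(
            "Difference",
            first=item("Q1527264", "income"),
            second=item("Q831940", "operating cost"),
        ),
    ),
)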
Where we have mentioned "noun phrase" above, we actually mean "concept that
can be realized as a noun phrase by a renderer". Also, we have glossed over
the considerable challenge of having a mechanism through which a renderer
could just take in a Wikidata item and turn it into a noun phrase. That is
a challenge that Mahir has tackled admirably with Ninai and Udiron
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-03>.
Another challenge was to find the right Wikidata items for each of the
involved noun phrases. For example, for the second key of the Difference
constructor, we chose operating cost <https://www.wikidata.org/wiki/Q831940>.
Other candidates could have been cost
<https://www.wikidata.org/wiki/Q240673> or opportunity cost
<https://www.wikidata.org/wiki/Q185715>. Again, this is not necessarily the
best choice, but just the one we came up with, given our time constraints
and the way we approached the task.
The final step of the exercise was to take that abstract content and to
render (by hand) a natural-language text in the languages that we speak, as
mechanically as possible, using the labels of the selected Wikidata items
(ideally we would have used the lexemes connected to the items, but that
data was still too sparse).
This step is why we called the whole exercise a “Wizard of Oz” exercise, as
we were simulating by hand what renderers in Wikifunctions would do.
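To illustrate what rendering "as mechanically as possible" looked like, here
is a toy English renderer for the three constructors above, operating on the
illustrative data structure sketched earlier. Again, this is only a sketch
under those assumptions: real renderers in Wikifunctions would draw on
lexeme data and grammatical rules rather than simply gluing strings together.

def render_en(node):
    # Toy renderer: recursively turn a constructor or item reference into English.
    if "item" in node:
        return node["label"]  # leaf: just use the item's (English) label
    name = node["constructor"]
    if name == "Difference":
        return (f"the difference between {render_en(node['first'])} "
                f"and {render_en(node['second'])}")
    if name == "Definition":
        return f"{render_en(node['subject'])} is defined as {render_en(node['definition'])}"
    if name == "Context":
        return f"In {render_en(node['context'])}, {render_en(node['content'])}."
    raise ValueError(f"no renderer for constructor {name}")

print(render_en(abstract_content))
# -> In economics, profit is defined as the difference between income and operating cost.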
Here are some results (unfortunately, we didn’t record the results we came
up with during the offsite, so we re-created them for this newsletter):
*English*: In economics, economic profit is defined as the difference
between income and operating cost.
*German*: In Wirtschaftswissenschaft ist Gewinn definiert als der
Unterschied zwischen Einkommen und Betriebskosten.
*Croatian*: U ekonomiji, dobit je definiran kao razlika između dohodka i
troška*.
*Russian*: В экономике, экономическая прибыль определяется как разница
между доходом и операционными затратами.
*French*: En économie, le profit est défini comme la différence entre les
revenus et les dépenses d'exploitation.
*Spanish*: En economía, ganancia económica se define como la diferencia
entre ingresos y costes*.
*Kannada*: ಅರ್ಥಶಾಸ್ತ್ರದಲ್ಲಿ, ಆರ್ಥಿಕ ಲಾಭವನ್ನು ಆದಾಯ ಮತ್ತು ನಿರ್ವಹಣಾ ವೆಚ್ಚದ
ನಡುವಿನ ಅಂತರವೆಂದು ವ್ಯಾಖ್ಯಾನಿಸಲಾಗಿದೆ.
*Chinese*: 在经济学中,经济利润被定义为收入与经营成本之间的差额。
*Hebrew*:
בכלכלה, רווח מוגדר כהפרש בין הכנסה להוצאות תפעוליות.
*Swedish*: I nationalekonomi definieras vinst som skillnaden mellan inkomst
och Opex.
*Italian*: In economia, il profitto è definito come la differenza fra il
reddito e i costi operativi*.
*Arabic*:
في الاقتصاد*، يتم تعريف الربح على أنه الفرق بين الدخل المالي والمصروفات
الجارية.
Words marked with an asterisk were translated manually by us, as they did
not have a label in Wikidata at the time, or the label did not fit.
During the offsite, we evaluated the results, and found them in fact not
only readable (although not perfect), but also easier to understand than
our initial translations. This is likely an effect of the simplification
process the text underwent. The whole exercise left us filled with optimism
about the approach.
*This newsletter was late due to the amount of discussion it generated
internally. Don’t expect everyone on the team to agree on everything being
said here. We think these discussions should be in the open, for everyone
to join in. Expect more to follow.*
*Further updates:*
We are getting additional support from ThisDot: two of their technical
writers will be joining the team for the remainder of June to figure out
how to onboard users into the concept of functions, and how to communicate
to users what functions are and how they work, in an easily translatable
manner.
Below is the brief weekly summary highlighting the status of each workstream:
Performance:
- Drafted the Performance Metrics document
- Started research on reported slowness in function evaluation
- Added logging and dashboarding to Beta Cluster and wrote documentation
for Beta Cluster
NLG:
- Wrote a Proof of Concept of support for new Wikifunctions features to
support proposed NLG pipelines
Meta-data:
- Altered MediaWiki PHP and Vue layers to handle either format
- Ensured that no function-orchestrator test code/cases employ the old
format
Experience:
- WikiLambda PHP and Function-schemata finished and merged
- Design: continue working on typed list view
- Front-end: made ISO codes mobile friendly and started table component
implementation
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-30
--
Last year, we welcomed Aishwarya Vardhana
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-04-21> to
the team. This week is her last week on the Abstract Wikipedia project, as
the Design group at the Wikimedia Foundation is rotating some of their
designers between teams.
During her work on Wikifunctions, Aishwarya has brought novel perspectives
and points of view to the project, which will shape Wikifunctions for years
to come. Her design work, which was frequently featured in this newsletter,
and her guidance in diligently testing crucial components of Wikifunctions,
will lead to an immensely improved product. Her voice and her work towards
anchoring the values of Wikifunctions in diversity, equity, and inclusion
will have a lasting impact on the whole project. It has been a pleasure
working with Aishwarya, and the whole team is sad to see her go. Fortunately,
Aishwarya is staying with the Wikimedia Foundation and will take over as
designer on the Trust and Safety Tools team; we'll introduce our new design
colleague in a future newsletter.
Aishwarya just recently summarized many of the UX research results
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-10> in
a newsletter of its own.
We are very thankful for her contributions, and we congratulate her on her
new role. Here are some words from Aishwarya.
Namaste Abstract Wikipedia community! This project is near and dear to me
and will always hold a special place in my heart. Thanks to all of you for
your commitment to this important, decolonial
<https://en.wikipedia.org/wiki/Decoloniality> effort. A question that I
wrestled with throughout my time designing Wikifunctions has been, who is
Wikifunctions for? Will it truly be a diverse and equitable community? Will
it be an ecosystem based in mutuality and trust? Might this audience be
different from all the other Wikimedia projects by embracing diversity,
equity, and inclusion from day one? The answers to these questions must
come from all of us. Each one of us that engages with the project, who
writes feedback or an implementation, asks questions, or approves testers,
is a steward for these values. As I depart the team and transition into a
volunteer, I have faith in us as a collective. See you on the internet!
You can follow Aishwarya’s writing on Medium
<https://aishwaryavardhana.medium.com/> or on thewildword.com.
------------------------------
*We are hiring!*
The Wikimedia Foundation is hiring a Staff Software Engineer to work as a
Quality and Test Engineer
<https://boards.greenhouse.io/wikimedia/jobs/4321901>! Wikifunctions and
Abstract Wikipedia are complex systems, and we need help in order to
improve the reliability and the development velocity of our system. We are
looking for someone to develop and set up an environment that will allow
our engineers to write tests, from effective unit tests to integration
tests to end-to-end tests; to ensure that our tests are run during
continuous integration; and to allow for high-quality rollouts of new
features to Wikifunctions.
If you are interested, please apply, or if you know someone with the
relevant experience and interest, please let them know.
------------------------------
*Workstream updates (as of June 17)*
Performance:
- Shared the Performance Metrics document with SRE for approval
- Aligned on scope for Metadata and Performance workstreams
- Progressed migration of the tester pipeline from orchestrator into
MediaWiki
NLG:
- Work in progress: finalizing the set of deliverables and goals for
this workstream
- Prepared for Wikidata Quality Days presentation
Meta-data:
- Altered function-orchestrator and MediaWiki PHP API to respond with a
map object
Experience:
- Aishwarya presented the Wikifunctions workflow at the Product
Department monthly meeting
- Cleanup tasks for function-schemata and wikilambda
- Finished table component implementation
- Completed basic implementation of tester and implementation tables
- Handed off designs for typed list view
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-07
--
Communities will create (at least) two different types of articles using
Abstract Wikipedia: on the one hand, we will have highly-standardised
articles based entirely on Wikidata; and on the other hand, we will have
bespoke, hand-crafted content, assembled sentence by sentence. Today we
will discuss the first type, and we will discuss the second type in an
upcoming newsletter.
Articles of the first type can be created very quickly and will likely
constitute the vast majority of articles for a long time to come. For that
we can use models, *i.e.* texts with variables: put differently, texts
with gaps that are filled in from another source such as a list, along
the lines of the Mad Libs <https://en.wikipedia.org/wiki/Mad_Libs> game. A
model can be created once for a specific type of item and then used for
every single item of this type that has enough data in Wikidata. The
resulting articles are similar to many bot-created articles that already
exist in various Wikipedias.
For example, in many languages, bots were used to create or maintain the
articles about years (such as the articles about 1313
<https://www.wikidata.org/wiki/Q5735>, 1428
<https://www.wikidata.org/wiki/Q6315>, or 1697
<https://www.wikidata.org/wiki/Q7702>, each of which is available in more
than a hundred languages). In English Wikipedia, many articles for US
cities were created by a bot
<https://en.wikipedia.org/wiki/List_of_Wikipedia_controversies#2002> based
on the US census, and later updated after the 2010 census. Lsjbot
<https://en.wikipedia.org/wiki/Lsjbot> by Sverker Johansson is a well-known
example of a bot that has created millions of articles about locations or
species in a few languages such as Swedish, Waray-Waray, and Cebuano.
Comparable activities, although not as prolific, have been going on in
quite a few other languages.
How do these approaches work? Assume you have a dataset such as the
following list of countries:
Country      Continent         Capital     Population
Jordan       Asia              Amman       10428241
Nicaragua    Central America   Managua     5142098
Kyrgyzstan   Asia              Bishkek     6201500
Laos         Asia              Vientiane   6858160
Lebanon      Asia              Beirut      6100075
Now we can create a model that can generate a complete text from this data,
such as
“*<Country>* is a country in *<Continent>* with a population of
*<Population>*. The capital of *<Country>* is *<Capital>*.”
With this text and the above dataset, we would have created the following
five proto-articles (references not shown for simplicity):
*Jordan* is a country in Asia with a population of 10,428,241. The capital
of Jordan is Amman.
*Nicaragua* is a country in Central America with a population of 5,142,098.
The capital of Nicaragua is Managua.
*Kyrgyzstan* is a country in Asia with a population of 6,201,500. The
capital of Kyrgyzstan is Bishkek.
*Laos* is a country in Asia with a population of 6,858,160. The capital of
Laos is Vientiane.
*Lebanon* is a country in Asia with a population of 6,100,075. The capital
of Lebanon is Beirut.
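As a minimal sketch of how such a model could be applied programmatically,
here is some hypothetical Python with the model and dataset from above
hard-coded; an actual implementation would of course pull the data from
Wikidata rather than from a literal list.

dataset = [
    {"Country": "Jordan", "Continent": "Asia", "Capital": "Amman", "Population": 10428241},
    {"Country": "Nicaragua", "Continent": "Central America", "Capital": "Managua", "Population": 5142098},
    {"Country": "Kyrgyzstan", "Continent": "Asia", "Capital": "Bishkek", "Population": 6201500},
    {"Country": "Laos", "Continent": "Asia", "Capital": "Vientiane", "Population": 6858160},
    {"Country": "Lebanon", "Continent": "Asia", "Capital": "Beirut", "Population": 6100075},
]

# The model: a text with gaps, filled in from each row of the dataset.
model = ("{Country} is a country in {Continent} with a population of "
         "{Population:,}. The capital of {Country} is {Capital}.")

for row in dataset:
    print(model.format(**row))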
Classical textbooks on that topic such as *“Building natural language
generation systems”
<https://en.wikipedia.org/wiki/Special:BookSources/978-0-521-02451-8>* call
this method *“mail merge”* (even though it is used for more than mail). A
model is combined with a dataset, often from a spreadsheet or a database.
This has been used for decades to create bulk mailings
<https://en.wikipedia.org/wiki/Mail_merge> and other bulk content, and is a
form of mass customisation
<https://en.wikipedia.org/wiki/Mass_customization>. The methods have become
increasingly complex over time and are able to answer more questions: How
to deal with missing or optional information? How to adapt part of the text
to the data, *e.g.* use plurals or grammatical gender or noun classes where
appropriate, *etc.*? The bots that were mentioned above, which created
millions of articles in various languages on Wikipedia, have mostly worked
along these lines.
For a great example of how far the model approach can be pushed, consider
Magnus Manske’s Reasonator <https://meta.wikimedia.org/wiki/Reasonator>,
which, based on the data in Wikidata, creates the following automatic
description for Douglas Adams <https://reasonator.toolforge.org/?q=Q42>:
*Douglas Adams* was a British playwright, screenwriter, novelist,
children's writer, science fiction writer, comedian, and writer. He was
born on March 11, 1952 in Cambridge to Christopher Douglas Adams and Janet
Adams. He studied at St John's College from 1971 until 1974 and Brentwood
School from 1959 until 1970. His field of work included science fiction,
comedy, satire, and science fiction. He was a member of Groucho Club and
Footlights. He worked for The Digital Village from 1996 and for BBC. He
married Jane Belson on November 25, 1991 (married until on May 11, 2001 ),
Jane Belson on November 25, 1991 (married until on May 11, 2001 ), and Jane
Belson on November 25, 1991 (married until on May 11, 2001 ). His children
include Polly Adams, Polly Adams, and Polly Adams. He died of myocardial
infarction on May 11, 2001 in Santa Barbara. He was buried at Highgate
Cemetery.
If we were to say that this is merely better than nothing, I think we would
undersell the achievement of Reasonator. The above text, together with the
appealing display of the structured data in Reasonator, leads to more
comprehensive access to knowledge than many of the individual language
Wikipedias provide for Douglas Adams. For comparison, check out the
articles in Azerbaijani <https://az.wikipedia.org/wiki/Duqlas_Adams>, Urdu
<https://ur.wikipedia.org/wiki/%DA%88%DA%AF%D9%84%D8%B3_%D8%A7%DB%8C%DA%88%D…>
, Malayalam
<https://ml.wikipedia.org/wiki/%E0%B4%A1%E0%B4%97%E0%B5%8D%E0%B4%B2%E0%B4%B8…>
, Korean
<https://ko.wikipedia.org/wiki/%EB%8D%94%EA%B8%80%EB%9F%AC%EC%8A%A4_%EC%95%A…>,
or Danish <https://da.wikipedia.org/wiki/Douglas_Adams>. At the same time,
it shows errors that most contributors wouldn’t know how to fix (such as
the repetition of the names of the children, or the spaces inside the
brackets, *etc.*).
The ArticlePlaceholder
<https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder> project has
partially fulfilled the role of filling content gaps, but the developers
have intentionally shied away from making the results look too much like an
article. The placeholder pages display structured data from Wikidata within the context of a
language Wikipedia. For example, here is the generated page about
*triceratops* in Haitian Creole
<https://ht.wikipedia.org/wiki/Espesyal:AboutTopic/Q14384>.
One large disadvantage of using bots to create articles in Wikipedia has
been that this content was mostly controlled by a very small subset of the
community — often a single person. Many of the bots and datasets have not
been open sourced in a way that someone else could easily come in, make a
change, and re-run the bot. (Reasonator avoids this issue, because the text
is generated dynamically and is not incorporated into the actual Wikipedia
article.)
With Wikifunctions and Wikidata, we will be able to give control over all
these steps to the wider community. Both the models and the data will be
edited on wiki, with all the usual advantages of having a wiki: there is a
clear history, everyone can edit through the Web, people can discuss, *etc.*.
The data used to populate the models will be maintained in Wikidata, and
the models themselves in Wikifunctions. This will allow us to collaborate
on the texts, unleash the creativity of the community, spot and correct
errors and edge cases together, and slowly extend the types of items and
the coverage per type.
In a follow-up essay, we will discuss a different approach to creating
abstract content, where the content is not the result of a model based on
the type of the described item, but rather a manually constructed article,
built up sentence by sentence.
*Development update from the week of May 27:*
   - The team had a session at the Hackathon, which was well attended (about
   30 people). Thanks to everyone for being there and for your questions and
   comments!
- We also had follow-up meetings with User:Mahir256, to improve
alignment on the NLG stream
- Below is the brief weekly summary highlighting the status of each
workstream
- Performance:
- Observability document drafted.
- Updated Helm charts for getting function-* services in staging.
- Completed performance metrics design and shared for review
- NLG:
- Scoped out necessary changes to Wikifunctions post-launch
- Metadata:
- Started recording and passing up some function-evaluator timing
metrics to the orchestrator
- Experience:
- WikiLambda (PHP) layer has been migrated to the new format of
typed lists
- Improved the mobile experience of the function view page
- Transitioned the Tabs component to use Codex's, thanks to the
Design Systems Team.
- Design: Carried out end-to-end user flow testing in Bangla.
*(Apologies for this update being late. We plan to send out another update
this week)*
The on-wiki version of this newsletter is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-10
--
Design researcher Jeff Howard did another round of research
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-24> in
order to prioritize issues in the run-up to launch and beyond.
report of the user research has been published
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Design/Wikifunctions_usa…>
on
Meta. Aishwarya, the designer on the Abstract Wikipedia team, has read and
analysed the research results, and summarized them in a slide deck
<https://commons.wikimedia.org/wiki/File:Wikifunctions_usability_tests_2022_…>.
This week, Aishwarya presented the deck to the team, and we are offering
here a short summary of the presentation.
The goal for designing the function page is two-fold: to be understandable
and usable for technical people of all backgrounds, and to be welcoming to
people with low levels of programming expertise. Technical contributors should
understand the function creation workflow and the Wikifunctions mental
model. Seven technical participants were interviewed using Aishwarya’s
designs in Figma
<https://www.figma.com/proto/05qjdoiV05MtZD2vEfbDDe/User-testing-function-fl…>
(click
anywhere on the screen to progress through the slides and remember to
expand your window).
The interviewees raised many great questions, validated a lot of our design
work, and identified several areas for improvement. Overall, the report
validated that we have met the stated design goals of the user interface
being understandable and usable for technical people, but the report also
highlighted that the contributors did not really understand the function
creation workflow and the general Wikifunctions mental model. In short,
they could get everything done, but were often confused about what they
were doing and why it was presented in that way.
I will not go into the many things that worked out well. You can read about
them in the full report and also in the slides. I do want to call out the
praise for the work summary diagram, which is consistent with many other
reactions we also got in the chat and in other interactions with Wikimedia
community members. I also want to take this chance to congratulate Aishwarya
on her design work, and on seeing it validated so positively. We are all very
much looking forward to getting the implemented design out there for you to
play with, and learning more about how we can improve it.
Two points were called out by the interviewees in particular as causes of
surprise or confusion: the split between function definitions and their
implementations, and the multi-lingual nature of Wikifunctions.
In Wikifunctions, we allow each function to have several implementations
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-06-17>. We
achieve this by having implementations be their own pages on Wikifunctions.
Such a separation is not a novel concept: programming languages such as C++
or Ada have had header and implementation files for decades, and object-oriented
languages <https://en.wikipedia.org/wiki/Object-oriented_programming> have
interfaces that can be implemented by different classes. But interviewees
repeatedly wanted to jump right into providing the implementation.
They were confused that they could publish a function's definition even
before having provided an implementation. This was also a request we have
seen in previous user tests.
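For readers who think in code, the split is roughly analogous to separating
an interface from its implementations. Here is a loose Python analogy, using
a made-up "Join" function; this is not how Wikifunctions actually stores
these (functions and implementations live on separate wiki pages), it only
illustrates the one-definition, many-implementations idea.

# Rough analogy only: a function's definition (its signature and contract)
# kept separate from multiple interchangeable implementations.
from abc import ABC, abstractmethod

class Join(ABC):
    # The "function definition": what it takes and what it returns.
    @abstractmethod
    def run(self, first: str, second: str) -> str: ...

class JoinByConcatenation(Join):
    # One implementation of the definition.
    def run(self, first: str, second: str) -> str:
        return first + second

class JoinViaList(Join):
    # Another implementation of the same definition, done differently.
    def run(self, first: str, second: str) -> str:
        return "".join([first, second])

for impl in (JoinByConcatenation(), JoinViaList()):
    assert impl.run("ab", "cd") == "abcd"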
As a side remark, the little word *“publish”* really did a lot of heavy
lifting here. A long time ago, Wikipedia used to use the word *“save”* for
the button that let the contributor store an edit, and this was changed to
*“publish”* in 2017, based on user research that found wiki users surprised
and alarmed that merely 'saving' an edit would put it online, in public,
for everyone to see, forever. This user study reiterated the point that the
word *“publish”* makes it clear that the contribution will indeed go live
to the whole world. But at the same time, several interviewees felt that
just a function definition, without any implementations yet, didn’t seem
useful enough to publish. The word *“publish”* really brought out that
contrast, and helped us identify this discrepancy in the user’s mental
model.
The second point that raised quite strong reactions was the multi-lingual
nature of Wikifunctions. That is one of the points in the design of
Wikifunctions that is questioned frequently, often unprompted: why does it
have to be multi-lingual? Why labels in different languages? Doesn’t everyone who
wants to code just learn English? To quote one of the interviewees, *“usually
people who speak other languages are just expected to learn English to
code”*.
Because the world of coding is indeed so English-centered, it is very
difficult to find people with coding experience who don’t speak basic
English, and indeed all interviewed contributors spoke English.
There have been a number of research studies showing that the
English-centricity of programming is a major barrier
<https://dl.acm.org/doi/abs/10.1145/3173574.3173970> for many people.
People who can use their own language to code achieve results faster
<https://dl.acm.org/doi/abs/10.1145/3051457.3051464>. For parents who
don’t speak English, it is more difficult to help their children
<https://dl.acm.org/doi/abs/10.1145/3173574.3174196> learn programming.
Based on these and other research results, we choose to intentionally
deviate from the recommendations of our own user research, as we believe
that this aligns better with the Wikimedia 2030 movement strategy
recommendations
<https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20/Recomme…>,
particularly towards knowledge equity.
There were many smaller, but very good points raised. The contributors
asked for a space to describe the functions in more detail (that’s planned
for Phase ι
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases#Phase_%CE%B9_(iot…>,
which is up next in our development plan). The term *“aliases”* confused
users. The list of types was too simple. The example table was identified
as a place that probably won’t scale for complex entries. The difference
between the words *“available”* and *“proposed”* and *“verified”* in the
tables showing implementations and testers was confusing. And there were
quite a few more.
We also identified a number of larger areas that could be improved: making
the use of language more consistent throughout, displaying more meta-data
immediately, and improving the text to make the distinction between
definitions and implementations clearer. We are going to work on these
design challenges.
We are relieved and pleased to see that the designs allowed all the
contributors to fulfill their tasks. We are more than excited to implement
these designs, and get them to you. We would love to hear from you, if you
have ideas or suggestions around the issues discussed here, or in the full
report.
Thanks to all the contributors who were interviewed, thanks to Jeff for
performing the research, and thanks to Aishwarya for summarizing the
results.
Updates as of June 3: Fix-it week
- May 30 – June 3 was a ‘Fix-it’ week for the Abstract Wikipedia team.
During this week, the team paused the development of new features and
focused on tasks related to technical debt.
- Design update: This week, the team kicked off the design work for the
‘lists’ component.
The on-wiki version is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-05-27
--
Our Google.org fellow, Ariel Gutman
<https://meta.wikimedia.org/wiki/User:AGutman-WMF>, has recently authored a
proposal of an architecture for the NLG system
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/NLG_system_architecture_…>
of
Abstract Wikipedia.
The proposed architecture is driven by four main tenets:
1. *Modularity*: the system should be modular, in that various aspects
of NLG (e.g. morphosyntactic and phonotactic rules) can be modified
independently.
2. *Lexicality*: the system should be able to both fetch lexical data
(separate from code), and rely on productive language rules to generate
such data on the fly (e.g. inflecting English plurals with an -s).
3. *Recursivity*: due to the compositional and recursive nature of most
languages, an effective NLG system would need to be recursive itself.
   4. *Extensibility*: the system should be receptive to extension by
   linguistic experts and technical contributors, as well as by non-technical
   and non-expert contributors, working on different parts of the system.
These considerations lead to the proposal of a "pipeline" system, in which
an input Constructor is processed by different modules (corresponding to
various aspects of natural language) until the final output text is
rendered.
[image: A proposal of an NLG architecture for Abstract Wikipedia.svg]
<https://meta.wikimedia.org/wiki/File:A_proposal_of_an_NLG_architecture_for_…>
In this pipeline, the dark blue shapes are elements which would be created by
contributors to Wikifunctions (rectangles) or Wikidata (rounded
rectangles), while the light blue elements represent functions or data
living within the Wikifunctions orchestrator.
A key aspect of the system is the "templatic renderers". Wikifunctions
will provide a specialized *templating language*, developed in-house, which
should enable even non-technical contributors to write renderers for their
language. These renderers will be supported by lexical data from Wikidata
and Universal Dependency-style grammatical relations, which would be
defined within Wikifunctions by linguistically-interested contributors.
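To give a rough flavour of the idea, here is a small, made-up sketch in
Python of a "templatic" renderer: a template whose slots are filled from
lexical data, with a simple agreement rule applied. This is emphatically not
the syntax of the proposed templating language, and the lexical data shown
is invented; it only illustrates how lexical data and productive rules could
work together inside a renderer.

# Made-up sketch of the templatic-renderer idea; not the proposed syntax.
# Hypothetical lexical data, as it might be fetched from Wikidata lexemes.
lexemes = {
    "capital": {"singular": "capital", "plural": "capitals"},
}

def noun(lemma, number):
    # Pick the right inflected form of a noun from the lexical data.
    return lexemes[lemma][number]

def render_capitals(country, capitals):
    # Template: "The capital(s) of <country> is/are <capitals>."
    number = "singular" if len(capitals) == 1 else "plural"
    verb = "is" if number == "singular" else "are"
    return (f"The {noun('capital', number)} of {country} "
            f"{verb} {' and '.join(capitals)}.")

print(render_capitals("Jordan", ["Amman"]))
# -> The capital of Jordan is Amman.
print(render_capitals("Bolivia", ["Sucre", "La Paz"]))
# -> The capitals of Bolivia are Sucre and La Paz.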
We will be glad to hear any feedback from you on the proposal's talkpage
<https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/NLG_system_architec…>,
in particular about the idea of developing an in-house templating system.
Further updates for last week:
- This week, the team held its first Deep Dive session. We presented our
project OKRs and received feedback from leadership
- The team spent time this week preparing for last weekend's Hackathon:
- There was a presentation and Q&A about Wikifunctions
- A few Phabricator backlog tasks were identified and tagged for
Hackathon participants
Below is the brief weekly summary highlighting the status of each
workstream:
- Performance:
- Made progress on Beta cluster setup: orchestrator and evaluator
services now update automatically to the latest image
- NLG:
- Completed the initial draft of the NLG system architecture design
document
- Metadata:
- Partially completed the front-end code to accommodate both forwards
and backwards compatibility for the old & new metadata formats
- Experience:
- Made more progress for function view and editor implementations for
mobile
- Completed function-schemata migration to Benjamin arrays
- Handed off designs for 'Text with fallback'