Abstract-Wikipedia September 2022

abstract-wikipedia@lists.wikimedia.org

3 participants
3 discussions

Newsletter 86: Staff contributions to Wikifunctions - CALL TO ACTION

by Denny Vrandečić

The on-wiki version of this newsletter can be found here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-09-27

Right from the start, Wikifunctions will be somewhat more complex than many other Wikimedia projects. Sure, over time many of the Wikimedia projects have accrued a lot of complexity: just think about Lua modules <https://www.mediawiki.org/wiki/Lua>, conditional templates <https://www.mediawiki.org/wiki/Template:If>, or the sophisticated use of MediaWiki features to support the workflows of the community. These complexities, however, have grown in scale alongside their wikis' communities, and all of the projects have started out with a very simple workflow. Wikifunctions will not.

Though we are trying our best to make the project as accessible and usable as possible, we also want to help the community as much as we can to get the project off the ground. In order to do so, we at the Wikimedia Foundation want to do something that we usually don’t: directly support the community by contributing on-wiki to the main namespace of Wikifunctions in our capacity as staff, from our staff accounts.

Usually, edits to the main namespace for staff accounts are limited to exceptional circumstances. This is primarily to make it very clear that the content of each of the projects belongs to its community. This is particularly important for projects like Wikipedia, where sometimes subtle changes in wording can be very important and have significant real-world consequences, as we were just reminded again recently <https://en.wikipedia.org/wiki/Talk:Recession>.

Wikifunctions is different. Its content will be functions, their implementations, and other supporting objects. We would like to be able to work together with the community, in our paid time as staff members.
This means working on functions, helping to improve implementations, showing exemplary cases of how to use new features, and also speeding up the creation of implementations for functions requested by the community. One particular domain where we are planning to contribute is for functions around natural language generation. I think that, without staff support, the necessary functions to make Abstract Wikipedia possible might take a long time to develop, and that support by staff can speed up that key area considerably.

Despite this approach, we also want to make sure that Wikifunctions remains under the full ownership of the community. Whereas in the beginning our staff accounts might have certain special rights on Wikifunctions (e.g. the right to create instances of certain types), we want these roles to be transferred to the community sooner rather than later. We don’t want to be the ones making policy decisions beyond what is technically necessary (e.g. for platform performance or code-security reasons). We don’t want to be assigning sysop rights or other community leadership positions. We don’t want to make policy decisions about which functions, implementations, or which test cases are deemed necessary, valuable, or acceptable. All of these areas, and more, should be fully owned by the Wikifunctions community.

It seems prudent and necessary, in order to be transparent, that the community drafts a policy together with us in order to define how we will be editing the Wikifunctions main namespace as staff. Since we will need this policy to be in place from the beginning of Wikifunctions, we are proposing to go the unusual path of creating a preliminary policy here on Meta with interested community members, which we will then transfer to Wikifunctions upon launch. That policy should be revisited once the Wikifunctions community has formed, and once we have hands-on experience with such edits.

*Request*: We are calling for contributors to lead the creation of this preliminary policy, and asking everyone to comment and contribute to the policy. If no contributors step forward, the Abstract Wikipedia team will take the lead on drafting the preliminary policy. The policy will be drafted at Abstract Wikipedia/Staff editing <https://meta.wikimedia.org/w/index.php?title=Abstract_Wikipedia/Staff_editi…> and discussed at Talk:Abstract Wikipedia/Staff editing <https://meta.wikimedia.org/w/index.php?title=Talk:Abstract_Wikipedia/Staff_…>.

There are many questions to be answered: what limitations should staff accounts face, if any? What about staff who are also volunteers? Should staff also apply for sysop rights and other roles, or should they automatically have certain rights and thus also responsibilities? How should staff engage in debates, if at all? These are difficult questions that would benefit from a preliminary answer, given to staff by the community.

Note that all of this is strictly regarding Wikifunctions, and does not have any implications for the other Wikimedia projects.

We are looking forward to working together!


Newsletter 87: Cory in residency in İstanbul - closing parenthesis

by Denny Vrandečić

The on-wiki version of this newsletter can be found here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-09-30

As you might recall, the Abstract Wikipedia team's Cory Massaro <https://meta.wikimedia.org/wiki/User:CMassaro_(WMF)> recently finished an arts residency in İstanbul <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-07-12>, which he attended as part of the creative duo *Tecnologías Silvestres*. He will share here in his voice some highlights of the trip as well as some conclusions about knowledge democratization and the technological challenges facing specific language communities.

Photo Tour

Cat in İstanbul <https://meta.wikimedia.org/wiki/File:Cat_in_istanbul.jpg>

Istanbul is mostly cats, like 90%. Sometimes they stand under the streetlights and meow at you like catnip dealers. Sometimes they just judge you from the rocks <https://commons.wikimedia.org/wiki/File:Cat_in_istanbul.jpg>.

Rock cathedral in Cappadocia <https://meta.wikimedia.org/wiki/File:Cathedral_underground_city.jpg>

We took a field trip to Cappadocia for a few days to see what that was about. The thing to do there, because it is warm as the devil's dust jacuzzi, has historically been to live in a lava dome or cave. There are some underground cities in Cappadocia where people used to hang out when it was hot or a war outside. One such city contains an ancient Rock Cathedral <https://commons.wikimedia.org/wiki/File:Cathedral_underground_city.jpg> (i.e., a cathedral made of a rock, not the site of the greatest Van Halen concert).

Underground power cables <https://meta.wikimedia.org/wiki/File:Underground_power.jpg>

The underground cities have lights hooked up for us pampered moderns.
The caves are full of cables and electric boxes <https://commons.wikimedia.org/wiki/File:Underground_power.jpg>, creating a climate apocalypse vibe which is delicious.

Shahmaran <https://meta.wikimedia.org/wiki/File:Shahmaran.jpg>

There were sculptures like this all over the city <https://commons.wikimedia.org/wiki/File:Shahmaran.jpg>. I was embarrassed that I couldn't identify this twice-crowned snake-butt centilady, so I put the mythological expertise of the Abstract Wikipedia team to the test. Quiddity finally identified her as the Shahmaran <https://en.wikipedia.org/wiki/Shahmaran>.

Water clock <https://meta.wikimedia.org/wiki/File:Istanbul_Museum_of_the_History_of_Scie…>

There's a whole Museum of the History of Science and Technology in Islam <https://en.wikipedia.org/wiki/Istanbul_Museum_of_the_History_of_Science_and…>. The museum begins with three galleries containing photos of European Christian men. After that, it gets really fascinating. One highlight was this gorgeous water clock <https://commons.wikimedia.org/wiki/File:Istanbul_Museum_of_the_History_of_S…>!

Art

I spent a lot of time staring pensively into the middle distance in a scribal reverie. I made important literary sketches on cats fighting with seagulls, the behavior of people in coffee shops, and other snippets of daily life. Poems were written, short stories edited, and multiple visual art installations created with other residents at the space. I also gave two writing workshops using natural language processing and Surrealist techniques to generate ideas, which we then used to create poetry and songs (I made a word2vec <https://en.wikipedia.org/wiki/Word2vec> oracle!).

Language and Technology and Hegemony and Abstract Wikipedia

What kind of knowledge do people want to share?
Many of us (or at least I) intuitively believe that certain knowledge is more-or-less "objective" and "neutral," but those categories are inadequate.

Let us consider, for example, standard objective facts about geography and biology. A city has a certain population and square mileage, a founding date, a governing body (usually), landmarks. A city also has history, and in many places, that history cannot be discussed without reference to geopolitics.

As I shared information about personal history with people at the residency, I learned facts about where they came from. Some of them came from cities about which an interesting, useful, and very sad fact concerned recent violence. Other facts had to do with the global superpowers which encouraged, condoned, supplied arms for, or directly perpetrated that violence. There are plants, like particular varieties of fig tree, which are now threatened or endangered due to how war has terraformed their environment. These are real, unimpeachable facts about cities and organisms, but it is impossible to state those facts plainly without making a political statement.

While the propositional truth value of such a fact cannot be denied, subjective domains like a person's political values inform whether that fact is included in particular discourses. This is the art of creating narrative or stories. I would consider it a noble goal to make Abstract Wikipedia a platform where stories, not just facts, can be expressed and shared. Abstract Wikipedia is the right platform for this because it allows those stories to be shared outside the linguistic communities to which they are directly relevant. Just as Abstract Wikipedia is intended to convey objective information in less-resourced languages, I also hope that speakers of these languages will represent their knowledge in Wikidata so that Abstract Wikipedia can complicate the narratives of highly-resourced languages' Wikipedias.

I also talked with people about how language informs their interactions with technology. Some of the observations were unsurprising (but still important to hear and hear again): certain software is hard to use in one language or another; the Internet opens up if someone speaks a hegemonic language, etc. One thing I hadn't anticipated was how often the discussions turned to literacy. It was fascinating to speak with people who were fluent and literate in multiple hegemonic languages but didn't read, or didn't read well, the language they spoke at home. A speaker of Kurmanji <https://en.wikipedia.org/wiki/Kurmanji> (Kurdish dialect) mentioned that, when he exchanged messages with his Kurdish-speaking friends, they used voice messages–using text felt unnatural.

Abstract Wikipedia has been conceived primarily as a text-based project. This makes technical sense. However, if literacy is an impediment that affects how and in what language a person might choose to access a website, then it can be compared with other accessibility concerns. Vision-impaired persons likewise suffer when projects only consider the text interface. In both cases, it seems the same tools–screen reader-friendly User Interfaces, better Text-To-Speech technology in all languages–can help solve the problem.

In summary, I left the residency with two big questions about the work our team is doing. First: how can Abstract Wikipedia serve challenging, controversial information, and expose people to perspectives they might not otherwise have access to? Second: issues of literacy and accessibility intersect in the languages Abstract Wikipedia wants to serve. What discussions can we have about that intersection?


Newsletter 85: The State of Abstract Wikipedia Natural Language Generation

by Denny Vrandečić

(apologies for being quiet the last few weeks, we will catch up)

This update went to the Diff blog. You can find the version on the Web here: https://diff.wikimedia.org/2022/09/21/the-state-of-abstract-wikipedia-natur…

Here is a copy of the text for your convenience and for the archive.

The State of Abstract Wikipedia Natural Language Generation

21 September 2022 by Natural Language Generation workstream of Abstract Wikipedia <https://diff.wikimedia.org/author/natural-language-generation-workstream-of…>

The Abstract Wikipedia team has taken further steps toward representing abstract content in natural languages! When Denny introduced the proposal for Abstract Wikipedia here on Diff <https://diff.wikimedia.org/2020/05/07/a-proposal-for-a-new-wikimedia-projec…>, he noted the need for “functions that can translate the content of Abstract Wikipedia into the natural language text of every Wikipedia.” Those “functions” will eventually comprise a community-driven natural language generation <https://en.wikipedia.org/wiki/Natural_language_generation> pipeline. Research and prototyping for that NLG pipeline have now begun.

In this post, we will outline how the architecture of the NLG templating system (part of the NLG pipeline) fits in with other components. We’ll also highlight open questions in the hopes of encouraging discussion and further contribution by the community.

As the AW team discussed a few weeks ago <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-08-19>, the planned NLG realizer (also called the Renderer), a component of the NLG system, will use a template language in which templates are written, and will then transform those templates into natural language text. The template language will provide a high-level, readable, declarative syntax to steer text generation from the abstract content (captured with the constructors).
Then, the template language parser will produce a series of function compositions, whose details are further described in Google.org Fellow Ariel Gutman and Professor Maria Keet’s template language specification <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Template_Language_for_Wi…>.

It’s important for us to begin creating some standards for these functions now in order to limit complexity and ensure interoperability, so that abstract content can indeed benefit all languages and so that the community can write Constructors and Renderers on Wikifunctions <https://meta.wikimedia.org/wiki/Abstract_Wikipedia#Project> with relative ease. Some of the complexities regarding doing NLG in agglutinating African languages have been addressed by Maria Keet in a TechTalk <https://commons.wikimedia.org/wiki/File:Knowledge-to-text_Natural_Language_…> she gave to Google fellows in a meeting they held in Zurich in August.

To have a better idea of how the NLG realizer’s implementation may look, Ariel Gutman has started creating a Scribunto prototype <https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia>, which will inform the Wikifunctions implementation. Mahir Morshed has also created the Ninai <https://gitlab.com/mahir256/ninai/> and Udiron <https://gitlab.com/mahir256/udiron/> libraries in Python to prototype the realizer. We will share more about the prototype in a future Diff post.

At the same time, Google.org Fellow Sandy Woodruff has started reflecting on a dedicated UI for the NLG system. You can learn about some of her ideas in a brainstorming session <https://commons.wikimedia.org/wiki/File:Natural_Language_Generation_in_Wiki…> held at the aforementioned meeting.

One open question concerns the Constructors themselves. A Constructor represents a piece of abstract content.
Let’s adapt an example from the template language specification:

    Age(
      entity: Malala Yousafzai (Q32732)
      age_in_years: 25
    )

This is a Constructor that represents a fact true at the time of writing, namely the age of Malala Yousafzai, which would be rendered in English as “Malala Yousafzai is 25 years old.” Note that, in reality, “age_in_years” would itself likely be defined by a function call that calculates age based on birth date and the present date, but this detail is omitted here for clarity.

Many of our open questions concern how representative this example Constructor is. This example represents a single proposition and can be realized as a sentence in most (maybe all?) natural languages, but will that be true of all Constructors? What if some Constructors embed multiple propositions? Is it possible for a Constructor to correspond to an incomplete proposition?

Another set of questions concerns how general the relationship between a Constructor and its participant entities should be. We might imagine a Constructor for the sentence, “Bi Sheng invented movable type in 1040 AD.” In order to make Constructors reusable across languages and for multiple propositions, we would want to enshrine more general scenes or frames like “Age” above or, in this case, “Invent.” What, if any, linguistic formalization should be adopted for this purpose? FrameNet <https://framenet.icsi.berkeley.edu/fndrupal/> is one possibility, but might another work better, or does Abstract Wikipedia demand an *ad hoc* solution? How do we handle information which belongs in a sentence but isn’t intrinsically part of a proposition, e.g. “in 1040 AD” from the given example, which isn’t a “core” part of the notion of inventing something the way that the inventor and invention are? Kutz Arrieta from Google has begun thinking about these questions <https://docs.google.com/document/d/1CDqpNgynN34qcRBi__KwxdeKLvNm6EQq/edit?p…>.

Once the Constructors have done their job, the Renderers’ work begins.
The working NLG proposal presumes that the lexical forms in Wikidata will be marked with grammatical features (*e.g.*, number for nouns and verbs, gender or class for substantives, aspect and tense and mood for verbs, …). Mahir Morshed and the rest of the NLG contributors have begun work on standardizing these representations in Wikidata’s lexicographical content <https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation/L…>, but our NLG system can’t assume the data will always be present or complete. Therefore, our questions here concern how to address missing lexical data. When the system generates a sentence, can it provide multiple possibilities for words it’s uncertain about? Should it allow the user to add new terms at that time? If so, how would it guide them to contribute to Wikidata from another project’s context?

These are big questions, but hopefully the challenges they present look exciting, rather than intimidating. As always, we welcome your contributions <https://meta.wikimedia.org/wiki/Abstract_Wikipedia#Participate>. We hope that the breadth of experience and sheer number of languages present within the community will help us find the most equitable solutions possible.
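
To make the Constructor and Renderer split above a little more concrete, here is a minimal Python sketch. It is not the Wikifunctions, Ninai, or Udiron API, and it does not follow the template language specification; every name in it (Age, LEXICON, TEMPLATES, render, age_in_years) is hypothetical, and the lexicon is toy data rather than Wikidata lexemes. It assumes one possible answer to the missing-data question, namely falling back to the bare lemma and reporting it so that a contributor could be prompted to fill the gap. The birth date is supplied only for illustration; the newsletter itself gives just the age.

```python
from dataclasses import dataclass
from datetime import date


def age_in_years(birth: date, today: date) -> int:
    """Whole years elapsed between two dates (hypothetical helper)."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


@dataclass(frozen=True)
class Age:
    """Toy stand-in for the Age Constructor: language-independent content only."""
    entity_label: str
    entity_qid: str
    age_in_years: int


# Toy lexicon keyed by (language, lemma, grammatical number). Illustrative
# data, not Wikidata; the German singular is left out on purpose.
LEXICON = {
    ("en", "year", "singular"): "year",
    ("en", "year", "plural"): "years",
    ("de", "Jahr", "plural"): "Jahre",
}

# One toy template per language: (sentence pattern, lemma of the unit noun).
TEMPLATES = {
    "en": ("{entity} is {n} {unit} old.", "year"),
    "de": ("{entity} ist {n} {unit} alt.", "Jahr"),
}


def render(content: Age, lang: str) -> tuple[str, list[str]]:
    """Return the sentence plus the lemmas whose inflected form was missing."""
    pattern, lemma = TEMPLATES[lang]
    number = "singular" if content.age_in_years == 1 else "plural"
    form = LEXICON.get((lang, lemma, number))
    missing = []
    if form is None:
        # Assumed fallback: use the bare lemma and flag it for review.
        form, missing = lemma, [lemma]
    text = pattern.format(
        entity=content.entity_label, n=content.age_in_years, unit=form
    )
    return text, missing


# Assumed birth date (12 July 1997), evaluated on the post's publication date.
malala = Age(
    "Malala Yousafzai",
    "Q32732",
    age_in_years(date(1997, 7, 12), date(2022, 9, 21)),
)
print(render(malala, "en")[0])
```

Returning the list of missing lemmas alongside the text keeps the renderer usable when lexical data is incomplete, while leaving it to the calling interface to decide whether to show alternatives or invite a contribution.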

