The on-wiki version of this newsletter edition can be found here: https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-12-12
Sketching a path to Abstract Wikipedia
The main goal of Wikifunctions is to support Abstract Wikipedia: a source of multi-lingual Wikipedia content where we can create and maintain the content only once, but have it available across many different languages to fill some of the gaps that currently exist in some Wikipedias.
Today, I would like to sketch out how the natural language generation for Abstract Wikipedia might develop. As an example goal, let’s take the following sentence (based on the English Wikipedia article about Waakye https://en.wikipedia.org/wiki/Waakye):
English: *“Waakye is a Ghanaian dish of cooked rice and beans.”*
French: *“Le waakye est un mets ghanéen de riz et de haricots cuits.”*
German: *“Waakye ist ein ghanaisches Gericht aus gekochtem Reis und Bohnen.”*
We look at four stages to work towards this text.
Stage 1: String-based substitution
In Stage 1, we use simple string substitution, in the style of Mad Libs https://en.wikipedia.org/wiki/Mad_Libs. This approach requires the user to carefully select the right strings, which is quite simple in English, but gets more complicated in French or German.
So we could have the following function calls:
Instance with origin string-based English(“Waakye”, “dish”, “Ghanaian”) → *“Waakye is a Ghanaian dish.”*
Instance with origin string-based French(“Le waakye”, “un mets”, “ghanéen”) → *“Le waakye est un mets ghanéen.”*
Instance with origin string-based German(“Waakye”, “ein Gericht”, “ghanaisches”) → *“Waakye ist ein ghanaisches Gericht.”*
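The three calls above can be sketched as plain string templates. This is only an illustration: the Python function names are hypothetical stand-ins for the Wikifunctions functions, and the German variant assumes the caller passes the article and noun together as one string, as in the call above.

```python
def instance_with_origin_en(subject: str, kind: str, origin: str) -> str:
    # The caller supplies every word in its final form.
    return f"{subject} is a {origin} {kind}."


def instance_with_origin_fr(subject: str, kind: str, origin: str) -> str:
    # French puts the adjective after the noun; "kind" already carries its article.
    return f"{subject} est {kind} {origin}."


def instance_with_origin_de(subject: str, kind: str, origin: str) -> str:
    # German puts the adjective between article and noun, so split "ein Gericht".
    article, noun = kind.split(" ", 1)
    return f"{subject} ist {article} {origin} {noun}."


print(instance_with_origin_en("Waakye", "dish", "Ghanaian"))
print(instance_with_origin_fr("Le waakye", "un mets", "ghanéen"))
print(instance_with_origin_de("Waakye", "ein Gericht", "ghanaisches"))
```

Nothing in these templates checks agreement: passing “ghanaischer” instead of “ghanaisches” would produce an ungrammatical sentence without any error.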
This is possible right now. It requires quite detailed grammatical knowledge from the function caller, as they need to enter the right form manually. The benefit of this method is difficult to see in this example.
Stage 2: Lexeme-based generation
In Stage 2, instead of using strings, we use Wikidata Lexemes, which Wikifunctions has supported for the past few months. This allows for a version of the function where the function caller does not have to worry about agreement or enter the right form manually; instead, the function implementer needs to select the right form from the Lexeme. This shifts some of the burden from the function user to the function author.
This makes the calling much simpler: we don’t have to know whether *“waakye”* in French will be *“Le waakye”* or *“La waakye”*, we don’t have to select the agreeing adjective in German (*“ghanaisches Gericht”* or *“ghanaischer Gericht”*), etc. The correct form will be chosen by the Function.
Now we would have the following function calls:
Zxxx/Instance with origin Lexeme-based English(Lxxx/Waakye, L3964/dish, Lxxx/Ghanaian) → *“Waakye is a Ghanaian dish.”*
Zxxx/Instance with origin Lexeme-based French(Lxxx/waakye, L24812/mets, Lxxx/ghanéen) → *“Le waakye est un mets ghanéen.”*
Zxxx/Instance with origin Lexeme-based German(Lxxx/Waakye, L500931/Gericht, Lxxx/ghanaisch) → *“Waakye ist ein ghanaisches Gericht.”*
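A minimal sketch of how a function implementer might pick the agreeing form for the German call. The `Lexeme` and `Form` classes and the data in them are hypothetical toy stand-ins, not the Wikidata Lexeme model or real Lexeme content; only the idea of selecting a form by grammatical features is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    text: str
    features: frozenset  # grammatical features, e.g. {"nominative", "singular"}


@dataclass(frozen=True)
class Lexeme:
    lemma: str
    forms: tuple
    gender: str = ""  # only meaningful for nouns in this toy model


def select_form(lexeme: Lexeme, wanted) -> str:
    """Return the first form that carries all wanted features."""
    for form in lexeme.forms:
        if set(wanted) <= form.features:
            return form.text
    raise LookupError(f"no form of {lexeme.lemma!r} with features {sorted(wanted)}")


NOM_SG = frozenset({"nominative", "singular"})

# Toy data (assumption): adjective forms as used after an indefinite article.
WAAKYE = Lexeme("Waakye", (Form("Waakye", NOM_SG),))
GERICHT = Lexeme("Gericht", (Form("Gericht", NOM_SG),), gender="neuter")
GHANAISCH = Lexeme("ghanaisch", (
    Form("ghanaischer", NOM_SG | {"masculine"}),
    Form("ghanaische", NOM_SG | {"feminine"}),
    Form("ghanaisches", NOM_SG | {"neuter"}),
))
INDEFINITE_ARTICLE = {"masculine": "ein", "feminine": "eine", "neuter": "ein"}


def instance_with_origin_de(subject: Lexeme, kind: Lexeme, origin: Lexeme) -> str:
    noun = select_form(kind, NOM_SG)
    # The adjective must agree with the gender of the noun it modifies.
    adjective = select_form(origin, NOM_SG | {kind.gender})
    article = INDEFINITE_ARTICLE[kind.gender]
    return f"{select_form(subject, NOM_SG)} ist {article} {adjective} {noun}."


print(instance_with_origin_de(WAAKYE, GERICHT, GHANAISCH))
```

The caller now passes only Lexemes; the choice between “ghanaisches” and “ghanaischer” is made inside the function from the gender of the noun.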
You will also find that many Lexemes are missing for this particular example, such as the French Lexeme for something from Ghana. We in the Wikimedia movement need to think about how to approach this gap in what is, and should be, in Wikidata's Lexemes.
We were hoping that this would be possible right now, and we created a number of functions during our offsite to test these capabilities. Unfortunately, we learned that the system is currently failing to evaluate most such function calls, and accordingly we decided to put a big focus in the upcoming Quarter on getting these functions to run.
Stage 3: Item-based generation
In the third stage, we would use Wikidata items to help us select Lexemes from a given language that have comparable meanings. The function caller does not have to know or look up the right Lexeme in all the languages they want to generate the text in. They can just put in the relevant Wikidata items, and the function developer can implement the relevant lookups.
This means that whether or not the function caller knows that the concept *“dish”* is called *“mets”* in French or *“Gericht”* in German, they will still be able to create perfectly fluid and correct sentences in those languages.
This allows us to make the following calls (note that all three calls use *the same function* here, and the caller does not have to know the languages at all):
Instance with origin(Q14783691/Waakye, Q746549/dish, Q117/Ghana, Z1002/English) → *“Waakye is a Ghanaian dish.”*
Instance with origin(Q14783691/Waakye, Q746549/dish, Q117/Ghana, Z1004/French) → *“Le waakye est un mets ghanéen.”*
Instance with origin(Q14783691/Waakye, Q746549/dish, Q117/Ghana, Z1430/German) → *“Waakye ist ein ghanaisches Gericht.”*
Note that the function will in most cases just route to the language specific functions developed for the previous stage, but that happens behind the scenes and transparently for the function caller.
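The routing described above might look roughly like the sketch below. The lookup table, the renderer functions, and the use of language codes in place of the Wikifunctions language objects are all assumptions for illustration; to keep the sketch short, the table stores ready-made strings rather than Lexemes.

```python
# Hypothetical stand-in for "find the Lexemes connected to a given Item".
WORDS_FOR_ITEM = {
    "en": {"Q14783691": "Waakye", "Q746549": "dish", "Q117": "Ghanaian"},
    "fr": {"Q14783691": "Le waakye", "Q746549": "un mets", "Q117": "ghanéen"},
    "de": {"Q14783691": "Waakye", "Q746549": "ein Gericht", "Q117": "ghanaisches"},
}


def render_en(subject: str, kind: str, origin: str) -> str:
    return f"{subject} is a {origin} {kind}."


def render_fr(subject: str, kind: str, origin: str) -> str:
    return f"{subject} est {kind} {origin}."


def render_de(subject: str, kind: str, origin: str) -> str:
    article, noun = kind.split(" ", 1)
    return f"{subject} ist {article} {origin} {noun}."


# Language-specific functions, selected behind the scenes.
RENDERERS = {"en": render_en, "fr": render_fr, "de": render_de}


def instance_with_origin(subject_item: str, kind_item: str,
                         origin_item: str, language: str) -> str:
    words = WORDS_FOR_ITEM[language]
    render = RENDERERS[language]
    return render(words[subject_item], words[kind_item], words[origin_item])


for language in ("en", "fr", "de"):
    print(instance_with_origin("Q14783691", "Q746549", "Q117", language))
```

The caller passes the same three item identifiers every time and changes only the language argument.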
This is currently not possible to implement on Wikifunctions: we still need to add a function that allows us to find the Lexemes connected to a given Item. We will work on that in the coming Quarter, and are thankful to the Search and Wikidata teams for the necessary pre-work they have recently performed to unlock the possibility.
Stage 4: Item-based content
The final stage we want to discuss today is based on using the knowledge in Wikidata to create text. We can pull from Wikidata that Q14783691/Waakye https://www.wikidata.org/wiki/Q14783691 is a dish from Q117/Ghana https://www.wikidata.org/wiki/Q117, we can look up the ingredients and their Lexemes, etc. Given the current knowledge about Waakye in Wikidata, this could then generate the following sentences:
Food with origin and ingredients(Q14783691/Waakye, Z1002/English) → *“Waakye is a Ghanaian dish with bean, rice, water, and salt.”*
Food with origin and ingredients(Q14783691/Waakye, Z1004/French) → *“Le waakye est un plat ghanéen composé de haricots, de riz, d'eau et de sel.”*
Food with origin and ingredients(Q14783691/Waakye, Z1430/German) → *“Waakye ist ein ghanaisches Gericht aus Bohnen, Reis, Wasser und Salz.”*
This further simplifies writing the function calls: all we need to select is the dish and the language, and we get a whole sentence that can, in many cases, make a good opening sentence for the Wikipedia article about the given dish, or as an entry or short description in various places.
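As a rough sketch of the English case, the function below builds the sentence from facts stored about the item. The `ITEM_FACTS` dictionary is a hand-written, hypothetical stand-in for what would be fetched from Wikidata, and the field names are invented for this example.

```python
# Hypothetical snapshot of the facts a function could pull from Wikidata.
ITEM_FACTS = {
    "Q14783691": {
        "label": "Waakye",
        "kind": "dish",
        "origin_adjective": "Ghanaian",
        "ingredients": ["bean", "rice", "water", "salt"],
    },
}


def join_list_en(items: list) -> str:
    """Join words as an English list with a serial comma."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def food_with_origin_and_ingredients_en(item_id: str) -> str:
    facts = ITEM_FACTS[item_id]
    ingredients = join_list_en(facts["ingredients"])
    return (f"{facts['label']} is a {facts['origin_adjective']} "
            f"{facts['kind']} with {ingredients}.")


print(food_with_origin_and_ingredients_en("Q14783691"))
```

A real implementation would also need the Lexeme-based agreement and the Item-to-Lexeme lookup from the earlier stages, which this sketch leaves out.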
I hope that this gives a good overview of our next few planned steps with regards to natural language generation, and how Wikifunctions can support bringing together our different language communities.
Team offsite in Lisbon
https://www.wikifunctions.org/wiki/File:Abstract_Wikipedia_team_Lisbon_2024.jpg
Abstract Wikipedia team at the offsite in Lisbon 2024. From left to right, front row: Cory Massaro, Grace Choi, Genoveva Galarza Heredero, Daphne Smit. Back row: James Forrester, Denny Vrandečić, David Martin, Sharvani Haran. Not in picture: Amy Tsay, Amin Al Hazwani, Luca Martinelli, Elena Tonkovidova, Vaughn Walters.
Last week, the team met for its annual meeting in Lisbon, Portugal. What a beautiful city! We enjoyed walking through the city, and had very productive meetings, discussing our plans, team procedures, and using the time for bonding and social cohesion – very difficult and important to achieve in a team that is fully remote.
The most tangible outcome is the planning for the next Quarter; we had very lively discussions to find a consensus, which we still need to write up. We will report on the plan in one of the next two updates.
New tool for querying Wikifunctions
User:Feeglgeef https://www.wikifunctions.org/wiki/User:Feeglgeef created a new tool that allows you to query Wikifunctions in a very flexible way. You can search for functions with implementations in Python, types that use numbers on keys, functions that take three arguments, or functions that return booleans. The tool is available on Replit (note that this is outside of Wikimedia servers), and examples and documentation of the query language are linked from the front page of the tool: wf-query.replit.app
User:Hogü-456 https://www.wikifunctions.org/wiki/User:Hog%C3%BC-456 created an overview of existing tools. If you are aware of more tools, feel free to add them: Wikifunctions:Tools https://www.wikifunctions.org/wiki/Wikifunctions:Tools
Recent Changes in the software
There's no release of MediaWiki software this week due to the End-of-Year release freeze, so nothing new to update. As always, please alert us if you run into any issues.
News in Types: Gregorian calendar date, Byte, Unicode code point
We finally have a Type for Gregorian calendar dates https://www.wikifunctions.org/view/en/Z20420. We have been working a while towards it, having created a Type for the relevant months https://www.wikifunctions.org/view/en/Z16098, for years https://www.wikifunctions.org/view/en/Z20159, *etc.* The discussion https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Gregorian_calendar_date was lengthy and didn’t lead to a full consensus. A rationale for the decisions https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Gregorian_calendar_date#Decision on the design of the Type is provided. We invite you to create functions using the Type!
This is by far the most complex Type we have provided so far.
We would like to create Types for other, non-Gregorian calendars, such as the Chinese, Ethiopian, Japanese, and Hebrew calendars. If you know any of these calendars well, please reach out so that we can create the respective Types.
In other Type-related work, proposals have been made for fixing the Byte https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Byte type and the Unicode code point https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Unicode_codepoint type (previously character type). Input and discussion are very welcome.
Recordings of December’s Volunteers’ Corner
We had a Volunteers’ Corner this Monday, December 9. It was lively with many good questions. A recording of the Corner is available on Commons https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner_2024-12.webm .
The function we built together is featured below as the Function of the Week.
Recording of Denny’s SWIB24 keynote
Denny Vrandečić gave a keynote address at the Semantic Web in Libraries 2024 conference. The topic was the role of knowledge representations in a world of large language models. The recording is available on YouTube https://www.youtube.com/watch?v=NmCbTOZ4Yos.
Function of the Week: how many days between two days in the Roman year
The last newsletter introduced the days of the Roman year https://www.wikifunctions.org/view/en/Z20342 as a new Type. As of now, we have 18 new functions using the Type. Also, this week’s Volunteers’ Corner created such a function, so we will take a look at the resulting function.
How many days are there between two days? Function Z20733 https://www.wikifunctions.org/view/en/Z20733 can answer that question. The function has three arguments: the two days https://www.wikifunctions.org/view/en/Z20342, and a Boolean https://www.wikifunctions.org/view/en/Z40 which tells us whether the days are in a leap year or not. It returns a natural number stating how many days are between the two given days.
It might be easiest to clarify what the function does by looking at the tests:
- From 1 January to 15 January https://www.wikifunctions.org/view/en/Z20735, that’s 14 days
- From 1 January to 31 December https://www.wikifunctions.org/view/en/Z20737, that’s 364 days in a common year
- From 28 February to 1 March https://www.wikifunctions.org/view/en/Z20736, it’s one day in a common year
- But two days https://www.wikifunctions.org/view/en/Z20734 in a leap year
The tests are incomplete. The most notable omission is any test where the first day comes after the second, and what exactly that means with regard to the leap year.
So far there is only one implementation of this function, a composition. That is partly because we did not have much time left in the Volunteers’ Corner, and composition seemed the easiest way to implement the function.
The core of the composition is to turn both days into https://www.wikifunctions.org/view/en/Z20357 a number, counting which day of the year it is (i.e. 1 January is the first day, 2 January the second, 1 February the 32nd, etc.), and then subtract https://www.wikifunctions.org/view/en/Z17315 the first number from the second. The result is then turned from an integer to a natural number https://www.wikifunctions.org/view/en/Z17144, in order to avoid negative numbers.
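The logic of the composition can be sketched in Python as follows. This is not the code on Wikifunctions: days are represented here as hypothetical (month, day) tuples, and the integer-to-natural-number step is assumed to take the absolute value, which may differ from what the actual conversion function does.

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def day_of_year(month: int, day: int, leap_year: bool) -> int:
    """Count which day of the year a date is (1 January is day 1)."""
    total = sum(DAYS_IN_MONTH[:month - 1]) + day
    if leap_year and month > 2:
        total += 1  # 29 February has passed
    return total


def days_between(first: tuple, second: tuple, leap_year: bool) -> int:
    difference = day_of_year(*second, leap_year) - day_of_year(*first, leap_year)
    # Assumption: the natural number is obtained as the absolute value.
    return abs(difference)


print(day_of_year(2, 1, False))                # 1 February is the 32nd day
print(days_between((1, 1), (1, 15), False))    # 1 January to 15 January
print(days_between((1, 1), (12, 31), False))   # common year
print(days_between((2, 28), (3, 1), False))    # common year
print(days_between((2, 28), (3, 1), True))     # leap year
```

With the absolute value, swapping the two days gives the same result, which is one possible answer to the open question about the first day coming after the second.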
abstract-wikipedia@lists.wikimedia.org