We have written a proposal for types to represent a part of speech in a language, e.g. a type for English nouns, a type for English adjectives, a type for Polish nouns, etc.
Even though many of these parts of speech differ considerably between languages, they have certain commonalities. We are proposing a pattern for these types to follow, using a “table” type, and a function called “merge” that can help with many agreement tasks in many different languages.
We derived this proposal from observing previous natural language generation systems, such as Grammatical Framework. It does not yet touch on the big question of how to represent abstract content; it only addresses how to grow sentences more conveniently from lexicographic entries in Wikidata. Over the last few months we have seen quite a few functions being built to construct phrases and sentences. We saw a few challenges in them, and this proposal tries to address those.
We also discussed the proposal in the NLG SIG meeting (see below for a recording), and discussion on-wiki has already started.
This Tuesday, we had our first public Natural Language Generation Special Interest Group meeting. A recording of the meeting is available on Wikimedia Commons. A wiki page for the NLG SIG has been created, and agenda items for the meeting next month can already be collected there.
A number of bigger changes landed this week, though not all of them are available immediately.
As part of our work this Quarter on extending Wikidata use, we're adding the new pre-defined function Z6830 and its built-in implementation Z6930 to get the Lexemes related to an Item (T383631). We're also extending the definition of Wikidata property (Z6002) to allow more comprehensive fetching of Wikidata properties (T383636), and providing new front-end UX components for referencing, fetching, and displaying them (T383643).
On the integrating-Wikifunctions-calls-into-Wikipedias front, we have landed the major base functionality on the PHP side (T272516). This is behind a feature flag, so it is not yet available for testing in production. So far, it includes triggering the relevant display function on the response so that it can be displayed in the content (T362252); triggering the relevant reading functions on the inputs so that they can be turned from text into Objects (T368604); caching each call result in a cluster-wide system to improve speed and reduce load (T362256); and the beginning of the cross-wiki notification code, which will make sure the content is updated when the Function or Wikidata content changes, and will put entries into the RecentChanges and Watchlist pages on client wikis, as Wikidata does (T383156). We're also close to completing the base integration with the wikitext and visual editors (T373118), which will let us experiment with embedding the full Wikifunctions experience for users. The PHP version of the error system can now output messages in any language, rather than just English (T362236); this will be used to display errors on the reader's page if something goes wrong.
We fixed a few minor user-facing issues this week. We've tweaked the 'languages' dialog triggered from the About box so that it sorts the languages according to the user's locale language (T355951). We also fixed some bugs spotted in the new Codex component used for the table listing the Implementations and Test cases of Functions – thanks to the community for helping spot some of those!
For development purposes, we've added a new staff-only right to bypass the results cache (T379432). We've rewritten our code's MediaWiki registration so that it specifies exactly which Codex components we use, giving us a custom build rather than the whole library (T372799). This reduces the bytes shipped to each user of Wikifunctions.org, which we hope will improve performance when using the site. We've rebuilt the front-end codebase to use the Pinia state management library instead of Vuex, which has been deprecated (T318630), and consolidated and renamed files and require() statements to match. We've also reorganised the test code to start separating the tests for 'repo mode', i.e. for Wikifunctions.org, from those for 'client mode', such as Wikipedias.
Last week, GitHub invited Danny Thompson, the CTO of this.dot, to a conversation in their Open Source and AI series. He gave a great explanation of what Abstract Wikipedia and Wikifunctions are about, and we recommend watching the video. It is available on YouTube and on Twitch.
On Monday, 10 March 2025, Denny Vrandečić will talk at King’s College London on the topic of Knowledge in the Age of AI. Wikidata and Wikifunctions will be topics in that talk. The event will be hybrid. You can join either remotely or locally in London. Free registration via Eventbrite is requested if you want to attend. The talk will be recorded.
The Function of the Week is a column written by the community. This week's submission was written by 99of9 and edited and improved by Feeglgeef and GrounderUK. Planning for the column, and new submissions, take place on-wiki.
This week we are discussing: string of numeral digits in order from language.
The function takes one input, a Natural language, and returns a string of length 10 containing the digits of that language's numerals, in order from zero to nine. For example, languages using Western Arabic numerals have the digits "0123456789". On the face of it, this seems like a strange function to write or use: if you already speak a language, you already know its digits, and if you don't, the function tells you just one piece of information in a hardcoded, inflexible string. But for a multilingual project like ours, it turns out to be such a useful helper function that we currently use it indirectly every time we use the read functions for a natural number or floating-point number in Wikifunctions.
Only one implementation is available so far, a Python lookup. It essentially checks whether the input is a language with a known answer and, if so, directly returns the string for that language; otherwise it defaults to "0123456789". This implementation can never be fully completed, but even in its current form it is very useful. It can be improved by adding more decimal scripts, or by reusing an existing string for another language or dialect that shares one of the existing scripts. There are about 20 more scripts listed at Hindu-Arabic numeral system, so don't hesitate to add one. A JavaScript implementation would also be easy to write here, and might deliver a small efficiency gain, which could be useful given how often this function is called.
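The lookup described above can be sketched in a few lines of Python. This is not the actual Wikifunctions implementation: the real function receives a Natural language object, whereas this sketch assumes plain language codes, and the table is an illustrative selection rather than the function's current contents. It relies on the fact that Unicode encodes each script's digits zero to nine as ten consecutive code points, so each entry can be built from the code point of zero.

```python
# Sketch of a digit-string lookup (illustrative, not the Wikifunctions code).
# Assumption: languages are identified by plain language codes here; the real
# function takes a Natural language object as input.

DEFAULT_DIGITS = "0123456789"  # Western Arabic digits, the fallback


def digit_run(zero_code_point: int) -> str:
    """Return the ten digits 0-9 of a script whose zero is at zero_code_point.

    Unicode encodes each script's decimal digits as ten consecutive code
    points, so the whole string follows from the code point of zero.
    """
    return "".join(chr(zero_code_point + i) for i in range(10))


# Illustrative selection of languages; adding a script is one more line.
DIGITS_BY_LANGUAGE = {
    "bn": digit_run(0x09E6),  # Bangla (Bengali digits)
    "ne": digit_run(0x0966),  # Nepali (Devanagari digits)
    "ml": digit_run(0x0D66),  # Malayalam digits
    "fa": digit_run(0x06F0),  # Persian (Extended Arabic-Indic digits)
    "th": digit_run(0x0E50),  # Thai digits
}


def digits_for_language(language_code: str) -> str:
    """Return the language's digits 0-9 in order, or the Western Arabic default."""
    return DIGITS_BY_LANGUAGE.get(language_code, DEFAULT_DIGITS)


print(digits_for_language("bn"))  # ০১২৩৪৫৬৭৮৯
print(digits_for_language("en"))  # 0123456789 (no entry, so the default)
```

In this sketch, supporting another script takes exactly one more dictionary entry, and any language without an entry falls back to Western Arabic digits.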
16 tests are available. Isn't linguistic diversity beautiful? Can you spot any patterns?
These tests simply repeat all the strings in the Python implementation so far, so they do not rigorously test the cases the function does not cover; rather, they display its current capabilities. Feel free to add tests for other languages or dialects that you know.
What is the point of this function? In scripts that use a decimal numeral system and positional notation, the structure of numbers is so well defined that, even if you can't read the language, knowing the decimal digits is all you need to infer which number a numeral represents. We use this to our advantage in three compositions that call this function: a generic natural number reader, a generic float64 reader, and a configurable float64 reader, which uses it to pre-process digits before a language-specific configuration function is applied. Although we have only one script-specific number read function, read Malayalam natural numbers leniently, these general compositions offer a broader, generic alternative, allowing us to read numbers in a much wider range of scripts (see the figure for an example of adding natural numbers with a Bangla interface language). Although it's not implemented yet, we will also be able to use this function in similar display function compositions to show numeric outputs in these scripts.
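For contributors adding a new digit string, a quick well-formedness check can catch ordering or copy-paste mistakes. The sketch below is a hypothetical helper, not part of Wikifunctions. It uses Python's standard unicodedata module and the Unicode policy that decimal digits are always encoded as contiguous runs from zero to nine. It therefore only applies to digit sets that Unicode classifies as decimal digits.

```python
import unicodedata


def is_valid_digit_string(digits: str) -> bool:
    """Return True if digits holds one script's decimal digits 0-9, in order.

    Each character must be a Unicode decimal digit whose value equals its
    position, and all ten must be consecutive code points (i.e. one script).
    """
    if len(digits) != 10:
        return False
    zero = ord(digits[0])
    return all(
        ord(ch) == zero + i and unicodedata.decimal(ch, None) == i
        for i, ch in enumerate(digits)
    )


candidates = {
    "Western Arabic": "0123456789",
    "Bangla": "০১২৩৪৫৬৭৮৯",
    "swapped 8 and 9": "0123456798",
    "only nine digits": "012345678",
}
for name, digits in candidates.items():
    print(f"{name}: {is_valid_digit_string(digits)}")
```

The consecutive-code-point condition also rejects strings that mix digits from two scripts, which the per-character value check alone would accept.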
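The generic reader idea can be illustrated with a short Python sketch. It maps each native digit to its ASCII counterpart using the ten-character digit string, then parses the result positionally. The reverse mapping gives a matching display function. The names read_natural_number and show_natural_number are hypothetical illustrations, not Wikifunctions functions, and the sketch covers only natural numbers, not the float64 readers.

```python
# Sketch of a generic natural-number reader and display function driven only
# by a ten-character digit string (0-9 in order). Function names are
# hypothetical illustrations, not Wikifunctions functions.

ASCII_DIGITS = "0123456789"


def read_natural_number(text: str, digits: str) -> int:
    """Parse text written with the given digits into a natural number."""
    # Map each native digit to its ASCII counterpart, position by position.
    ascii_text = text.strip().translate(str.maketrans(digits, ASCII_DIGITS))
    # Anything left that is not an ASCII digit means the input is not a numeral.
    if not (ascii_text.isascii() and ascii_text.isdigit()):
        raise ValueError(f"not a natural number in this script: {text!r}")
    return int(ascii_text)  # positional decimal notation does the rest


def show_natural_number(n: int, digits: str) -> str:
    """Render a natural number using the given digits."""
    if n < 0:
        raise ValueError("natural numbers are non-negative")
    return str(n).translate(str.maketrans(ASCII_DIGITS, digits))


BANGLA_DIGITS = "০১২৩৪৫৬৭৮৯"
total = read_natural_number("১২", BANGLA_DIGITS) + read_natural_number("৩০", BANGLA_DIGITS)
print(total)                                      # 42
print(show_natural_number(total, BANGLA_DIGITS))  # ৪২
```

Python's built-in int() already accepts Unicode decimal digits directly, so the explicit translation here mainly mirrors the composition described above, where the digit string is the only language-specific input.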
Please consider adding a single extra line of code to this function to support an entire additional numeral system in Wikifunctions!
Here is a list of new functions that have been created since last week, with connected implementations and passing tests. Plenty of new functions to celebrate!
We can see a rich variety of functions – and there were a few more that didn’t make it to the list because they didn’t have tests or implementations. A comprehensive list of all functions sorted by creation date can be found on-wiki.