An interesting case for sentences like "there are 4 apples" is Japanese.
Japanese has no real plural form of "apple" (or of other nouns), but how the amount "4" is expressed depends on the kind of thing being counted; see Japanese counter word https://en.wikipedia.org/wiki/Japanese_counter_word on Wikipedia. The counter for "apple" is not the same as the counter for "book". Also (perhaps less relevant for Wikilambda), the pronunciation of the number may depend on the counter, so a "number / counter" pair may be considered a lexeme …
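To make the counter idea concrete, here is a minimal sketch in Python. The `COUNTERS` table and the `there_are` helper are hypothetical names invented for this illustration, and the sketch ignores the pronunciation changes mentioned above; it only shows that the noun stays the same while the counter depends on what is counted.

```python
# Hypothetical lookup table: English key -> (Japanese noun, counter word).
# Illustrative only; a real system would take this from lexicographic data.
COUNTERS = {
    "apple": ("りんご", "個"),  # 個 (ko): general counter for small objects
    "book": ("本", "冊"),       # 冊 (satsu): counter for bound volumes
}


def there_are(count: int, noun: str) -> str:
    """Build a 'there are <count> <noun>' sentence using the noun's counter."""
    japanese_noun, counter = COUNTERS[noun]
    # The noun has no plural form; only the counter depends on the noun.
    return f"{japanese_noun}が{count}{counter}あります"


print(there_are(4, "apple"))  # りんごが4個あります
print(there_are(4, "book"))   # 本が4冊あります
```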
Japanese is something of an extreme case, where most of the usual grammatical features of Western languages do not apply: gender mostly plays no role, words are not gendered, and there is almost no plural form (for verbs, adjectives, nouns …). As far as I can tell with my limited knowledge, plurals are mostly a feature of pronouns https://en.wikipedia.org/wiki/Japanese_pronouns. Verb forms depend mostly on tense and on the level of formality, and on nothing else; in particular, not on gender or number.
Another interesting feature of Japanese is the "topic" of a sentence https://en.wikipedia.org/wiki/Topic_marker, which is distinct from the "subject" and can be a puzzle for people used to languages where this distinction does not exist. It will be interesting to see whether Abstract language will need to represent these features as a high-level language, or whether they will only appear later, in the translation phase.
On Thu, 30 May 2024 at 21:27, Denny Vrandečić dvrandecic@wikimedia.org wrote:
The on-wiki version of this newsletter can be found here: https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-05-30
--
A single singular or a plurality of plurals?
https://www.wikifunctions.org/wiki/File:Latin_dictionary.jpg 18th century Latin dictionary
We are working towards functions being able to access data from Wikidata. The first use-case we are aiming for is to access the lexicographic Forms of a Lexeme, given a Lexeme ID. For example, consider a function that creates sentences such as *“There are four apples.”*, where both the number *“four”* as well as the noun *“apple”* are arguments to the function that creates the sentence.
What should the Functions for accessing Lexeme Forms look like? If you have ideas to sketch that out, please go ahead, unbound by technical limitations. We’ll look forward to seeing your ideas and using them as inspiration in order to mold what we can achieve technically on top of the platform we have.
In the above example, we could have a Function that creates *“four”* based on the natural number 4, and *“apples”* based on the Lexeme ID L3257 https://www.wikidata.org/wiki/Lexeme:L3257. The Lexeme has two forms: L3257-F1 being *“apple”*, marked with the grammatical feature singular https://www.wikidata.org/wiki/Q110786, and L3257-F2 being *“apples”*, marked with the grammatical feature plural https://www.wikidata.org/wiki/Q146786. In order to get the right Form, we can either look up the relevant Form Id manually, which will hardly scale, or we use the grammatical features to request the right form. In other words, there could be a Function e.g. "return form" which takes a Lexeme ID and a list of grammatical features and returns all matching Forms.

    return form(L3257, [plural])

would return *“apples”*. Accordingly, for the Estonian verb *“amüseerima”* (to amuse), we would make the call

    return form(L350582, [third person, plural, present tense, indicative])

to return *“amüseerivad”*.
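As one possible reading of the "return form" idea, here is a small Python sketch. It is not a Wikifunctions or Wikidata API: the `Form` class, the `LEXEMES` table and the `return_form` function are assumptions made for illustration, the data is a hand-copied stand-in for the two Lexemes mentioned above, and the Estonian Form ID is a placeholder because the actual ID is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    form_id: str
    representation: str
    features: frozenset


# Hypothetical in-memory stand-in for Lexeme data; not fetched from Wikidata.
LEXEMES = {
    "L3257": [
        Form("L3257-F1", "apple", frozenset({"singular"})),
        Form("L3257-F2", "apples", frozenset({"plural"})),
    ],
    "L350582": [
        # Placeholder Form ID: the real one is not stated in the text.
        Form(
            "L350582-F?",
            "amüseerivad",
            frozenset({"third person", "plural", "present tense", "indicative"}),
        ),
    ],
}


def return_form(lexeme_id: str, features: list) -> list:
    """Return all Forms of the Lexeme that carry every requested feature."""
    wanted = set(features)
    return [form for form in LEXEMES[lexeme_id] if wanted <= form.features]


print([f.representation for f in return_form("L3257", ["plural"])])
# ['apples']
print([f.representation for f in return_form(
    "L350582", ["third person", "plural", "present tense", "indicative"])])
# ['amüseerivad']
```

Matching on a subset of features means an underspecified request (for example only `plural` for a verb) can return several Forms, which is why the sketch returns a list rather than a single value.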
One question we will have is whether the English plural and the Estonian plural should be the same object in Wikifunctions, or whether they should be two different Objects. In Wikidata, the answer is that they are (in general) the same Item, plural https://www.wikidata.org/wiki/Q146786 – the form for more than one, or zero, depending on the language. There are several languages which have other grammatical numbers, such as *paucal*, *dual*, *trial*, and others, which are not used in English and other languages that only have the two. Even for languages that use only two values, there are differences; for instance, English uses the plural form for zero ('*He ate zero eggs*'), whereas French uses the singular ('*Il a mangé zéro œuf*', or more idiomatically '*Il n'a mangé aucun œuf*').
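The English/French difference for zero can be sketched as a small Python function. The name `grammatical_number` and the two-letter language codes are assumptions for this illustration, and only the two languages discussed above are covered.

```python
def grammatical_number(count: int, language: str) -> str:
    """Pick the grammatical number for a natural number in a given language."""
    if count < 0:
        raise ValueError("count must be a natural number")
    if language == "en":
        # English: only exactly one is singular, so zero takes the plural.
        return "singular" if count == 1 else "plural"
    if language == "fr":
        # French: zero and one both take the singular.
        return "singular" if count in (0, 1) else "plural"
    raise ValueError(f"no rule sketched for language {language!r}")


print(grammatical_number(0, "en"))  # plural
print(grammatical_number(0, "fr"))  # singular
print(grammatical_number(4, "fr"))  # plural
```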
In Wikifunctions, we could choose to have individual enumerations for each language, which would have the advantage of allowing for simpler, but different, user interfaces for each language, where we display only the features relevant to a given language: so that for English we don’t show the other number classes, or ask for grammatical features which are not relevant for the language.
There are several different solutions, and the following list is not even exhaustive:
1. a single enumeration of all grammatical features, as in Wikidata
2. shared enumerations for the groups of grammatical features, such as cases, numbers, etc.
3. enumerations for the groups of grammatical features that actually appear for languages, i.e. one shared enumeration for all languages that use only the singular and plural
4. one enumeration for each language, and for each group of grammatical features
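To illustrate the trade-off between a shared enumeration and per-language ones, here is a rough Python sketch using the standard `enum` module. The class names `GrammaticalNumber` and `EnglishNumber` and the `choices` helper are hypothetical; Wikifunctions Types are not Python enums, so this only mimics the shape of the two designs.

```python
from enum import Enum


class GrammaticalNumber(Enum):
    """Shared enumeration for one feature group, across all languages."""
    SINGULAR = "singular"
    PLURAL = "plural"
    DUAL = "dual"
    TRIAL = "trial"
    PAUCAL = "paucal"


class EnglishNumber(Enum):
    """Per-language enumeration: only the values English actually uses."""
    SINGULAR = "singular"
    PLURAL = "plural"


def choices(enumeration) -> list:
    """The options a user interface would have to offer for this type."""
    return [member.value for member in enumeration]


print(choices(GrammaticalNumber))
# ['singular', 'plural', 'dual', 'trial', 'paucal']
print(choices(EnglishNumber))
# ['singular', 'plural']
```

The per-language type gives a shorter menu, but `EnglishNumber.PLURAL` and `GrammaticalNumber.PLURAL` are then distinct objects that share only their value, which is the identity question raised above.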
We have been discussing this question in the Natural Language Generation Special Interest Group. Another solution that was mentioned was to use sub-typing, for example to have “English numbers” be a subtype of the “Grammatical numbers” type, with shared elements. But without Wikifunctions having support for sub-types, this isn’t currently an option.
It is very likely that we won’t be able to resolve this issue fully until we have actually built it and found in practice how it works out. It might even be that we change some of the decisions later, as we discover patterns of Wikifunctions usage that make Wikifunctions friendlier and easier to use. But it would be good to start thinking about what we would like to aim for and what the principles are along which we align our design decisions.

Recent Changes in software
The big piece of work that we landed this week was a comprehensive re-build of the front-end code for how we show labels. As is standard with MediaWiki-based tools, if your language is set to French, and there is no label in French but there is one in English, we show the fallback English label. Previously, we were resolving the label into a string based on your view language and displaying that, which mostly worked, but meant that the label would not be hinted for language or directionality when they were different from the context. We now fully pass down the label's language, and thus directionality, in all places (T343464 https://phabricator.wikimedia.org/T343464, T342661 https://phabricator.wikimedia.org/T342661). In the future, we may adjust how fallback labels like this are displayed, possibly to explain inline what language is being shown, and/or give a call-to-action to translate the label; you can add ideas to the top-level task (T343460 https://phabricator.wikimedia.org/T343460).
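The difference between resolving a label to a bare string and passing its language along can be sketched in Python as follows. This is not the actual front-end code: `ResolvedLabel`, `resolve_label`, the fallback chain and the small `RTL_LANGUAGES` set are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional

# Small illustrative subset of right-to-left language codes.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


class ResolvedLabel(NamedTuple):
    text: str
    lang: str       # language the label is actually written in
    direction: str  # "ltr" or "rtl", derived from that language


def resolve_label(labels: dict, fallback_chain: list) -> Optional[ResolvedLabel]:
    """Return the first available label, keeping its language and direction."""
    for lang in fallback_chain:
        if lang in labels:
            direction = "rtl" if lang in RTL_LANGUAGES else "ltr"
            return ResolvedLabel(labels[lang], lang, direction)
    return None


# A French-language viewer, but only an English label exists.
label = resolve_label({"en": "days in month"}, ["fr", "en"])
print(label)
# ResolvedLabel(text='days in month', lang='en', direction='ltr')
```

Because the result carries `lang` and `direction`, the displaying code can mark up a fallback label correctly even when it differs from the surrounding page language.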
Alongside the above work, we adjusted the function-calling API code to correctly pass along the activity tracing headers (T365053 https://phabricator.wikimedia.org/T365053), and made a few code quality improvements. Additionally, we have been investigating performance-related issues with the back-end services, and hope to have more to report soon.

Function of the Week: days in month when not a leap year
https://www.wikifunctions.org/wiki/File:Month_-_Knuckles_(en).svg
Are you using your knuckles https://en.wikipedia.org/wiki/Knuckle_mnemonic whenever you try to remember how many days are in a given calendar month? No need for that anymore! Welcome the function days in month when not a leap year https://www.wikifunctions.org/view/en/Z16332 (Z16332).
OK, admittedly, your knuckles may often be more readily available than access to Wikifunctions, but let’s ignore that wrinkle for a moment.
The new Function takes a Gregorian calendar month https://www.wikifunctions.org/view/en/Z16098, which we introduced as a new Type last week https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-05-22, and returns a natural number https://www.wikifunctions.org/view/en/Z13518, depending on how many days that month has in a year that is not a leap year (a complementary function for leap years https://www.wikifunctions.org/view/en/Z16332 also exists).
The function has twelve Tests, one for each month (making it a completely covered Function), and currently two Implementations, both in Python:
- One Implementation using a lookup in a Python dictionary https://www.wikifunctions.org/view/en/Z16347, where for each month number we have the number of days,
- One Implementation using an unusual formula https://www.wikifunctions.org/view/en/Z16329 that I’ve never seen before
One nice thing about good, or even complete, test coverage, like in this case, is that you don’t even have to understand or prove the formula in order to trust it (although that sure doesn’t hurt): you can simply check that all test cases are what you would expect, and that they pass (as they do).
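The "trust it through complete tests" argument can be tried out with a short Python sketch. The dictionary mirrors the lookup approach described above; the formula is one commonly circulated arithmetic trick and is an assumption here, not necessarily the formula used in Z16329. Function names are hypothetical, and months are numbered 1 to 12.

```python
# Lookup approach: days per month in a year that is not a leap year.
DAYS_BY_MONTH = {
    1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
    7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31,
}


def days_lookup(month: int) -> int:
    """Days in the month (1-12) of a non-leap year, via dictionary lookup."""
    return DAYS_BY_MONTH[month]


def days_formula(month: int) -> int:
    """Same result via integer arithmetic; an illustrative formula only."""
    # (month + month // 8) % 2 alternates 31/30 and flips again from August;
    # 2 % month and 2 * (1 // month) together add 2 for every month except
    # February.
    return 28 + (month + month // 8) % 2 + 2 % month + 2 * (1 // month)


# Twelve checks, one per month, cover the whole input domain.
for month in range(1, 13):
    assert days_formula(month) == days_lookup(month), month
print("all twelve months agree")
```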
Thanks to the community for so swiftly adopting the new Type, and for having created more than a dozen new Functions using the new Type.