The on-wiki version of this newsletter can be found here: https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-25

Our goal for this Quarter: Agreement
Two weeks ago, we previewed the first form of access https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-11 to knowledge on Wikidata, and last week we announced that it has gone live https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-17#Function_of_the_Week:_select_representation_from_lexeme. This week we want to sketch out what we are aiming for by the end of the year.
As we have pointed out in the last two weeks, there are a number of issues we are working on to improve access to Lexemes and other entities on Wikidata, from issues with the selector for Lexemes https://phabricator.wikimedia.org/T377540 (which has already been fixed) to a better selector and display of Wikidata items (which we are still working on). Thanks to everybody who has given us feedback, pointed out further issues, or helped us prioritize the tasks.
But what is the goal for this year? What are we building towards, and where do we want to be by the end of 2024?
The goal is to be able to *build up phrases from Lexemes using linguistic agreement*. What does that mean?
Many languages require agreement https://en.wikipedia.org/wiki/Agreement_(linguistics) in order to be correct (some languages do not, such as Japanese, some need a little, such as English, and some need a lot, such as Swahili). Agreement, or concord, means that one word or phrase has to change in order to fit another word or phrase in a given sentence. Let’s take a look at an English example: *“Laura ate an apple.”* vs *“Laura ate two apples.”*
In the first sentence, the word *“an”* must be followed by a singular noun, whereas in the second sentence the word *“two”* must be followed by a plural noun. So the first sentence has the word form *“apple”*, and the second sentence has the word form *“apples”*.
Many languages such as Italian, Hindi, or Ukrainian have grammatical genders for nouns, such as for their respective words for cat: in Italian, *gatto* https://www.wikidata.org/wiki/Lexeme:L5577 is masculine, in Hindi, बिल्ली https://www.wikidata.org/wiki/Lexeme:L594403 is feminine, and so is the Ukrainian кішка https://www.wikidata.org/wiki/Lexeme:L184954. If a noun is being described by an adjective, the adjective in these languages has to agree with the gender of the noun. So, if we want to express little cat in Italian, we would say: *“piccolo gatto”*
Turtle in Italian is *tartaruga* https://www.wikidata.org/wiki/Lexeme:L684140, which is a feminine noun. If we want to express little turtle in Italian, we would say: *“piccola tartaruga”*
Note the different ending on the adjective: it is *piccolo* for masculine nouns, and *piccola* for feminine nouns.
Assume a function that takes two arguments, both Lexemes, one an Italian noun, the other an Italian adjective. In Italian, the adjective usually just precedes the noun. But in order to choose the right form, we need to know the grammatical gender of the noun. In Wikidata, there is a property for grammatical gender https://www.wikidata.org/wiki/Property:P5185. Before the end of the year, we plan to enable you to run a function in Wikifunctions on an Italian noun, and get back the value for the grammatical gender of that noun, if it is given in Italian.
With the value for grammatical gender, you will then be able to filter the adjective in order to pick the right form. Once we have the right form of the adjective and the noun, we can concatenate the two with a space in between, and get a grammatically correct phrase with an adjective and a noun.
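The steps described above can be sketched in Python. This is only an illustration under stated assumptions: the dictionaries are simplified, hypothetical stand-ins for Wikidata Lexemes (the `gender` key stands in for the value of property P5185), and the helper names `select_form` and `adjective_noun_phrase` are made up for this example. They are not Wikifunctions functions.

```python
# Hypothetical, simplified stand-ins for Wikidata Lexemes (not the real data model).
GATTO = {
    "gender": "masculine",  # stands in for the value of property P5185
    "forms": [
        {"text": "gatto", "features": {"singular"}},
        {"text": "gatti", "features": {"plural"}},
    ],
}
TARTARUGA = {
    "gender": "feminine",
    "forms": [
        {"text": "tartaruga", "features": {"singular"}},
        {"text": "tartarughe", "features": {"plural"}},
    ],
}
PICCOLO = {
    "forms": [
        {"text": "piccolo", "features": {"masculine", "singular"}},
        {"text": "piccola", "features": {"feminine", "singular"}},
        {"text": "piccoli", "features": {"masculine", "plural"}},
        {"text": "piccole", "features": {"feminine", "plural"}},
    ],
}


def select_form(lexeme, required):
    """Return the text of the first form that carries all required features."""
    for form in lexeme["forms"]:
        if required <= form["features"]:
            return form["text"]
    return None


def adjective_noun_phrase(adjective, noun, number="singular"):
    """Pick the adjective form that agrees with the noun, then join with a space."""
    gender = noun.get("gender")
    if gender is None:
        raise ValueError("no grammatical gender is recorded for this noun")
    adjective_text = select_form(adjective, {gender, number})
    noun_text = select_form(noun, {number})
    if adjective_text is None or noun_text is None:
        raise ValueError("no form with the required features was found")
    return f"{adjective_text} {noun_text}"


print(adjective_noun_phrase(PICCOLO, GATTO))      # piccolo gatto
print(adjective_noun_phrase(PICCOLO, TARTARUGA))  # piccola tartaruga
```

The sketch fails loudly when the gender or a matching form is missing, since the result depends on what has been entered on the Lexeme.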
We are looking forward to offering you these capabilities and to seeing what you will build with them.

Function of the Week: plural form of lexeme as monolingual text
Since Lexemes are new to Wikifunctions, we will look this week at one of the brand new community-created functions for Lexemes: plural form of lexeme as monolingual text https://www.wikifunctions.org/view/en/Z19260 (Z19260). You can go to that function, select a Lexeme, and run the function, and it will return the first form on that Lexeme that is a plural.
For example, enter the English noun *goose*, and it returns *geese* in English; enter the Spanish noun *compás*, and it returns *compases* in Spanish. This function should work for every language and always return a correct form, as long as that form is in Wikidata (and if it is missing in Wikidata, feel free to enter it).
The function takes one argument of type Wikidata Lexeme https://www.wikifunctions.org/view/en/Z6005 and returns a monolingual text https://www.wikifunctions.org/view/en/Z11 (that is, a text in a specific language).
There are two tests written for this function: a plural of *dog* being *dogs* https://www.wikifunctions.org/view/en/Z19262, and a plural of *amigo* being *amigos* https://www.wikifunctions.org/view/en/Z19263. We have the same issue with the tests as last week: the tests depend as much on Wikidata as they do on Wikifunctions. The second test illustrates that well: it so happens that on the Lexeme for the Spanish noun *amigo* https://www.wikidata.org/wiki/Lexeme:L230374 the form *amigos* is listed before the form *amigas*, but both of them are correct plural forms, the former being masculine and the latter feminine. The forms could just as well have been listed the other way around.
The function has one implementation, using a composition https://www.wikifunctions.org/view/en/Z19261. We will read the composition from the inside to the outside.
1. First, we call select lexeme forms from lexeme https://www.wikifunctions.org/view/en/Z19243, with the lexeme as the first argument and with plural (Wikidata item Q146786, in a list) as the second argument. This call filters the forms of the lexeme, only leaving the forms which have plural as a grammatical feature.
2. We echo https://www.wikifunctions.org/view/en/Z801 the result, which shouldn’t do anything, but fixes issues with typed lists. We hope to get rid of this step in the future.
3. Then we get the first element https://www.wikifunctions.org/view/en/Z811 of the list which has been returned. This means we are usually getting *a* plural form back, not *the* plural form: whatever happens to be the first on the Lexeme.
4. At this point we have a Wikidata Lexeme Form https://www.wikifunctions.org/view/en/Z6004 at hand. Using value by key https://www.wikifunctions.org/view/en/Z803 we ask for the *representations* of the form, which returns a multilingual text https://www.wikifunctions.org/view/en/Z12.
5. And finally, we can use the function to get the first monolingual text from a multilingual text https://www.wikifunctions.org/view/en/Z19254 in order to get to the monolingual text https://www.wikifunctions.org/view/en/Z11 we are looking for.
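To make the nesting easier to follow, here is a rough Python analogue of the composition. It is a sketch under assumptions, not the actual implementation: each helper only mimics the role of the Wikifunctions function named in its comment, and the Lexeme dictionaries (forms with `features` and `representations`) are hypothetical simplifications of the real types.

```python
# Each helper mimics the role of one function in the composition (assumed behaviour).
def select_forms(lexeme, features):  # role of Z19243
    """Keep only the forms that carry every listed grammatical feature."""
    return [f for f in lexeme["forms"] if set(features) <= set(f["features"])]


def echo(value):  # role of Z801
    """Return the input unchanged."""
    return value


def first_element(items):  # role of Z811
    """Return the first list element; raises IndexError on an empty list."""
    return items[0]


def value_by_key(obj, key):  # role of Z803
    return obj[key]


def first_monolingual_text(multilingual_text):  # role of Z19254
    return multilingual_text[0]


def plural_form(lexeme):
    """Read from the inside out, mirroring steps 1 to 5."""
    return first_monolingual_text(
        value_by_key(
            first_element(echo(select_forms(lexeme, ["plural"]))),
            "representations",
        )
    )


# Hypothetical sample data: a monolingual text is a (language, text) pair.
GOOSE = {"forms": [
    {"features": ["singular"], "representations": [("en", "goose")]},
    {"features": ["plural"], "representations": [("en", "geese")]},
]}
AMIGO = {"forms": [
    {"features": ["masculine", "singular"], "representations": [("es", "amigo")]},
    {"features": ["masculine", "plural"], "representations": [("es", "amigos")]},
    {"features": ["feminine", "plural"], "representations": [("es", "amigas")]},
]}

print(plural_form(GOOSE))  # ('en', 'geese')
print(plural_form(AMIGO))  # ('es', 'amigos')

# Listing the same forms in reverse order changes the answer.
AMIGO_REORDERED = {"forms": list(reversed(AMIGO["forms"]))}
print(plural_form(AMIGO_REORDERED))  # ('es', 'amigas')
```

The last call shows why the result is *a* plural form rather than *the* plural form: with the same forms listed in a different order, the sketch returns *amigas* instead of *amigos*.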
Currently, the function fails frequently, because resolving larger objects and evaluating more complex compositions often times out (for example, it times out on a German noun such as *Baum* https://www.wikidata.org/wiki/Lexeme:L11540). Also, the call to echo shouldn’t be necessary. We can use this function as a benchmark for improving the capabilities and robustness of Wikifunctions. And at the same time, when it works, it demonstrates a really interesting use case.
abstract-wikipedia@lists.wikimedia.org