Newsletter #177: Our goal for this Quarter: Agreement - Abstract-Wikipedia

25 Oct 2024


The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-25
--
Our goal for this Quarter: Agreement
Two weeks ago, we previewed the first form of access
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-11 to
knowledge on Wikidata, and last week we announced that it has gone live
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-17#Function_of_the_Week:_select_representation_from_lexeme.
This week we want to sketch out what we are aiming for by the end of the
year.
As we have pointed out over the last two weeks, there are a number of
issues we are working on to improve access to Lexemes and other entities on
Wikidata, from issues with the selector for Lexemes
https://phabricator.wikimedia.org/T377540 (which has already been fixed)
to a better selector and display of Wikidata items (which we are still
working on). Thanks to everybody who has given us feedback, pointed out
further issues, or helped us prioritize the tasks.
But what is the goal for this year? What are we building towards, and where
do we want to be by the end of 2024?
The goal is to be able to *build up phrases from Lexemes using linguistic
agreement*. What does that mean?
Many languages require agreement
https://en.wikipedia.org/wiki/Agreement_(linguistics) in order to be
correct (some languages do not, such as Japanese, some need a little, such
as English, and some need a lot, such as Swahili). Agreement, or concord,
means that one word or phrase has to change in order to fit another word or
phrase in a given sentence. Let’s take a look at an English example:
*“Laura ate an apple.”* vs *“Laura ate two apples.”*
In the first sentence, the word *“an”* must be followed by a singular noun,
whereas in the second sentence the word *“two”* must be followed by a
plural noun. So the first sentence has the word form *“apple”*, and the
second sentence the word form *“apples”*.
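The English example can be sketched in a few lines of Python. This is only
an illustration: the two lookup tables and the function name
quantified_noun are hypothetical and do not reflect how Wikidata or
Wikifunctions store this information.

```python
# Hypothetical toy lexicon: each noun maps a grammatical number to a word form.
FORMS = {"apple": {"singular": "apple", "plural": "apples"}}

# Hypothetical table: the grammatical number each word requires of the noun
# that follows it.
REQUIRED_NUMBER = {"an": "singular", "two": "plural"}


def quantified_noun(quantifier: str, noun: str) -> str:
    """Pick the noun form that agrees with the preceding word."""
    number = REQUIRED_NUMBER[quantifier]
    return f"{quantifier} {FORMS[noun][number]}"


print(f"Laura ate {quantified_noun('an', 'apple')}.")   # Laura ate an apple.
print(f"Laura ate {quantified_noun('two', 'apple')}.")  # Laura ate two apples.
```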
Many languages such as Italian, Hindi, or Ukrainian have grammatical
genders for nouns, such as for their respective words for cat: in Italian,
*gatto* https://www.wikidata.org/wiki/Lexeme:L5577 is masculine, in
Hindi, बिल्ली https://www.wikidata.org/wiki/Lexeme:L594403 is feminine,
and so is the Ukrainian кішка https://www.wikidata.org/wiki/Lexeme:L184954.
If a noun is being described by an adjective, the adjective in these
languages has to agree with the gender of the noun. So, if we want to
express little cat in Italian, we would say:
*“piccolo gatto”*
Turtle in Italian is *tartaruga*
https://www.wikidata.org/wiki/Lexeme:L684140, which is a feminine noun.
If we want to express little turtle in Italian, we would say:
*“piccola tartaruga”*
Note the different ending on the adjective: it is *piccolo* for masculine
nouns, and *piccola* for feminine nouns.
Assume a function that takes two arguments, both Lexemes, one an Italian
noun, the other an Italian adjective. In Italian, a common adjective such
as *piccolo* usually just precedes the noun (many other adjectives follow
it). But in order to choose the right form, we need to
know the grammatical gender of the noun. In Wikidata, there is a property
for grammatical gender https://www.wikidata.org/wiki/Property:P5185.
Before the end of the year, we plan to enable you to run a function in
Wikifunctions on an Italian noun, and get back the value for the
grammatical gender of that noun, if it is given in Italian.
With the value for grammatical gender, you will then be able to filter the
adjective in order to pick the right form. Once we have the right form of
the adjective and the noun, we can concatenate the two with a space in
between, and get a grammatically correct phrase with an adjective and a
noun.
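The three steps described above (look up the gender, filter the adjective
forms, concatenate) could look roughly like the following Python sketch. It
is not the Wikifunctions implementation: the classes Form and Lexeme, the
helper names, and the plain-string feature labels are all assumptions made
for illustration, and the sketch assumes singular forms and an adjective
that precedes the noun.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Form:
    representation: str
    features: FrozenSet[str]  # plain strings standing in for grammatical features


@dataclass
class Lexeme:
    lemma: str
    forms: List[Form]
    gender: Optional[str] = None  # stands in for the grammatical gender property (P5185)


def grammatical_gender(noun: Lexeme) -> str:
    """Return the gender recorded on the noun, or fail if none is given."""
    if noun.gender is None:
        raise ValueError(f"no grammatical gender recorded for {noun.lemma!r}")
    return noun.gender


def select_form(lexeme: Lexeme, required: set) -> Form:
    """Return the first form that carries all of the required features."""
    for form in lexeme.forms:
        if required <= form.features:
            return form
    raise LookupError(f"{lexeme.lemma!r} has no form with {sorted(required)}")


def adjective_noun_phrase(adjective: Lexeme, noun: Lexeme) -> str:
    """Build a singular phrase in which the adjective agrees with the noun."""
    gender = grammatical_gender(noun)
    adjective_form = select_form(adjective, {gender, "singular"})
    noun_form = select_form(noun, {"singular"})
    return f"{adjective_form.representation} {noun_form.representation}"


# Hand-written toy data, not fetched from Wikidata.
piccolo = Lexeme("piccolo", [
    Form("piccolo", frozenset({"masculine", "singular"})),
    Form("piccola", frozenset({"feminine", "singular"})),
    Form("piccoli", frozenset({"masculine", "plural"})),
    Form("piccole", frozenset({"feminine", "plural"})),
])
gatto = Lexeme("gatto", [
    Form("gatto", frozenset({"singular"})),
    Form("gatti", frozenset({"plural"})),
], gender="masculine")
tartaruga = Lexeme("tartaruga", [
    Form("tartaruga", frozenset({"singular"})),
    Form("tartarughe", frozenset({"plural"})),
], gender="feminine")

print(adjective_noun_phrase(piccolo, gatto))      # piccolo gatto
print(adjective_noun_phrase(piccolo, tartaruga))  # piccola tartaruga
```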
We are looking forward to offering you these capabilities and to seeing
what you will build with them.
Function of the Week: plural form of lexeme as monolingual text
Since Lexemes are new to Wikifunctions, we will look this week at one of
the brand new community-created functions for Lexemes: plural form of
lexeme as monolingual text
https://www.wikifunctions.org/view/en/Z19260 (Z19260).
You can go to that function, select a Lexeme, and run the function, and it
will return the first form on that Lexeme that is a plural.
For example, enter the English noun *goose* and it returns *geese* in
English; enter the Spanish noun *compás* and it returns *compases* in
Spanish. This function should work for every language, and always return a
correct form, as long as it is in Wikidata (and if it is missing in
Wikidata, feel free to add it).
The function takes one argument of type Wikidata Lexeme
https://www.wikifunctions.org/view/en/Z6005 and returns a monolingual text
https://www.wikifunctions.org/view/en/Z11 (that is, a text in a specific
language).
There are two tests written for this function: a plural of *dog* being
*dogs* https://www.wikifunctions.org/view/en/Z19262, and a plural of
*amigo* being *amigos* https://www.wikifunctions.org/view/en/Z19263. We
have the same issue with tests as last week: the tests depend as much on
Wikidata as they do on Wikifunctions. The second test illustrates that
well: it so happens that on the Lexeme for the Spanish noun *amigo*
https://www.wikidata.org/wiki/Lexeme:L230374 the form *amigos* is listed
before the form *amigas*, but both of them are correct plural forms, the
former being masculine and the latter feminine. The forms could have been
written the other way around just as well.
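The dependence on form order is easy to reproduce with a toy model. The
form list below is hand-written for illustration and is not the actual
content or ordering of the Lexeme on Wikidata; first_plural is a
hypothetical helper that mimics "return the first form that is a plural".

```python
def first_plural(forms):
    """Return the representation of the first form tagged as plural."""
    return next(rep for rep, features in forms if "plural" in features)


# Toy stand-in for the forms of the Spanish noun "amigo".
amigo_forms = [
    ("amigo", {"masculine", "singular"}),
    ("amiga", {"feminine", "singular"}),
    ("amigos", {"masculine", "plural"}),
    ("amigas", {"feminine", "plural"}),
]

print(first_plural(amigo_forms))                  # amigos
print(first_plural(list(reversed(amigo_forms))))  # amigas
```

Both results are valid plurals, but a test that expects *amigos* passes
only for the first ordering.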
The function has one implementation, using a composition
https://www.wikifunctions.org/view/en/Z19261. We will read the
composition from the inside to the outside.
1. First, we call select lexeme forms from lexeme
   https://www.wikifunctions.org/view/en/Z19243, with the lexeme as the
   first argument and with the plural (Q146786), in a list, as the second
   argument. This call filters the forms of the lexeme, leaving only the
   forms which have plural as a grammatical feature.
2. We echo https://www.wikifunctions.org/view/en/Z801 the result,
   which shouldn’t do anything, but fixes issues with typed lists. We hope
   to get rid of this step in the future.
3. Then we get the first element
   https://www.wikifunctions.org/view/en/Z811 of the list which has been
   returned. This means we are usually getting *a* plural form back, not
   *the* plural form: whatever happens to be the first on the Lexeme.
4. At this point we have a Wikidata Lexeme Form
   https://www.wikifunctions.org/view/en/Z6004 at hand. Using value by key
   https://www.wikifunctions.org/view/en/Z803 we ask for the
   *representations* of the form, which returns a multilingual text
   https://www.wikifunctions.org/view/en/Z12.
5. And finally, we can use the function to get the first monolingual
   text from a multilingual text
   https://www.wikifunctions.org/view/en/Z19254 in order to get to the
   monolingual text https://www.wikifunctions.org/view/en/Z11 we are
   looking for.
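To make the nesting easier to follow, here is a rough Python analogue of
the composition. The Python functions are stand-ins named after the
Wikifunctions functions listed above, not their real implementations; the
dictionary layout, the argument order, and the modelling of a multilingual
text as a list of (language, text) pairs are assumptions for illustration.

```python
def select_lexeme_forms(lexeme, features):
    """Stand-in for Z19243: keep only forms carrying all given features."""
    return [
        form for form in lexeme["forms"]
        if all(f in form["grammatical_features"] for f in features)
    ]


def echo(value):
    """Stand-in for Z801: return the input unchanged."""
    return value


def first_element(items):
    """Stand-in for Z811: return the first element of a list."""
    return items[0]


def value_by_key(key, obj):
    """Stand-in for Z803: look up one key of an object."""
    return obj[key]


def first_monolingual_text(multilingual_text):
    """Stand-in for Z19254: return the first (language, text) pair."""
    return multilingual_text[0]


def plural_form_as_monolingual_text(lexeme):
    """Read from the inside out, matching steps 1 to 5 above."""
    return first_monolingual_text(
        value_by_key(
            "representations",
            first_element(
                echo(
                    select_lexeme_forms(lexeme, ["plural"])
                )
            ),
        )
    )


# Hand-written toy lexeme, not fetched from Wikidata.
goose = {
    "lemma": "goose",
    "forms": [
        {"representations": [("en", "goose")], "grammatical_features": ["singular"]},
        {"representations": [("en", "geese")], "grammatical_features": ["plural"]},
    ],
}

print(plural_form_as_monolingual_text(goose))  # ('en', 'geese')
```

In this sketch, a lexeme without any plural form makes first_element raise
an IndexError; how the real composition reports that case is not covered
here.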
Currently, the function fails frequently, because resolving larger objects
and evaluating more complex compositions often time out (for example, it
times out on a German noun such as *Baum*
https://www.wikidata.org/wiki/Lexeme:L11540). Also, the call to echo
shouldn’t be necessary. We can use this function as a benchmark for
improving the capabilities and robustness of Wikifunctions. At the same
time, when it works, it demonstrates a really interesting use case.