Abstract-Wikipedia October 2024

abstract-wikipedia@lists.wikimedia.org

2 participants
5 discussions

Newsletter #177: Our goal for this Quarter: Agreement
by Denny Vrandečić 25 Oct '24

25 Oct '24

The on-wiki version of this newsletter can be found here: https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-25

--

Our goal for this Quarter: Agreement

Two weeks ago, we previewed the first form of access <https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-11> to knowledge on Wikidata, and last week we announced that it has gone live <https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-17#…>. This week we want to sketch out what we are aiming for by the end of the year. <https://www.wikifunctions.org/wiki/File:Wikidata_lexemes_logo.svg>

As we have pointed out in the last two weeks, there are a number of issues we are working on to improve access to Lexemes and other entities on Wikidata, from issues with the selector for Lexemes <https://phabricator.wikimedia.org/T377540> (which has already been fixed) to a better selector and display of Wikidata items (which we are still working on). Thanks to everybody who has given us feedback and pointed out further issues, or helped us prioritize the tasks.

But what is the goal for this year? What are we building towards, where do we want to be by the end of 2024? The goal is to be able to *build up phrases from Lexemes using linguistic agreement*.

What does that mean? Many languages require agreement <https://en.wikipedia.org/wiki/Agreement_(linguistics)> in order to be correct (some languages do not, such as Japanese, some need a little, such as English, and some need a lot, such as Swahili). Agreement, or concord, means that one word or phrase has to change in order to fit another word or phrase in a given sentence.

Let’s take a look at an English example: *“Laura ate an apple.”* vs *“Laura ate two apples.”* In the first sentence, the word *“an”* must be followed by a singular noun, whereas in the second sentence the word *“two”* must be followed by a plural noun.
So the first sentence has the word form *“apple”*, and the second sentence the word form *“apples”*. Many languages such as Italian, Hindi, or Ukrainian have grammatical genders for nouns, such as for their respective words for cat: in Italian, *gatto* <https://www.wikidata.org/wiki/Lexeme:L5577> is masculine, in Hindi, बिल्ली <https://www.wikidata.org/wiki/Lexeme:L594403> is feminine, and so is the Ukrainian кішка <https://www.wikidata.org/wiki/Lexeme:L184954>. If a noun is being described by an adjective, the adjective in these languages has to agree with the gender of the noun. So, if we want to express little cat in Italian, we would say: *“piccolo gatto”* Turtle in Italian is *tartaruga* <https://www.wikidata.org/wiki/Lexeme:L684140>, which is a feminine noun. If we want to express little turtle in Italian, we would say: *“piccola tartaruga”* Note the different ending on the adjective: it is *piccolo* for masculine nouns, and *piccola* for feminine nouns. Assume a function that takes two arguments, both Lexemes, one an Italian noun, the other an Italian adjective. In Italian, the adjective usually just precedes the noun. But in order to choose the right form, we need to know the grammatical gender of the noun. In Wikidata, there is a property for grammatical gender <https://www.wikidata.org/wiki/Property:P5185>. Before the end of the year, we plan to enable you to run a function in Wikifunctions on an Italian noun, and get back the value for the grammatical gender of that noun, if it is given in Italian. With the value for grammatical gender, you will then be able to filter the adjective in order to pick the right form. Once we have the right form of the adjective and the noun, we can concatenate the two with a space in between, and get a grammatically correct phrase with an adjective and a noun. We are looking forward to offering you these capabilities and to see what you will build with that. 
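The adjective-noun agreement described above can be sketched in Python. This is only an illustration under stated assumptions: the `Lexeme` and `Form` classes, the plain-string feature labels, and the tiny hand-written data are hypothetical stand-ins, not the Wikifunctions types or real Wikidata content. Only the property ID P5185 (grammatical gender) and the example words are taken from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Form:
    representation: str
    features: frozenset  # hypothetical plain-string labels, e.g. {"masculine", "singular"}

@dataclass
class Lexeme:
    lemma: str
    forms: list = field(default_factory=list)
    statements: dict = field(default_factory=dict)  # property ID -> value

def grammatical_gender(noun):
    """Return the value of P5185 (grammatical gender), or None if it is not given."""
    return noun.statements.get("P5185")

def select_form(lexeme, required):
    """Return the first form carrying all required features, or None."""
    for form in lexeme.forms:
        if required <= form.features:
            return form
    return None

def adjective_noun_phrase(adjective, noun):
    """Pick the adjective form agreeing with the noun's gender, then concatenate."""
    gender = grammatical_gender(noun)
    if gender is None:
        raise ValueError(f"no grammatical gender recorded for {noun.lemma!r}")
    form = select_form(adjective, frozenset({gender, "singular"}))
    if form is None:
        raise ValueError(f"no {gender} singular form of {adjective.lemma!r}")
    # Following the text: adjective first, then a space, then the noun.
    return f"{form.representation} {noun.lemma}"

# Hand-written toy data (assumption, not fetched from Wikidata).
piccolo = Lexeme("piccolo", forms=[
    Form("piccolo", frozenset({"masculine", "singular"})),
    Form("piccola", frozenset({"feminine", "singular"})),
    Form("piccoli", frozenset({"masculine", "plural"})),
    Form("piccole", frozenset({"feminine", "plural"})),
])
gatto = Lexeme("gatto", statements={"P5185": "masculine"})
tartaruga = Lexeme("tartaruga", statements={"P5185": "feminine"})

print(adjective_noun_phrase(piccolo, gatto))      # piccolo gatto
print(adjective_noun_phrase(piccolo, tartaruga))  # piccola tartaruga
```

The sketch raises an error when the gender is missing, which mirrors the caveat in the text that the value is only returned if it has been entered for the noun.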
Function of the Week: plural form of lexeme as monolingual text

Since Lexemes are new to Wikifunctions, we will look this week at one of the brand new community-created functions for Lexemes: plural form of lexeme as monolingual text <https://www.wikifunctions.org/view/en/Z19260> (Z19260). You can go to that function, select a Lexeme, and run the function, and it will return the first form on that Lexeme that is a plural. For example, enter the English noun *goose*, and it returns *geese* in English; enter the Spanish noun *compás*, and it returns *compases* in Spanish. This function should work for every language, and always return a correct form, as long as it is in Wikidata (and if it is missing in Wikidata, feel free to enter it).

The function takes one argument of type Wikidata Lexeme <https://www.wikifunctions.org/view/en/Z6005> and returns a monolingual text <https://www.wikifunctions.org/view/en/Z11> (that is, a text in a specific language).

There are two tests written for this function: a plural of *dog* being *dogs* <https://www.wikifunctions.org/view/en/Z19262>, and a plural of *amigo* being *amigos* <https://www.wikifunctions.org/view/en/Z19263>. We have the same issues with tests as last week: the tests depend as much on Wikidata as they do on Wikifunctions. The second test illustrates that well: it so happens that on the Lexeme for the Spanish noun *amigo* <https://www.wikidata.org/wiki/Lexeme:L230374> the form *amigos* is listed before the form *amigas*, but both of them are correct plural forms, the former being masculine and the latter feminine. The forms could have been written the other way around just as well.

The function has one implementation, using a composition <https://www.wikifunctions.org/view/en/Z19261>. We will read the composition from the inside to the outside.

1. First, we call select lexeme forms from lexeme <https://www.wikifunctions.org/view/en/Z19243>, with the lexeme as the first argument and with plural <https://www.wikifunctions.org/w/index.php?title=Q146786&action=edit&redlink…> (in a list) as the second argument. This call filters the forms of the lexeme, only leaving the forms which have plural as a grammatical feature.
2. We echo <https://www.wikifunctions.org/view/en/Z801> the result, which shouldn’t do anything, but fixes issues with typed lists. We hope to get rid of this step in the future.
3. Then we get the first element <https://www.wikifunctions.org/view/en/Z811> of the list which has been returned. This means we are usually getting *a* plural form back, not *the* plural form: whatever happens to be the first on the Lexeme.
4. At this point we have a Wikidata Lexeme Form <https://www.wikifunctions.org/view/en/Z6004> at hand. Using value by key <https://www.wikifunctions.org/view/en/Z803> we ask for the *representations* of the form, which returns a multilingual text <https://www.wikifunctions.org/view/en/Z12>.
5. And finally, we can use the function to get the first monolingual text from a multilingual text <https://www.wikifunctions.org/view/en/Z19254> in order to get to the monolingual text <https://www.wikifunctions.org/view/en/Z11> we are looking for.

Currently, the function fails frequently due to timeouts when resolving larger objects and evaluating more complex compositions (for example, it times out on a German noun such as *Baum* <https://www.wikidata.org/wiki/Lexeme:L11540>). Also, the call to echo shouldn’t be necessary. We can use this function as a benchmark on improving the capabilities and robustness of Wikifunctions. And at the same time, when it works, it demonstrates a really interesting use case.
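The five steps of the composition can be mirrored in a few lines of Python. This is a rough sketch, not the Wikifunctions composition itself: the dictionary layout of the lexeme, the helper names, and the `(language, text)` tuple used for a monolingual text are assumptions made for illustration. Q146786 is the item for plural mentioned in the text; the other feature labels are placeholder strings.

```python
PLURAL = "Q146786"  # item for "plural", as referenced in the text

def select_forms(lexeme, features):
    """Step 1: keep only the forms that carry every requested feature."""
    return [f for f in lexeme["forms"]
            if all(x in f["grammatical_features"] for x in features)]

def echo(value):
    """Step 2: return the input unchanged."""
    return value

def first_element(items):
    """Step 3: first element of the list (raises IndexError if it is empty)."""
    return items[0]

def value_by_key(obj, key):
    """Step 4: look up one key of an object."""
    return obj[key]

def first_monolingual_text(multilingual_text):
    """Step 5: first (language, text) pair of a multilingual text."""
    return multilingual_text[0]

def plural_form_as_monolingual_text(lexeme):
    # Read from the inside to the outside, like the composition.
    return first_monolingual_text(
        value_by_key(
            first_element(echo(select_forms(lexeme, [PLURAL]))),
            "representations"))

# Toy lexeme written by hand (assumption, not fetched from Wikidata).
amigo = {"forms": [
    {"representations": [("es", "amigo")], "grammatical_features": ["singular", "masculine"]},
    {"representations": [("es", "amigos")], "grammatical_features": [PLURAL, "masculine"]},
    {"representations": [("es", "amigas")], "grammatical_features": [PLURAL, "feminine"]},
]}

print(plural_form_as_monolingual_text(amigo))  # ('es', 'amigos')

# Listing the forms in the opposite order changes the result,
# which is the test fragility discussed above.
reordered = {"forms": list(reversed(amigo["forms"]))}
print(plural_form_as_monolingual_text(reordered))  # ('es', 'amigas')
```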


Newsletter 176: What could abstract content look like?
by Denny Vrandečić 17 Oct '24

17 Oct '24

The on-wiki version of this newsletter can be found here: https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-17 Because of the more complex formatting, the on-wiki version might be easier to read. -- What could abstract content look like? *This week’s newsletter is guest-written by Mahir Morshed <https://www.wikifunctions.org/wiki/User:Mahir256>.* The notion of ‘abstract content’ for Abstract Wikipedia arises by analogy to regular content on regular Wikipedias. This regular content is written in a specific language’s writing system and, on the surface, is not clearly connected to the structured information on Wikidata. By contrast, then, abstract content should not be tied to a specific language’s writing system and should instead be derived from information on Wikidata. It would additionally be useful for the parts of this content to have a simplified syntax, both to reduce the logic needed to process and manipulate this content and to ensure additions to the content don’t inherently require changes to the representation format. It remains then to speak of how this abstract content should appear such that these desiderata are achieved. Let’s try to arrive at such a representation through some changes to a Constructor for a simple sentence, starting with something similarly structured to Figure 1 in Denny’s CACM paper <https://dl.acm.org/doi/10.1145/3425778>: Action( predicate: eating, eater: Robert J. Jones, eaten: ice cream, location: Decatur, Illinois, time: 1 July 2023, 11:30am ) The intended meaning of this sentence is “Robert J. Jones ate ice cream in Decatur, Illinois on July 1st, 2023 at 11:30am.” Right now everything in the Constructor is in English, and none of the arguments refer to Wikidata at all. 
Let’s (mostly) fix the latter of these problems: Action( predicate: Q213449 <https://www.wikidata.org/wiki/Q213449>, eater: Q33103898 <https://www.wikidata.org/wiki/Q33103898>, eaten: Q13233 <https://www.wikidata.org/wiki/Q13233>, location: Q506325 <https://www.wikidata.org/wiki/Q506325>, time: “+2023-07-01T16:30:00Z” ) This is better, but the name of the Constructor and the names of the arguments are still in English. What if we used Wikidata items to represent these as well? Q4026292 <https://www.wikidata.org/wiki/Q4026292>( Q179080 <https://www.wikidata.org/wiki/Q179080>: Q213449 <https://www.wikidata.org/wiki/Q213449>, Q20984678 <https://www.wikidata.org/wiki/Q20984678>: Q33103898 <https://www.wikidata.org/wiki/Q33103898>, Q2095 <https://www.wikidata.org/wiki/Q2095>: Q13233 <https://www.wikidata.org/wiki/Q13233>, Q115095765 <https://www.wikidata.org/wiki/Q115095765>: Q506325 <https://www.wikidata.org/wiki/Q506325>, Q7805404 <https://www.wikidata.org/wiki/Q7805404>: +2023-07-01T16:30:00Z ) Now that nearly everything in this Constructor is represented by a Wikidata QID, it can be displayed entirely in a particular language provided that each item referred to has a label in that language, such as Bengali: কার্য( বিধেয়: খাওয়া, ভোক্তা: রবার্ট জে জোন্স, খাদ্য: আইসক্রিম, অবস্থান: ডেকেটার, ইলিনয়, ঘটনার সময়: +2023-07-01T16:30:00Z ) We’re still not done, though: could we simplify this syntax a bit? (Can we get away from needing named arguments to functions?) 
Q4026292 <https://www.wikidata.org/wiki/Q4026292>( Q179080 <https://www.wikidata.org/wiki/Q179080>(Q213449 <https://www.wikidata.org/wiki/Q213449>), Q20984678 <https://www.wikidata.org/wiki/Q20984678>(Q33103898 <https://www.wikidata.org/wiki/Q33103898>), Q2095 <https://www.wikidata.org/wiki/Q2095>(Q13233 <https://www.wikidata.org/wiki/Q13233>), Q115095765 <https://www.wikidata.org/wiki/Q115095765>(Q506325 <https://www.wikidata.org/wiki/Q506325>), Q7805404 <https://www.wikidata.org/wiki/Q7805404>(+2023-07-01T16:30:00Z) ) This change, from using named function arguments to using single-member functions as unnamed arguments, should hopefully remind one of the composition syntax <https://www.wikifunctions.org/wiki/Wikifunctions:How_to_create_implementati…> that Wikifunctions functions can be implemented in. Since different predicates require different participant roles–’drinking’ requires ‘drinker’ and ‘drink’, ‘reading’ requires ‘reader’ and ‘thing being read’, and so on–the number of functions that need to be introduced at this point will likely skyrocket. 
We can reduce this number by generalizing them to use Q613930 <https://www.wikidata.org/wiki/Q613930> to indicate participant roles, keeping the QIDs we introduced for those roles as arguments instead: Q4026292 <https://www.wikidata.org/wiki/Q4026292>( Q179080 <https://www.wikidata.org/wiki/Q179080>(Q213449 <https://www.wikidata.org/wiki/Q213449>), Q613930 <https://www.wikidata.org/wiki/Q613930>(Q20984678 <https://www.wikidata.org/wiki/Q20984678>, Q33103898 <https://www.wikidata.org/wiki/Q33103898>), Q613930 <https://www.wikidata.org/wiki/Q613930>(Q2095 <https://www.wikidata.org/wiki/Q2095>, Q13233 <https://www.wikidata.org/wiki/Q13233>), Q115095765 <https://www.wikidata.org/wiki/Q115095765>(Q506325 <https://www.wikidata.org/wiki/Q506325>), Q7805404 <https://www.wikidata.org/wiki/Q7805404>(+2023-07-01T16:30:00Z) ) The connection to particular programming languages can be made even more explicit with a little rearrangement: (“Q4026292” (“Q179080” “Q213449”) (“Q613930” “Q20984678” “Q33103898”) (“Q613930” “Q2095” “Q13233”) (“Q115095765” “Q506325”) (“Q7805404” “+2023-07-01T16:30:00Z”) ) This format, borrowing from the syntax of Lisp <https://en.wikipedia.org/wiki/Lisp>-like programming languages, is what I believe should be used to store abstract content for Abstract Wikipedia. As a purely optional last measure for completeness, let’s try to turn the timestamp into QIDs, using items for the date, time, and time zone: (“Q4026292” (“Q179080” “Q213449”) (“Q613930” “Q20984678” “Q33103898”) (“Q613930” “Q2095” “Q13233”) (“Q115095765” “Q506325”) (“Q7805404” (“Q186885” “Q69306847” “Q95056915” “Q15406405”)) ) Since this final result is composed entirely of strings (if the “Q” is removed everywhere, integers?) and lists–both more primitive data structures across lots of environments–it can be read and modified the way other lists of strings are dealt with in those environments. 
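To illustrate the point that such content is just nested lists of strings, here is a small Python sketch. It is not how Ninai or Wikifunctions store or render content; the function names are made up for this example, and the English labels are simply the words used in the first version of the Constructor above rather than labels looked up on Wikidata. Q613930 is deliberately left without a label, so it falls back to its QID.

```python
# The Lisp-like form from the text, written as nested Python lists of strings.
content = ["Q4026292",
           ["Q179080", "Q213449"],
           ["Q613930", "Q20984678", "Q33103898"],
           ["Q613930", "Q2095", "Q13233"],
           ["Q115095765", "Q506325"],
           ["Q7805404", "+2023-07-01T16:30:00Z"]]

def to_sexpr(node):
    """Serialize nested lists of strings into the parenthesized form."""
    if isinstance(node, str):
        return f'"{node}"'
    return "(" + " ".join(to_sexpr(child) for child in node) + ")"

def render_labels(node, labels):
    """Replace each QID by its label where one is known, else keep the string."""
    if isinstance(node, str):
        return labels.get(node, node)
    head, *args = node
    rendered = ", ".join(render_labels(arg, labels) for arg in args)
    return f"{render_labels(head, labels)}({rendered})"

# Labels taken from the English wording of the first Constructor (assumption:
# they stand in for real Wikidata labels in the chosen display language).
labels_en = {
    "Q4026292": "Action", "Q179080": "predicate", "Q213449": "eating",
    "Q20984678": "eater", "Q33103898": "Robert J. Jones",
    "Q2095": "eaten", "Q13233": "ice cream",
    "Q115095765": "location", "Q506325": "Decatur, Illinois",
    "Q7805404": "time",
}

print(to_sexpr(content))
print(render_labels(content, labels_en))
```

Swapping `labels_en` for a dictionary of labels in another language would display the same structure in that language, provided each item has a label there.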
(In fact, lists of strings can be used as the input to Wikifunctions functions <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2024-01-03>, even though actual handling of Wikidata items is still to come.) As a reminder, since each string is a Wikidata QID, this final result can be displayed in a given language provided each item has a label in that language. The Constructor whose written form we have been modifying also represents what I believe to be a very useful building block for abstract content. In many languages this would correspond to a structurally more simple sentence–albeit one whose main verb isn’t something like ‘to be’ or ‘to have’–complete with a predicate (‘eating’), participant roles (such as ‘eater’ and ‘food’), and any number of modifiers (such as ‘location’ and ‘time’). There are already lots of Wikidata items for predicates, with Wikidata verb and verb phrase lexemes linking to them <https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Statistics/indi…>, and there is an emerging effort to introduce items to represent participant roles for predicates <https://www.wikidata.org/wiki/Wikidata:WikiProject_Events_and_Role_Frames>. In principle, the order of components within such a block would not be significant, so that the following would be functionally identical to what was shown above: (“Q4026292” (“Q115095765” “Q506325”) (“Q179080” “Q213449”) (“Q7805404” (“Q186885” “Q69306847” “Q95056915” “Q15406405”)) (“Q613930” “Q2095” “Q13233”) (“Q613930” “Q20984678” “Q33103898”) ) Putting these blocks together requires introducing some machinery, but with the representation we arrived at it is possible to make this machinery realizable. The following are but three possible examples: - Two simple sentences can be coordinated (e.g. using ‘and’, ‘or’, ‘but’, and so on) by adding both as arguments to a new list. 
The item Q13381767 <https://www.wikidata.org/wiki/Q13381767> below, for example, represents a simple ‘and’ relationship: (“Q13381767” (“Q4026292” (“Q179080” “Q213449”) [...]) (“Q4026292” (“Q179080” “Q199657”) [...]) ) - A simple sentence may be subordinated to another (e.g. using ‘because’, ‘when’, ‘while’, and so on) by introducing a modifier wrapping that simple sentence and using that modifier in the other sentence. The item Q12774849 <https://www.wikidata.org/wiki/Q12774849> below, for example, represents a simple ‘because’ relationship: (“Q4026292” (“Q179080” “Q213449”) [...] (“Q12774849” (“Q4026292” (“Q179080” “Q199657”) [...]) ) ) - Arbitrary modifiers could be applied after a simple sentence has been formed by wrapping them around that sentence. The item Q1478451 <https://www.wikidata.org/wiki/Q1478451> below, for example, represents simple negation: (“Q1478451” (“Q4026292” (“Q179080” “Q199657”) [...]) ) Much, if not all, of what has been described above has been put into practice at elemwala.toolforge.org (powered by Ninai <https://gitlab.com/mahir256/ninai/>/Udiron <https://gitlab.com/mahir256/udiron/>). *This week’s newsletter is guest-written by Mahir Morshed <https://www.wikifunctions.org/wiki/User:Mahir256>. If you want to propose a guest-written newsletter, please contact Luca <https://www.wikifunctions.org/wiki/User_talk:Sannita_(WMF)> or Denny <https://www.wikifunctions.org/wiki/User_talk:DVrandecic_(WMF)>.* Recent Changes in the software A very light set of technical changes this week, as our focus was on the longer-term Quarterly work which is still in-flight. On the front-end side, we made some follow-up fixes to the UX components for using Lexemes (T373589 <https://phabricator.wikimedia.org/T373589>), allowing you to search for single-glyph Lexemes (like '𒂼', which is L1 <https://www.wikidata.org/wiki/Lexeme:L1>) and tweaking the visual display. 
We also improved the request traceability headers we generate when you run a function, consolidating on the OpenTelemetry standard ones as part of wider Wikimedia observability work (T375922 <https://phabricator.wikimedia.org/T375922>). Function of the Week: select representation from lexeme As we wrote last week <https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-11>, we are introducing Wikidata lexemes and first versions of other Wikidata-based types. The new types are now available, and in order to demonstrate the new types and how they work, we have created a first set of functions: 1. count lexeme forms in lexeme <https://www.wikifunctions.org/view/en/Z19232> 2. count matching lexeme forms in lexeme <https://www.wikifunctions.org/view/en/Z19234> 3. select representation from lexeme <https://www.wikifunctions.org/view/en/Z19241> 4. select matching lexeme forms in lexeme <https://www.wikifunctions.org/view/en/Z19243> All of these functions use the new Wikidata lexeme <https://www.wikifunctions.org/view/en/Z6005> type for their first argument. When you go to one of these functions, our UI provides a lexeme selector that helps you to pick a lexeme from Wikidata that matches the word that you type. After hitting run, your selected lexeme is retrieved from Wikidata and transformed into our Wikidata lexeme type (by a preparatory call to the new builtin fetch Wikidata lexeme <https://www.wikifunctions.org/view/en/Z6825> function) and then passed into the selected function above. Let’s take a closer look at one of these new functions: select representation from lexeme <https://www.wikifunctions.org/view/en/Z19241>. That function also has a second argument, grammatical features, which is a list <https://www.wikifunctions.org/view/en/Z881> of Wikidata item references <https://www.wikifunctions.org/view/en/Z6091>. Currently, we don't have a UI component for selecting Wikidata items yet, but that is part of our upcoming work in this quarter. 
However, you can copy and paste a QID for grammatical features from Wikidata. When you specify one or more grammatical features, those are used to select the lexeme form(s) from the lexeme which have those grammatical features. Let’s take a look at a simple example: we want to obtain the (first) plural form of the English noun "goose" <https://www.wikidata.org/wiki/Lexeme:L6424>. We type "goose" in the Lexeme selector, and click on the "English, noun" choice. In the second argument, we click on the "+" button and type in Q146786, the QID for plural <https://www.wikidata.org/wiki/Q146786>. Then we click “Run function” and we should get back the plural form. That is also the first test <https://www.wikifunctions.org/view/en/Z19258> for the function. A second test <https://www.wikifunctions.org/view/en/Z19259> checks that the plural <https://www.wikidata.org/wiki/Q146786> nominative <https://www.wikidata.org/wiki/Q131105> of the Malayalam word ആപ്പിൾ <https://www.wikidata.org/wiki/Lexeme:L455955> (with one meaning being apple) is ആപ്പിളുകൾ. This test is to check a different script and a more complex lexeme. In general, it can be difficult to write tests for some of these functions, as they rely on a certain stability of Wikidata, and when writing tests we should make a thoughtful decision about what exactly we are testing with a given test. The function currently has one implementation <https://www.wikifunctions.org/view/en/Z19242> written in JavaScript. The implementation can be inspected and used as a pattern for other implementations. But this function is implemented entirely in the contributor space (unlike the fetch Wikidata lexeme <https://www.wikifunctions.org/view/en/Z6825> function, which has a magical builtin implementation <https://www.wikifunctions.org/view/en/Z6925> and certainly does things that contributors cannot do). 
Here is another example of how to use these new functions: if you want to examine the lexeme forms from a lexeme, use select matching lexeme forms in lexeme <https://www.wikifunctions.org/view/en/Z19243>. Type some word into the Lexeme selector and choose one of the options it offers. If you now leave the second argument as the empty list, you will get back all of the Lexeme forms from the selected Lexeme. Then you can browse them in Wikifunctions. Note that we currently have a few bugs: If there are two or more choices displayed with the exact same word form, the first of them will always be selected, no matter which one you click on. Also, larger Lexemes cause a gateway timeout on loading. And, just as with selecting QIDs, we also don’t have a proper display for QIDs yet. If you encounter further issues, please let us know.
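The selection behaviour described above, including the empty list returning every form, can be sketched in Python as follows. This is not the JavaScript implementation Z19242; the dictionary layout and function names are assumptions for illustration, and the toy lexeme is written by hand. Q146786 (plural) and Q131105 (nominative) are the QIDs given in the text; "singular" is a placeholder label.

```python
PLURAL = "Q146786"      # plural, as given in the text
NOMINATIVE = "Q131105"  # nominative, as given in the text

# Hand-written toy lexeme (assumption, not fetched from Wikidata).
goose = {"forms": [
    {"representation": "goose", "grammatical_features": ["singular"]},
    {"representation": "geese", "grammatical_features": [PLURAL]},
]}

def select_matching_forms(lexeme, features):
    """Forms that have all of the given features; an empty list matches every form."""
    return [form for form in lexeme["forms"]
            if all(f in form["grammatical_features"] for f in features)]

def count_matching_forms(lexeme, features):
    """Number of forms that have all of the given features."""
    return len(select_matching_forms(lexeme, features))

def select_representation(lexeme, features):
    """Representation of the first matching form, or None if nothing matches."""
    matches = select_matching_forms(lexeme, features)
    return matches[0]["representation"] if matches else None

print(select_representation(goose, [PLURAL]))  # geese
print(count_matching_forms(goose, []))         # 2
```

Because `all()` over an empty list is true, the empty feature list needs no special case: every form passes the filter.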


Newsletter #175: Wikidata Lexemes in Wikifunctions are coming soon
by Denny Vrandečić 11 Oct '24

11 Oct '24

The on-wiki version of this newsletter can be found here: https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-11 -- Wikidata Lexemes in Wikifunctions are coming soon! Wikidata famously contains a large knowledge graph about more than a hundred million items, but it also has a younger, less known side: lexicographical data <https://www.wikidata.org/wiki/Wikidata:Lexicographical_data>. Currently, Wikidata describes more than 1.3 Million lexical items across 1291 languages. The lexicographic data in Wikidata is an essential ingredient for the Abstract Wikipedia vision. As an early step on this road, support for Lexemes is coming to Wikifunctions very soon! And we want to give a small preview of that. We are introducing a number of new Types, and each of those will come in two flavors. Let us look at Wikidata Lexemes <https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation#D…> for an example. We have introduced two Types to handle them: the Wikidata Lexeme <https://www.wikifunctions.org/view/en/Z6005> itself, and the Wikidata Lexeme Reference <https://www.wikifunctions.org/view/en/Z6095>. The reference is a wrapper around the Lexeme ID <https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation#L…>. A Function will be provided that takes a Wikidata Lexeme Reference and returns the respective Wikidata Lexeme. An instance of the Wikidata Lexeme Type represents a Lexeme from Wikidata. This means that we will not be able to create Lexemes in Wikifunctions on the fly, or modify them: if you want to change or create a Lexeme, you will continue to do so in Wikidata. We have extended the Wikifunctions user interface to work with the new Types. For Lexemes, there is a built-in search interface that will allow you to search for and select a Lexeme in order to use it as an argument in a function call. There will be numerous limitations initially; particularly, Statements will be very incomplete. 
Any statement that has a Type that is not supported (which for now is almost all of them) will be silently dropped. We will, over time, increase the covered Value Types from Wikidata, with the eventual goal to represent Wikidata fully. One very important restriction is that you won’t be initially able to select Wikidata entities through incoming Statements. This is a very notable restriction: it does not allow us to take, e.g. the item for dog <https://www.wikidata.org/wiki/Q144> as an argument and then, using a function, follow the *item for this sense* <https://www.wikidata.org/wiki/Property:P5137> statement on the first sense <https://www.wikidata.org/wiki/Lexeme:L1122#S1> of the Lexeme *dog* <https://www.wikidata.org/wiki/Lexeme:L1122>, in order to pick the relevant Lexeme. As this is a very important use case for the Abstract Wikipedia story, we will be working on resolving this swiftly. Lexeme access will be a major new capability with many moving parts, and there is a good chance that we will need more documentation, that some workflows will initially be unclear, and also that some things might be broken at the beginning. We ask for your patience with us so we can improve it, but we also will ask for your feedback so we know what to improve. We are excited to get this launched! Recent Changes in the software Most of our work over the past two weeks has been on the new Quarterly work, including the Wikidata access discussed above, and on "Fix It" work to pay down our technical debt. We also landed a few fixes this week: We've adjusted the code that picks your interface language when clicking links on Wikifunctions to also respect your account language preference, if set (T374309 <https://phabricator.wikimedia.org/T374309>). Thanks to User:Ameisenigel <https://www.wikifunctions.org/wiki/User:Ameisenigel> for finding and reporting the issue! 
As part of wider work to remove raw HTML interface messages across MediaWiki, we replaced the site copyright message written by Legal that appears in the footer with one that is in wikitext (T375882 <https://phabricator.wikimedia.org/T375882>). We've re-written the build process for our back-end evaluator service to be simpler and faster through Docker layer caching, and by loosening the load stress-test job (T376053 <https://phabricator.wikimedia.org/T376053>). We've dropped an unused method for loading HTML content from a wiki that we inherited from the "service-template-node" template that was causing confusion (T366733 <https://phabricator.wikimedia.org/T366733>). We've improved the way we include our utilities in the back-end for less code duplication (T347086 <https://phabricator.wikimedia.org/T347086>). We have added some better metrics and logging for our monitoring of the back-end services (T376225 <https://phabricator.wikimedia.org/T376225>, T375457 <https://phabricator.wikimedia.org/T375457>). We, along with all Wikimedia-deployed code, are now using the latest version of the Codex UX library, v1.13.1, as of this week. We believe that there should be no user-visible changes on Wikifunctions, so please comment on the Project chat or file a Phabricator task if you spot an issue.

Recording of Volunteers’ Corner <https://www.wikifunctions.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…>

The recording of this month’s Volunteers’ Corner is now available on Commons <https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…>.

Function of the Week: English plural possessive

Given that we are getting close to supporting Lexemes, this week’s Function of the Week will be about the plural possessive in English. It is also the Function we have built together in this week’s Volunteers’ Corner.
So if you want to see that Function being created, there’s a video on Commons <https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…> ! In English, nouns usually have a singular and a plural form. The singular is used when we talk about one instance of the noun, and the plural when we talk about multiples. The possessive <https://en.wikipedia.org/wiki/English_possessive> is used when we want to express that there’s something that belongs to it. So we may say there is *one dog*, there are *two dogs*, and this is *the dog’s house*. The singular is *dog*, the plural is *dogs*, and the singular possessive is *dog’s*. Combining these, we get the plural possessive: *the dogs’ barking* would refer to barking done by several dogs. Of the 30,599 English nouns in Wikidata, the vast majority (28,038) have two forms, but five Lexemes also feature possessive or genitive forms. They are rarely specifically listed (e.g. on sport <https://www.wikidata.org/wiki/Lexeme:L301>), because they are almost always regular, given the singular and plural forms. Regular forms are great for functions! The English plural possessive <https://www.wikifunctions.org/view/en/Z19125> function takes the lemma, i.e. the singular form, and returns the plural possessive. There is one implementation <https://www.wikifunctions.org/view/en/Z19129>, which is a composition: it first generates the plural <https://www.wikifunctions.org/view/en/Z11089> out of the lemma, and then creates the possessive <https://www.wikifunctions.org/view/en/Z11302> out of the plural. I think this Function is a good example of a Function that probably doesn’t need any further implementations: the only other way to implement it would be to redo the two functions that are used in the composition, and there seems no benefit in that. 
The Function has five tests, of which three are connected – the other two are left for discussion, whether they should be connected or not: - volunteer to volunteers’ <https://www.wikifunctions.org/view/en/Z19128> (in order to honor the Volunteers’ Corner) - kiss to kisses’ <https://www.wikifunctions.org/view/en/Z19131> (a slightly more complex pluralization) - dog to dogs’ <https://www.wikifunctions.org/view/en/Z19126> - fish to fish’s <https://www.wikifunctions.org/view/en/Z19127> (not connected, and it fails with the current implementation) - Matrix to Matrices’ <https://www.wikifunctions.org/view/en/Z19130> (not connected either) One main point is to decide whether this Function should always return the correct plural possessive, in which case the unconnected tests should be connected, or just regular plural possessives – in which case these tests shouldn’t be there. As always, this was a fun exercise, and I want to thank the Volunteers who showed up and helped us in building the Function.
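The composition of the two steps, first the plural and then the possessive, can be sketched in Python. The pluralization and possessive rules below are simplified assumptions written for this example and are not the implementations of Z11089 or Z11302; the code also uses the plain ASCII apostrophe rather than the typographic one.

```python
def regular_plural(noun):
    """Simplified regular English pluralization (assumed rules, no irregulars)."""
    lower = noun.lower()
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if len(lower) > 1 and lower.endswith("y") and lower[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

def possessive(word):
    """Possessive as used here on plurals: bare apostrophe after a final s."""
    return word + "'" if word.lower().endswith("s") else word + "'s"

def plural_possessive(lemma):
    """Composition: generate the plural from the lemma, then its possessive."""
    return possessive(regular_plural(lemma))

for lemma in ["volunteer", "kiss", "dog", "fish", "Matrix"]:
    print(lemma, "->", plural_possessive(lemma))
```

With these rules the three regular cases come out as in the connected tests, while *fish* and *Matrix* yield regular forms (fishes' and Matrixes') rather than the forms the unconnected tests expect, which is exactly the question raised above about what the Function should promise.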


Lojban language
by Andy 09 Oct '24

09 Oct '24

Lojban is a very precise language. It can be used as an intermediate language in machine translation tools, for example English <-> Lojban <-> Polish. Manual: https://lojban.org/publications/cll/cll_v1.1_xhtml-chapter-chunks/ Lojban has corpora at https://corpus.lojban.org/ but large parallel corpora are still needed. It would be good if Abstract Wikipedia could be integrated with Lojban as one more language.


Newsletter #174: Focus topic: food
by Denny Vrandečić 02 Oct '24

02 Oct '24

The on-wiki version of this newsletter can be found here: https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-02 -- Focus topic: food <https://www.wikifunctions.org/wiki/File:Christmas_table_(Serbian_cuisine).j…> As we discussed two weeks ago, we are introducing two focus topics <https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-09-20>. One focus topic will concern model articles <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-07>, and one will be for bespoke articles <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-21>. We are looking for your input around a focus topic for model articles, but this time we want to discuss our chosen focus topic for bespoke articles: *food*. Why food? Articles about foods and beverages on the different language editions of Wikipedia have an enormous variety of representation. Some articles talk about culture, others about history; some articles talk about nutrition, others about preparation. <https://www.wikifunctions.org/wiki/File:Egyptian_food_Koshary.jpg> Some have infoboxes; many do not. One might think that foods can easily contain infoboxes about nutritional values, but many foods are prepared in such different ways–and exhibit so many different varieties–that it's difficult to express structured data about the food, such as nutritional values. In short, creating a template for articles about food and beverages is basically impossible. Foods and beverages also have the interesting quality that, perhaps more than any other topic area, they exhibit cultural differences in the different language articles of Wikipedia. The few articles we checked and compared across languages had sometimes vastly different content. And whereas there are many well-known (and probably even more lesser-known) contentious questions about food, the debates in this area are in general less heated than those on topics such as politics, geography, history, or religion. 
<https://www.wikifunctions.org/wiki/File:Goblet_of_Fire_Cocktail.jpg>

Wikidata has items on about 31,000 foods and beverages, of which 26,000 have sitelinks. There are eleven topics with more than 200 sitelinks, giving an indication of their global importance: coffee (243), milk (242), tea (240), bread (239), food (228), beer (221), apple (220), wine (216), rice (213), banana (209), and honey (203) (query <https://w.wiki/BDb7>). The chart shows how many foods have how many sitelinks (note the log scale on the y axis).

[image: A chart showing the number of foods with a certain number of sitelinks on Wikimedia projects as of September 2024] <https://www.wikifunctions.org/wiki/File:Log_Number_of_foods_vs._Sitelinks.s…>

326 language editions of Wikipedia have articles about foods and beverages (query <https://w.wiki/BDcr>), showing the universal importance of food. 114 languages have an article about a dish or drink that no other language edition has, showing how much of the knowledge is spread across the globe, and not available across language borders. This includes two of our focus languages, Bengali with six foods and Malayalam with eight (query <https://w.wiki/BDdQ>).

You may assume that the coverage of food on English Wikipedia would be very strong. However, 12,500 foods and beverages that have articles on Wikipedia are not represented with an article on English Wikipedia (query <https://w.wiki/BDdT>)–*i.e.*, close to half of the foods that have an article on Wikipedia are not represented in English. All five of our focus languages have articles about food which are not described on English Wikipedia (query <https://w.wiki/BDdb>). And 225 language editions have articles on foods and beverages that English Wikipedia does not cover (query <https://w.wiki/BDdu>).

One note is that some of these gaps and missing articles might be due to different ways in which foods and beverages are split into articles in different languages: in one language we might have six articles about six different types of a regional dish, which is covered by a single article in a different language. But these are indeed some of the differences we are curious to explore and uncover with Abstract Wikipedia over time.

I hope that you got a taste of the large variety in how food is being represented on Wikipedia, and how much knowledge we may potentially unlock by allowing everyone, across language barriers, to contribute to this unique and amazing stone soup <https://en.wikipedia.org/wiki/Stone_Soup> that Wikipedia is.

Volunteers’ Corner on October 7

Next week, on Monday, October 7th, 2024, at 17:30 UTC <https://zonestamp.toolforge.org/1728322200>, we will have our monthly Volunteers’ Corner. Unless you have many questions, we will follow our usual agenda of giving updates on the upcoming plans and recent activities, having plenty of time and space for your questions, and building a Function together. Looking forward to seeing you on Monday!

Function of the Week: product of list of natural numbers

Last week we were talking about multiplying numbers with each other <https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-09-26#…>, and how Wikifunctions beats large language models hands-down in this particular task. This week we follow this direction by picking a function suggested by the community <https://www.wikifunctions.org/wiki/Wikifunctions:Function_of_the_Week#Sugge…>: calculating the product of list (natural number) (Z13558) <https://www.wikifunctions.org/view/en/Z13558>.

A product <https://en.wikipedia.org/wiki/Product_(mathematics)> is the result of a multiplication. The product of 2 and 3 is 6, *i.e.* 2 multiplied by 3 is 6. The function we look at this week can deal with an arbitrary number of numbers. How does it do that?
We can’t add and remove arguments in Wikifunctions! The trick is that it actually doesn’t take an arbitrary number of arguments, but it takes a single argument: a list. To be more precise, a typed list, a list of natural numbers.

This means that when you get to the function page, the *Try this function* section looks a bit funny: instead of giving you a field or several fields to enter a value, it just shows the name of the argument (List of natural numbers), and a big + button. Now you have to click on the big + button. Once you do that, you get the opportunity to enter a number. If you want to add another number, just click again on the + button. If you want to remove a number from the list, you can click on the three dots next to the text Item, followed by the number, and then choose the “Delete item” option. By the way, some folks call the three dots the meatballs menu icon. Play around a bit with this feature, adding elements to a list and removing them. It’s a good skill to have, because all lists in Wikifunctions work with this flow.

This function has five tests and eight implementations. The tests really nicely cover a number of interesting cases:

- The product of a single number <https://www.wikifunctions.org/view/en/Z17709> such as 9 is the number itself, *i.e.* 9.
- The product of the empty list <https://www.wikifunctions.org/view/en/Z17710>, *i.e.* a list with no numbers in it, is 1. You may ask why it is 1 and not, say, 0. The reason why mathematicians define the product of the empty list to be 1 is that 1 is the identity element of the multiplication operation <https://en.wikipedia.org/wiki/Multiplication#Properties>. The sum of an empty list of numbers <https://www.wikifunctions.org/view/en/Z14038>, by the way, is not 1, but 0 – again, because that’s the identity element of the addition <https://en.wikipedia.org/wiki/Addition#Identity_element> operation. It’s all a bit confusing. But the important thing here is: the tests tell you what to expect in the edge cases, and are a really good form of documentation.
- The product of two numbers <https://www.wikifunctions.org/view/en/Z17711> is the result of multiplying the two numbers, *i.e.* 11×9 is 99.
- The next test checks for the product of a somewhat longer list <https://www.wikifunctions.org/view/en/Z13566> of five numbers: 2, 3, 5, 7, and 11, and results in 2310.
- And the final test we currently have checks for the product of all prime numbers below 30 <https://www.wikifunctions.org/view/en/Z18910>: that’s ten numbers, 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29. The result is 6,469,693,230. We can see that the first five numbers are the same five numbers as in the previous test.

We have eight implementations for this function – quite a few! I actually found it quite instructive and interesting to compare the different implementations, both within the same language and across languages.

- The first implementation in Python <https://www.wikifunctions.org/view/en/Z13560> starts by setting a variable to 1, then goes through each value in the argument, updating the variable to be itself multiplied by that value, and finally returns the result.
- The first implementation in JavaScript <https://www.wikifunctions.org/view/en/Z17399> does exactly the same, but it displays some interesting variation in syntax when compared to the Python implementation above.
- The first composition <https://www.wikifunctions.org/view/en/Z17400> uses the reduce function <https://www.wikifunctions.org/view/en/Z876> on the multiplication function <https://www.wikifunctions.org/view/en/Z13539> and a starting value of 1. The reduce function is the second half of the famous MapReduce <https://en.wikipedia.org/wiki/MapReduce> programming pattern, which we will devote a future Function of the Week to.
- The second composition is recursive <https://www.wikifunctions.org/view/en/Z16828>, *i.e.* it calls itself under certain conditions. If the argument list has one element <https://www.wikifunctions.org/view/en/Z12755>, return that element. If it has none <https://www.wikifunctions.org/view/en/Z813>, return 1. Otherwise multiply <https://www.wikifunctions.org/view/en/Z13539> the first number <https://www.wikifunctions.org/view/en/Z811> of the list by the product of the rest of the list <https://www.wikifunctions.org/view/en/Z812>; the product of the rest of the list is calculated by calling the product function <https://www.wikifunctions.org/view/en/Z13558> itself again. Because the argument list gets shorter in every call, we know that the recursion will end at some point.
- The third composition <https://www.wikifunctions.org/view/en/Z17401> is a variation on the first composition above: it also calls the reduce function <https://www.wikifunctions.org/view/en/Z876>, but instead of using 1 as a starting value, it uses the first number <https://www.wikifunctions.org/view/en/Z811> in the list as the starting value, and then reduces the rest of the list <https://www.wikifunctions.org/view/en/Z812> using multiplication <https://www.wikifunctions.org/view/en/Z13539>. In order to be able to do so it first needs to check whether the list is empty <https://www.wikifunctions.org/view/en/Z813>, in which case it returns 1 directly.
- The fourth composition <https://www.wikifunctions.org/view/en/Z13567> (and final one, for now) first checks for the empty list <https://www.wikifunctions.org/view/en/Z813>, in which case it returns 1, and uses the right fold function <https://www.wikifunctions.org/view/en/Z12753> with multiplication <https://www.wikifunctions.org/view/en/Z13539> on the list otherwise. Fold <https://en.wikipedia.org/wiki/Fold_(higher-order_function)> and reduce <https://en.wikipedia.org/wiki/Reduction_operator> are two very similar functions, sometimes even used synonymously. In Wikifunctions, the right fold function is the same as the reduce function with the difference that the reduce function gets a starting value as an argument, whereas right fold starts with the first value of the argument list – which is why it cannot deal with an empty list and requires handling of that beforehand.
- The second implementation in Python <https://www.wikifunctions.org/view/en/Z19106> uses Python’s reduce function <https://docs.python.org/3/library/functools.html#functools.reduce> and multiplication operator <https://docs.python.org/3/library/operator.html#operator.mul>, basically the same as the first composition but in Python.
- The second implementation in JavaScript <https://www.wikifunctions.org/view/en/Z19107> uses JavaScript Array’s reduce method <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Ob…>, but since the language has no built-in multiplication function, it uses a lambda function <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions…> to express the multiplication.

We invite you to play around with the function, and particularly the flow for typed lists.

And thanks to 99of9 <https://www.wikifunctions.org/wiki/User:99of9> for suggesting the function as a Function of the Week! If you want to make your own suggestions, please feel free to nominate a function <https://www.wikifunctions.org/wiki/Wikifunctions:Function_of_the_Week#Sugge…> yourself.
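
For readers who want to compare the strategies side by side, here is a minimal sketch in plain Python. It is an illustration only and not the code stored on Wikifunctions: the function names and the `ALL_VARIANTS` list are made up for this example, and each function simply follows the prose description of one of the approaches above (loop, reduce with a starting value of 1, recursion, and reduce starting from the first element).

```python
from functools import reduce
from operator import mul


def product_loop(numbers):
    """Start at 1 and multiply in each value, one after the other."""
    result = 1
    for value in numbers:
        result = result * value
    return result


def product_reduce(numbers):
    """Reduce the list with multiplication and a starting value of 1."""
    return reduce(mul, numbers, 1)


def product_recursive(numbers):
    """Empty list gives 1, one element gives itself, else first * product(rest)."""
    if len(numbers) == 0:
        return 1
    if len(numbers) == 1:
        return numbers[0]
    # The rest of the list is shorter on every call, so the recursion ends.
    return numbers[0] * product_recursive(numbers[1:])


def product_from_first(numbers):
    """Handle the empty list first, then use the first element as the start."""
    if not numbers:
        return 1
    return reduce(mul, numbers[1:], numbers[0])


# Hypothetical helper names, used only to run the same checks on every variant.
PRIMES_BELOW_30 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
ALL_VARIANTS = [product_loop, product_reduce, product_recursive, product_from_first]
```

All four variants agree on the five cases covered by the tests described above. For example, `product_reduce([2, 3, 5, 7, 11])` gives 2310 and `product_loop([])` gives 1. The explicit empty-list check in `product_from_first` is needed because there is no first element to start from in that case.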
