The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-25
--
Our goal for this Quarter: Agreement
Two weeks ago, we previewed the first form of access
<https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-11> to
knowledge on Wikidata, and last week we announced that it has gone live
<https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-17#…>.
This week we want to sketch out what we are aiming for by the end of the
year.
<https://www.wikifunctions.org/wiki/File:Wikidata_lexemes_logo.svg>
As we have pointed out over the last two weeks, there are a number of issues
we are working on to improve access to Lexemes and other entities on
Wikidata, from issues with the selector for Lexemes
<https://phabricator.wikimedia.org/T377540> (which has already been fixed)
to a better selector and display of Wikidata items (which we are still
working on). Thanks to everybody who has given us feedback and pointed out
further issues, or helped us prioritize the tasks.
But what is the goal for this year? What are we building towards, where do
we want to be by the end of 2024?
The goal is to be able to *build up phrases from Lexemes using linguistic
agreement*. What does that mean?
Many languages require agreement
<https://en.wikipedia.org/wiki/Agreement_(linguistics)> in order to be
correct (some languages do not, such as Japanese, some need a little, such
as English, and some need a lot, such as Swahili). Agreement, or concord,
means that one word or phrase has to change in order to fit another word or
phrase in a given sentence. Let’s take a look at an English example:
*“Laura ate an apple.”* vs *“Laura ate two apples.”*
In the first sentence, the word *“an”* must be followed by a singular noun,
whereas in the second sentence the word *“two”* must be followed by a
plural noun. So the first sentence has the word form *“apple”*,
and the second sentence the word form *“apples”*.
Many languages such as Italian, Hindi, or Ukrainian have grammatical
genders for nouns, such as for their respective words for cat: in Italian,
*gatto* <https://www.wikidata.org/wiki/Lexeme:L5577> is masculine, in
Hindi, बिल्ली <https://www.wikidata.org/wiki/Lexeme:L594403> is feminine,
and so is the Ukrainian кішка <https://www.wikidata.org/wiki/Lexeme:L184954>.
If a noun is being described by an adjective, the adjective in these
languages has to agree with the gender of the noun. So, if we want to
express little cat in Italian, we would say:
*“piccolo gatto”*
Turtle in Italian is *tartaruga*
<https://www.wikidata.org/wiki/Lexeme:L684140>, which is a feminine noun.
If we want to express little turtle in Italian, we would say:
*“piccola tartaruga”*
Note the different ending on the adjective: it is *piccolo* for masculine
nouns, and *piccola* for feminine nouns.
Assume a function that takes two arguments, both Lexemes, one an Italian
noun, the other an Italian adjective. In Italian, the adjective usually
just precedes the noun. But in order to choose the right form, we need to
know the grammatical gender of the noun. In Wikidata, there is a property
for grammatical gender <https://www.wikidata.org/wiki/Property:P5185>.
Before the end of the year, we plan to enable you to run a function in
Wikifunctions on an Italian noun, and get back the value for the
grammatical gender of that noun, if it is given in Italian.
With the value for grammatical gender, you will then be able to filter the
adjective in order to pick the right form. Once we have the right form of
the adjective and the noun, we can concatenate the two with a space in
between, and get a grammatically correct phrase with an adjective and a
noun.
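The steps described above can be sketched in Python. This is only an
illustration under stated assumptions: the dictionaries are hand-written
stand-ins for Lexemes, the names pick_form and adjective_noun_phrase are
hypothetical, and grammatical features are plain strings rather than
Wikidata items.

```python
# Toy stand-ins for two Italian noun Lexemes and one adjective Lexeme.
# The structure is an assumption for illustration, not the Wikifunctions types.
GATTO = {"lemma": "gatto", "gender": "masculine"}
TARTARUGA = {"lemma": "tartaruga", "gender": "feminine"}
PICCOLO = {
    "forms": [
        {"text": "piccolo", "features": {"masculine", "singular"}},
        {"text": "piccola", "features": {"feminine", "singular"}},
    ]
}


def pick_form(adjective, wanted):
    """Return the first form whose features include all wanted features."""
    for form in adjective["forms"]:
        if wanted <= form["features"]:
            return form["text"]
    raise LookupError(f"no form with features {sorted(wanted)}")


def adjective_noun_phrase(adjective, noun):
    """Agree the adjective with the noun's gender, then join with a space."""
    form = pick_form(adjective, {noun["gender"], "singular"})
    return f"{form} {noun['lemma']}"


print(adjective_noun_phrase(PICCOLO, GATTO))      # piccolo gatto
print(adjective_noun_phrase(PICCOLO, TARTARUGA))  # piccola tartaruga
```

The gender lookup and the form filter are kept as separate steps, mirroring
the two capabilities named in the text.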
We are looking forward to offering you these capabilities and to seeing what
you will build with them.
Function of the Week: plural form of lexeme as monolingual text
Since Lexemes are new to Wikifunctions, we will look this week at one of
the brand new community-created functions for Lexemes: plural form of
lexeme as monolingual text
<https://www.wikifunctions.org/view/en/Z19260> (Z19260).
You can go to that function, select a Lexeme, and run the function, and it
will return the first form on that Lexeme that is a plural.
For example, enter the English noun *goose*, and it returns *geese* in
English, enter the Spanish noun *compás* and it returns *compases* in
Spanish. This function should work for every language, and always return a
correct form, as long as it is in Wikidata (and if it is missing in
Wikidata, feel free to enter it).
The function takes one argument of type Wikidata Lexeme
<https://www.wikifunctions.org/view/en/Z6005> and returns a monolingual text
<https://www.wikifunctions.org/view/en/Z11> (that is, a text in a specific
language).
There are two tests written for this function: a plural of *dog* being
*dogs* <https://www.wikifunctions.org/view/en/Z19262>, and a plural of
*amigo* being *amigos* <https://www.wikifunctions.org/view/en/Z19263>. We
have the same issue with tests as last week: the tests depend as much on
Wikidata as they do on Wikifunctions. The second test illustrates that
well: it so happens that on the Lexeme for the Spanish noun *amigo*
<https://www.wikidata.org/wiki/Lexeme:L230374> the form *amigos* is listed
before the form *amigas*, but both of them are correct plural forms, the
former being masculine and the latter feminine. The forms could have been
written the other way around just as well.
The function has one implementation, using a composition
<https://www.wikifunctions.org/view/en/Z19261>. We will read the
composition from the inside to the outside.
1. First, we call select lexeme forms from lexeme
<https://www.wikifunctions.org/view/en/Z19243>, with the lexeme in the
argument and with the plural
<https://www.wikifunctions.org/w/index.php?title=Q146786&action=edit&redlink…>
(in a list) as the second argument. This call filters the forms of the lexeme,
only leaving the forms which have plural as a grammatical feature.
2. We echo <https://www.wikifunctions.org/view/en/Z801> the result,
which shouldn’t do anything, but fixes issues with typed lists. We hope to
get rid of this step in the future.
3. Then we get the first element
<https://www.wikifunctions.org/view/en/Z811> of the list which has been
returned. This means we are usually getting *a* plural form back, not
*the* plural form: whatever happens to be the first on the Lexeme.
4. At this point we have a Wikidata Lexeme Form
<https://www.wikifunctions.org/view/en/Z6004> at hand. Using value by key
<https://www.wikifunctions.org/view/en/Z803> we ask for the
*representations* of the form, which returns a multilingual text
<https://www.wikifunctions.org/view/en/Z12>.
5. And finally, we can use the function to get the first monolingual
text from a multilingual text
<https://www.wikifunctions.org/view/en/Z19254> in order to get to the
monolingual text <https://www.wikifunctions.org/view/en/Z11> we are looking for.
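The five steps above can be mimicked in a few lines of Python. This is a
sketch, not the actual composition: the Lexeme is a hand-written dictionary
that models only the two plural forms of *amigo* mentioned earlier, all
helper names are hypothetical, and the echo step is left out because it has
no effect on plain Python lists.

```python
PLURAL = "Q146786"  # QID for plural, as linked in the text

# Toy stand-in for the Spanish Lexeme "amigo". Only the two plural forms are
# modelled, in the order the text describes; other features are left out.
AMIGO = {
    "forms": [
        {"representations": {"es": "amigos"}, "features": [PLURAL]},
        {"representations": {"es": "amigas"}, "features": [PLURAL]},
    ]
}


def select_forms(lexeme, features):
    """Step 1: keep the forms that carry every requested feature."""
    return [
        form for form in lexeme["forms"]
        if all(qid in form["features"] for qid in features)
    ]


def first_element(items):
    """Step 3: take whatever happens to be listed first."""
    return items[0]


def value_by_key(obj, key):
    """Step 4: read one key of an object."""
    return obj[key]


def first_monolingual_text(multilingual):
    """Step 5: return the first (language, text) pair."""
    return next(iter(multilingual.items()))


def plural_as_monolingual_text(lexeme):
    forms = select_forms(lexeme, [PLURAL])
    form = first_element(forms)
    representations = value_by_key(form, "representations")
    return first_monolingual_text(representations)


print(plural_as_monolingual_text(AMIGO))  # ('es', 'amigos')
```

Swapping the two entries in the forms list would return *amigas* instead,
which is the order dependence the tests are exposed to.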
Currently, the function fails frequently due to timeouts when resolving
larger objects and evaluating more complex compositions (for example, it
times out on a German noun such as *Baum*
<https://www.wikidata.org/wiki/Lexeme:L11540>). Also, the call to echo
shouldn’t be necessary. We can use this function as a benchmark on
improving the capabilities and robustness of Wikifunctions. And at the same
time, when it works, it demonstrates a really interesting use case.
The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-17
Because of the more complex formatting, the on-wiki version might be easier
to read.
--
What could abstract content look like?
*This week’s newsletter is guest-written by Mahir Morshed
<https://www.wikifunctions.org/wiki/User:Mahir256>.*
The notion of ‘abstract content’ for Abstract Wikipedia arises by analogy
to regular content on regular Wikipedias. This regular content is written
in a specific language’s writing system and, on the surface, is not clearly
connected to the structured information on Wikidata. By contrast, then,
abstract content should not be tied to a specific language’s writing system
and should instead be derived from information on Wikidata. It would
additionally be useful for the parts of this content to have a simplified
syntax, both to reduce the logic needed to process and manipulate this
content and to ensure additions to the content don’t inherently require
changes to the representation format.
It remains then to speak of how this abstract content should appear such
that these desiderata are achieved. Let’s try to arrive at such a
representation through some changes to a Constructor for a simple sentence,
starting with something similarly structured to Figure 1 in Denny’s CACM
paper <https://dl.acm.org/doi/10.1145/3425778>:
Action(
predicate: eating,
eater: Robert J. Jones,
eaten: ice cream,
location: Decatur, Illinois,
time: 1 July 2023, 11:30am
)
The intended meaning of this sentence is “Robert J. Jones ate ice cream in
Decatur, Illinois on July 1st, 2023 at 11:30am.” Right now everything in
the Constructor is in English, and none of the arguments refer to Wikidata
at all. Let’s (mostly) fix the latter of these problems:
Action(
predicate: Q213449 <https://www.wikidata.org/wiki/Q213449>,
eater: Q33103898 <https://www.wikidata.org/wiki/Q33103898>,
eaten: Q13233 <https://www.wikidata.org/wiki/Q13233>,
location: Q506325 <https://www.wikidata.org/wiki/Q506325>,
time: “+2023-07-01T16:30:00Z”
)
This is better, but the name of the Constructor and the names of the
arguments are still in English. What if we used Wikidata items to represent
these as well?
Q4026292 <https://www.wikidata.org/wiki/Q4026292>(
Q179080 <https://www.wikidata.org/wiki/Q179080>: Q213449
<https://www.wikidata.org/wiki/Q213449>,
Q20984678 <https://www.wikidata.org/wiki/Q20984678>: Q33103898
<https://www.wikidata.org/wiki/Q33103898>,
Q2095 <https://www.wikidata.org/wiki/Q2095>: Q13233
<https://www.wikidata.org/wiki/Q13233>,
Q115095765 <https://www.wikidata.org/wiki/Q115095765>: Q506325
<https://www.wikidata.org/wiki/Q506325>,
Q7805404 <https://www.wikidata.org/wiki/Q7805404>: +2023-07-01T16:30:00Z
)
Now that nearly everything in this Constructor is represented by a Wikidata
QID, it can be displayed entirely in a particular language provided that
each item referred to has a label in that language, such as Bengali:
কার্য(
বিধেয়: খাওয়া,
ভোক্তা: রবার্ট জে জোন্স,
খাদ্য: আইসক্রিম,
অবস্থান: ডেকেটার, ইলিনয়,
ঘটনার সময়: +2023-07-01T16:30:00Z
)
We’re still not done, though: could we simplify this syntax a bit? (Can we
get away from needing named arguments to functions?)
Q4026292 <https://www.wikidata.org/wiki/Q4026292>(
Q179080 <https://www.wikidata.org/wiki/Q179080>(Q213449
<https://www.wikidata.org/wiki/Q213449>),
Q20984678 <https://www.wikidata.org/wiki/Q20984678>(Q33103898
<https://www.wikidata.org/wiki/Q33103898>),
Q2095 <https://www.wikidata.org/wiki/Q2095>(Q13233
<https://www.wikidata.org/wiki/Q13233>),
Q115095765 <https://www.wikidata.org/wiki/Q115095765>(Q506325
<https://www.wikidata.org/wiki/Q506325>),
Q7805404 <https://www.wikidata.org/wiki/Q7805404>(+2023-07-01T16:30:00Z)
)
This change, from using named function arguments to using single-member
functions as unnamed arguments, should hopefully remind one of the
composition syntax
<https://www.wikifunctions.org/wiki/Wikifunctions:How_to_create_implementati…>
that Wikifunctions functions can be implemented in.
Since different predicates require different participant roles–’drinking’
requires ‘drinker’ and ‘drink’, ‘reading’ requires ‘reader’ and ‘thing
being read’, and so on–the number of functions that need to be introduced
at this point will likely skyrocket. We can reduce this number by
generalizing them to use Q613930 <https://www.wikidata.org/wiki/Q613930> to
indicate participant roles, keeping the QIDs we introduced for those roles
as arguments instead:
Q4026292 <https://www.wikidata.org/wiki/Q4026292>(
Q179080 <https://www.wikidata.org/wiki/Q179080>(Q213449
<https://www.wikidata.org/wiki/Q213449>),
Q613930 <https://www.wikidata.org/wiki/Q613930>(Q20984678
<https://www.wikidata.org/wiki/Q20984678>, Q33103898
<https://www.wikidata.org/wiki/Q33103898>),
Q613930 <https://www.wikidata.org/wiki/Q613930>(Q2095
<https://www.wikidata.org/wiki/Q2095>, Q13233
<https://www.wikidata.org/wiki/Q13233>),
Q115095765 <https://www.wikidata.org/wiki/Q115095765>(Q506325
<https://www.wikidata.org/wiki/Q506325>),
Q7805404 <https://www.wikidata.org/wiki/Q7805404>(+2023-07-01T16:30:00Z)
)
The connection to particular programming languages can be made even more
explicit with a little rearrangement:
(“Q4026292”
(“Q179080” “Q213449”)
(“Q613930” “Q20984678” “Q33103898”)
(“Q613930” “Q2095” “Q13233”)
(“Q115095765” “Q506325”)
(“Q7805404” “+2023-07-01T16:30:00Z”)
)
This format, borrowing from the syntax of Lisp
<https://en.wikipedia.org/wiki/Lisp>-like programming languages, is what I
believe should be used to store abstract content for Abstract Wikipedia. As
a purely optional last measure for completeness, let’s try to turn the
timestamp into QIDs, using items for the date, time, and time zone:
(“Q4026292”
(“Q179080” “Q213449”)
(“Q613930” “Q20984678” “Q33103898”)
(“Q613930” “Q2095” “Q13233”)
(“Q115095765” “Q506325”)
(“Q7805404” (“Q186885” “Q69306847” “Q95056915” “Q15406405”))
)
Since this final result is composed entirely of strings (if the “Q” is
removed everywhere, integers?) and lists–both more primitive data
structures across lots of environments–it can be read and modified the way
other lists of strings are dealt with in those environments. (In fact,
lists of strings can be used as the input to Wikifunctions functions
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2024-01-03>,
even though actual handling of Wikidata items is still to come.) As a
reminder, since each string is a Wikidata QID, this final result can be
displayed in a given language provided each item has a label in that
language.
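To make the last point concrete, here is a small Python sketch that stores
the final result as nested lists of strings and displays it with labels. The
label table is an assumption for illustration: it simply reuses the English
names from the first version of the Constructor rather than fetching labels
from Wikidata, and any QID without an entry is shown as-is. The function
name render is hypothetical.

```python
CONTENT = [
    "Q4026292",
    ["Q179080", "Q213449"],
    ["Q613930", "Q20984678", "Q33103898"],
    ["Q613930", "Q2095", "Q13233"],
    ["Q115095765", "Q506325"],
    ["Q7805404", ["Q186885", "Q69306847", "Q95056915", "Q15406405"]],
]

# Illustrative labels taken from the first Constructor in the text;
# a real system would look these up on Wikidata per language.
LABELS_EN = {
    "Q4026292": "Action",
    "Q179080": "predicate",
    "Q213449": "eating",
    "Q20984678": "eater",
    "Q33103898": "Robert J. Jones",
    "Q2095": "eaten",
    "Q13233": "ice cream",
    "Q115095765": "location",
    "Q506325": "Decatur, Illinois",
    "Q7805404": "time",
}


def render(node, labels):
    """Replace each QID by its label, keeping the nested list structure."""
    if isinstance(node, str):
        return labels.get(node, node)  # fall back to the raw QID
    return "(" + " ".join(render(child, labels) for child in node) + ")"


print(render(CONTENT, LABELS_EN))
```

Passing a label table for another language would display the same stored
content in that language without changing the content itself.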
The Constructor whose written form we have been modifying also represents
what I believe to be a very useful building block for abstract content. In
many languages this would correspond to a structurally simpler
sentence–albeit one whose main verb isn’t something like ‘to be’ or ‘to
have’–complete with a predicate (‘eating’), participant roles (such as
‘eater’ and ‘food’), and any number of modifiers (such as ‘location’ and
‘time’). There are already lots of Wikidata items for predicates, with
Wikidata verb and verb phrase lexemes linking to them
<https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Statistics/indi…>,
and there is an emerging effort to introduce items to represent participant
roles for predicates
<https://www.wikidata.org/wiki/Wikidata:WikiProject_Events_and_Role_Frames>.
In principle, the order of components within such a block would not be
significant, so that the following would be functionally identical to what
was shown above:
(“Q4026292”
(“Q115095765” “Q506325”)
(“Q179080” “Q213449”)
(“Q7805404” (“Q186885” “Q69306847” “Q95056915” “Q15406405”))
(“Q613930” “Q2095” “Q13233”)
(“Q613930” “Q20984678” “Q33103898”)
)
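One way to check that two such blocks are functionally identical is to
compare them after putting their components into a fixed order. The sketch
below assumes that only the order of the components directly inside a block
is insignificant, while the order of strings inside a component still
matters; the function names are hypothetical.

```python
import json

ORIGINAL = [
    "Q4026292",
    ["Q179080", "Q213449"],
    ["Q613930", "Q20984678", "Q33103898"],
    ["Q613930", "Q2095", "Q13233"],
    ["Q115095765", "Q506325"],
    ["Q7805404", ["Q186885", "Q69306847", "Q95056915", "Q15406405"]],
]

REORDERED = [
    "Q4026292",
    ["Q115095765", "Q506325"],
    ["Q179080", "Q213449"],
    ["Q7805404", ["Q186885", "Q69306847", "Q95056915", "Q15406405"]],
    ["Q613930", "Q2095", "Q13233"],
    ["Q613930", "Q20984678", "Q33103898"],
]


def canonical(block):
    """Keep the head, sort the components by their JSON serialization."""
    head, *components = block
    return [head] + sorted(components, key=json.dumps)


def same_block(a, b):
    """True if both blocks have the same head and the same components."""
    return canonical(a) == canonical(b)


print(same_block(ORIGINAL, REORDERED))  # True
```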
Putting these blocks together requires introducing some machinery, but with
the representation we arrived at it is possible to make this machinery
realizable. The following are but three possible examples:
- Two simple sentences can be coordinated (e.g. using ‘and’, ‘or’,
‘but’, and so on) by adding both as arguments to a new list. The item
Q13381767 <https://www.wikidata.org/wiki/Q13381767> below, for example,
represents a simple ‘and’ relationship:
(“Q13381767”
(“Q4026292” (“Q179080” “Q213449”) [...])
(“Q4026292” (“Q179080” “Q199657”) [...])
)
- A simple sentence may be subordinated to another (e.g. using
‘because’, ‘when’, ‘while’, and so on) by introducing a modifier wrapping
that simple sentence and using that modifier in the other sentence. The
item Q12774849 <https://www.wikidata.org/wiki/Q12774849> below, for
example, represents a simple ‘because’ relationship:
(“Q4026292” (“Q179080” “Q213449”) [...]
(“Q12774849”
(“Q4026292” (“Q179080” “Q199657”) [...])
)
)
- Arbitrary modifiers could be applied after a simple sentence has been
formed by wrapping them around that sentence. The item Q1478451
<https://www.wikidata.org/wiki/Q1478451> below, for example, represents
simple negation:
(“Q1478451”
(“Q4026292” (“Q179080” “Q199657”) [...])
)
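Because the representation is just nested lists, the three operations above
can be sketched as small Python functions. The function names are
hypothetical, and the example blocks are reduced to their predicate
component: the parts elided as [...] in the text are left out here rather
than guessed.

```python
AND = "Q13381767"       # simple 'and' relationship
BECAUSE = "Q12774849"   # simple 'because' relationship
NEGATION = "Q1478451"   # simple negation


def coordinate(first, second):
    """Put two simple sentences as arguments into a new list."""
    return [AND, first, second]


def subordinate_because(main, reason):
    """Add a modifier wrapping `reason` to a copy of `main`."""
    return main + [[BECAUSE, reason]]


def negate(sentence):
    """Wrap a modifier around an already formed sentence."""
    return [NEGATION, sentence]


# Blocks reduced to their predicate component only.
FIRST = ["Q4026292", ["Q179080", "Q213449"]]
SECOND = ["Q4026292", ["Q179080", "Q199657"]]

print(coordinate(FIRST, SECOND))
print(subordinate_because(FIRST, SECOND))
print(negate(SECOND))
```

Each function returns a new list and leaves its inputs untouched, so the
same simple sentence can be reused in several combinations.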
Much, if not all, of what has been described above has been put into
practice at elemwala.toolforge.org (powered by Ninai
<https://gitlab.com/mahir256/ninai/>/Udiron
<https://gitlab.com/mahir256/udiron/>).
*This week’s newsletter is guest-written by Mahir Morshed
<https://www.wikifunctions.org/wiki/User:Mahir256>. If you want to propose
a guest-written newsletter, please contact Luca
<https://www.wikifunctions.org/wiki/User_talk:Sannita_(WMF)> or Denny
<https://www.wikifunctions.org/wiki/User_talk:DVrandecic_(WMF)>.*
Recent Changes in the software
A very light set of technical changes this week, as our focus was on the
longer-term Quarterly work which is still in-flight.
On the front-end side, we made some follow-up fixes to the UX components
for using Lexemes (T373589 <https://phabricator.wikimedia.org/T373589>),
allowing you to search for single-glyph Lexemes (like '𒂼', which is L1
<https://www.wikidata.org/wiki/Lexeme:L1>) and tweaking the visual display.
We also improved the request traceability headers we generate when you run
a function, consolidating on the OpenTelemetry standard ones as part of
wider Wikimedia observability work (T375922
<https://phabricator.wikimedia.org/T375922>).
Function of the Week: select representation from lexeme
As we wrote last week
<https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-11>,
we are introducing Wikidata lexemes and first versions of other
Wikidata-based types. The new types are now available, and in order to
demonstrate the new types and how they work, we have created a first set of
functions:
1. count lexeme forms in lexeme
<https://www.wikifunctions.org/view/en/Z19232>
2. count matching lexeme forms in lexeme
<https://www.wikifunctions.org/view/en/Z19234>
3. select representation from lexeme
<https://www.wikifunctions.org/view/en/Z19241>
4. select matching lexeme forms in lexeme
<https://www.wikifunctions.org/view/en/Z19243>
All of these functions use the new Wikidata lexeme
<https://www.wikifunctions.org/view/en/Z6005> type for their first
argument. When you go to one of these functions, our UI provides a lexeme
selector that helps you to pick a lexeme from Wikidata that matches the
word that you type. After hitting run, your selected lexeme is retrieved
from Wikidata and transformed into our Wikidata lexeme type (by a
preparatory call to the new builtin fetch Wikidata lexeme
<https://www.wikifunctions.org/view/en/Z6825> function) and then passed
into the selected function above.
Let’s take a closer look at one of these new functions: select
representation from lexeme <https://www.wikifunctions.org/view/en/Z19241>.
That function also has a second argument, grammatical features, which is a
list <https://www.wikifunctions.org/view/en/Z881> of Wikidata item
references <https://www.wikifunctions.org/view/en/Z6091>. Currently, we
don't have a UI component for selecting Wikidata items yet, but that is
part of our upcoming work in this quarter. However, you can copy and paste
a QID for grammatical features from Wikidata. When you specify one or more
grammatical features, those are used to select the lexeme form(s) from the
lexeme which have those grammatical features.
Let’s take a look at a simple example: we want to obtain the (first) plural
form of the English noun "goose"
<https://www.wikidata.org/wiki/Lexeme:L6424>. We type "goose" in the Lexeme
selector, and click on the "English, noun" choice. In the second argument,
we click on the "+" button and type in Q146786, the QID for plural
<https://www.wikidata.org/wiki/Q146786>. Then we click “Run function” and
we should get back the plural form.
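The selection logic of this example can be sketched in Python. This is not
the JavaScript implementation on Wikifunctions: the Lexeme is a hand-written
stand-in for *goose*, the function names are hypothetical, and returning
None when nothing matches is an assumption of this sketch.

```python
PLURAL = "Q146786"  # QID for plural, as used in the example above

# Toy stand-in for the English noun "goose"; the singular form's own
# grammatical feature is left out of this illustration.
GOOSE = {
    "forms": [
        {"representation": "goose", "features": set()},
        {"representation": "geese", "features": {PLURAL}},
    ]
}


def select_matching_forms(lexeme, features):
    """Keep the forms that have all of the given grammatical features.

    An empty feature list matches every form.
    """
    wanted = set(features)
    return [form for form in lexeme["forms"] if wanted <= form["features"]]


def select_representation(lexeme, features):
    """Return the representation of the first matching form, if any."""
    matches = select_matching_forms(lexeme, features)
    return matches[0]["representation"] if matches else None


print(select_representation(GOOSE, [PLURAL]))  # geese
print(len(select_matching_forms(GOOSE, [])))   # 2
```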
That is also the first test <https://www.wikifunctions.org/view/en/Z19258> for
the function. A second test
<https://www.wikifunctions.org/view/en/Z19259> checks
that the plural <https://www.wikidata.org/wiki/Q146786> nominative
<https://www.wikidata.org/wiki/Q131105> of the Malayalam word ആപ്പിൾ
<https://www.wikidata.org/wiki/Lexeme:L455955> (with one meaning being
apple) is ആപ്പിളുകൾ. This test is to check a different script and a more
complex lexeme.
In general, it can be difficult to write tests for some of these functions,
as they rely on a certain stability of Wikidata, and when writing tests we
should make a thoughtful decision about what exactly we are testing with a
given test.
The function currently has one implementation
<https://www.wikifunctions.org/view/en/Z19242> written in JavaScript. The
implementation can be inspected and used as a pattern for other
implementations. But this function is implemented entirely in the
contributor space (unlike the fetch Wikidata lexeme
<https://www.wikifunctions.org/view/en/Z6825> function, which has a magical
builtin implementation <https://www.wikifunctions.org/view/en/Z6925> and
certainly does things that contributors cannot do).
Here is another example of how to use these new functions: if you want to
examine the lexeme forms from a lexeme, use select matching lexeme forms in
lexeme <https://www.wikifunctions.org/view/en/Z19243>. Type some word into
the Lexeme selector and choose one of the options it offers. If you now
leave the second argument as the empty list, you will get back all of the
Lexeme forms from the selected Lexeme. Then you can browse them in
Wikifunctions.
Note that we currently have a few bugs: If there are two or more choices
displayed with the exact same word form, the first of them will always be
selected, no matter which one you click on. Also, larger Lexemes cause a
gateway timeout on loading. And, just as with selecting QIDs, we also don’t
have a proper display for QIDs yet. If you encounter further issues, please
let us know.
The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-11
--
Wikidata Lexemes in Wikifunctions are coming soon!
Wikidata famously contains a large knowledge graph about more than a
hundred million items, but it also has a younger, lesser-known side:
lexicographical data <https://www.wikidata.org/wiki/Wikidata:Lexicographical_data>.
Currently, Wikidata describes more than 1.3 million lexical items across
1291 languages. The lexicographic data in Wikidata is an essential
ingredient for the Abstract Wikipedia vision.
As an early step on this road, support for Lexemes is coming to
Wikifunctions very soon! And we want to give a small preview of that.
We are introducing a number of new Types, and each of those will come in
two flavors. Let us look at Wikidata Lexemes
<https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation#D…>
as an example. We have introduced two Types to handle them: the Wikidata Lexeme
<https://www.wikifunctions.org/view/en/Z6005> itself, and the Wikidata
Lexeme Reference <https://www.wikifunctions.org/view/en/Z6095>. The
reference is a wrapper around the Lexeme ID
<https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation#L…>.
A Function will be provided that takes a Wikidata Lexeme Reference and
returns the respective Wikidata Lexeme.
An instance of the Wikidata Lexeme Type represents a Lexeme from Wikidata.
This means that we will not be able to create Lexemes in Wikifunctions on
the fly, or modify them: if you want to change or create a Lexeme, you will
continue to do so in Wikidata.
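The relationship between the two Types can be pictured with a short Python
sketch. Everything here is an assumption for illustration: the class and
function names are hypothetical, the fields are far fewer than a real Lexeme
has, and a small in-memory dictionary stands in for Wikidata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexemeReference:
    """A thin wrapper around a Lexeme ID such as "L1122"."""
    lexeme_id: str


@dataclass(frozen=True)
class Lexeme:
    """A read-only snapshot of a Lexeme (heavily simplified)."""
    lexeme_id: str
    lemma: str
    language: str


# In-memory stand-in for Wikidata, holding a single Lexeme.
STORE = {"L1122": Lexeme("L1122", "dog", "en")}


def fetch_lexeme(reference):
    """Resolve a reference to the Lexeme it points to."""
    return STORE[reference.lexeme_id]


dog = fetch_lexeme(LexemeReference("L1122"))
print(dog.lemma)  # dog
```

The frozen dataclasses reflect the point made above: a fetched Lexeme can be
read and passed to functions, but changes still have to happen on Wikidata.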
We have extended the Wikifunctions user interface to work with the new
Types. For Lexemes, there is a built-in search interface that will allow
you to search for and select a Lexeme in order to use it as an argument in
a function call.
There will be numerous limitations initially; particularly, Statements will
be very incomplete. Any statement that has a Type that is not supported
(which for now is almost all of them) will be silently dropped. We will,
over time, increase the covered Value Types from Wikidata, with the
eventual goal to represent Wikidata fully.
One very important restriction is that you won’t initially be able to
select Wikidata entities through incoming Statements. This is a very
notable restriction: it does not allow us to take, e.g. the item for dog
<https://www.wikidata.org/wiki/Q144> as an argument and then, using a
function, follow the *item for this sense*
<https://www.wikidata.org/wiki/Property:P5137> statement on the first sense
<https://www.wikidata.org/wiki/Lexeme:L1122#S1> of the Lexeme *dog*
<https://www.wikidata.org/wiki/Lexeme:L1122>, in order to pick the relevant
Lexeme. As this is a very important use case for the Abstract Wikipedia
story, we will be working on resolving this swiftly.
Lexeme access will be a major new capability with many moving parts, and
there is a good chance that we will need more documentation, that some
workflows will initially be unclear, and also that some things might be
broken at the beginning. We ask for your patience with us so we can improve
it, but we also will ask for your feedback so we know what to improve.
We are excited to get this launched!
Recent Changes in the software
Most of our work over the past two weeks has been on the new Quarterly
work, including the Wikidata access discussed above, and on "Fix It" work
to pay down our technical debt. We also landed a few fixes this week:
We've adjusted the code that picks your interface language when clicking
links on Wikifunctions to also respect your account language preference, if
set (T374309 <https://phabricator.wikimedia.org/T374309>). Thanks to
User:Ameisenigel <https://www.wikifunctions.org/wiki/User:Ameisenigel> for
finding and reporting the issue!
As part of wider work to remove raw HTML interface messages across
MediaWiki, we replaced the site copyright message written by Legal that
appears in the footer with one that is in wikitext (T375882
<https://phabricator.wikimedia.org/T375882>).
We've re-written the build process for our back-end evaluator service to be
simpler and faster through Docker layer caching, and by loosening the load
stress-test job (T376053 <https://phabricator.wikimedia.org/T376053>).
We've dropped an unused method for loading HTML content from a wiki that we
inherited from the "service-template-node" template that was causing
confusion (T366733 <https://phabricator.wikimedia.org/T366733>). We've
improved the way we include our utilities in the back-end for less code
duplication (T347086 <https://phabricator.wikimedia.org/T347086>). We have
added some better metrics and logging for our monitoring of the back-end
services (T376225 <https://phabricator.wikimedia.org/T376225>, T375457
<https://phabricator.wikimedia.org/T375457>).
We, along with all Wikimedia-deployed code, are now using the latest
version of the Codex UX library, v1.13.1, as of this week. We believe that
there should be no user-visible changes on Wikifunctions, so please comment
on the Project chat or file a Phabricator task if you spot an issue.
Recording of Volunteers’ Corner
<https://www.wikifunctions.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…>
The recording of this month’s Volunteers' Corner is now available on Commons
<https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…>.
Function of the Week: English plural possessive
Given that we are getting close to supporting Lexemes, this week’s Function
of the Week will be about the plural possessive in English. It is also the
Function we have built together in this week’s Volunteers’ Corner. So if
you want to see that Function being created, there’s a video on Commons
<https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…>
!
In English, nouns usually have a singular and a plural form. The singular
is used when we talk about one instance of the noun, and the plural when we
talk about multiples. The possessive
<https://en.wikipedia.org/wiki/English_possessive> is used when we want to
express that something belongs to that noun. So we may say there is *one dog*,
there are *two dogs*, and this is *the dog’s house*. The singular is
*dog*, the plural is *dogs*, and the singular possessive is *dog’s*.
Combining these, we get the plural possessive: *the dogs’ barking* would
refer to barking done by several dogs.
Of the 30,599 English nouns in Wikidata, the vast majority (28,038) have
two forms, but five Lexemes also feature possessive or genitive forms. They
are rarely specifically listed (e.g. on sport
<https://www.wikidata.org/wiki/Lexeme:L301>), because they are almost
always regular, given the singular and plural forms.
Regular forms are great for functions! The English plural possessive
<https://www.wikifunctions.org/view/en/Z19125> function takes the lemma,
i.e. the singular form, and returns the plural possessive. There is one
implementation <https://www.wikifunctions.org/view/en/Z19129>, which is a
composition: it first generates the plural
<https://www.wikifunctions.org/view/en/Z11089> out of the lemma, and then
creates the possessive <https://www.wikifunctions.org/view/en/Z11302> out of
the plural.
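The shape of that composition can be sketched in Python. The two helper
functions below use deliberately naive spelling rules that are assumptions
of this sketch, not the actual implementations on Wikifunctions, and the
code uses a plain ASCII apostrophe.

```python
def regular_plural(noun):
    """Naive regular English plural (illustrative rules only)."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def plural_to_possessive(plural):
    """Possessive of a plural: bare apostrophe after a final s."""
    return plural + "'" if plural.endswith("s") else plural + "'s"


def plural_possessive(lemma):
    """Composition: first the plural, then the possessive of that plural."""
    return plural_to_possessive(regular_plural(lemma))


for lemma in ["volunteer", "kiss", "dog", "fish"]:
    print(lemma, "->", plural_possessive(lemma))
```

With these regular rules, *fish* comes out as *fishes'* rather than
*fish's*, which illustrates why a purely regular approach cannot satisfy a
test that expects an irregular plural.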
I think this Function is a good example of a Function that probably doesn’t
need any further implementations: the only other way to implement it would
be to redo the two functions that are used in the composition, and there
seems no benefit in that.
The Function has five tests, of which three are connected – the other two
are left for discussion, whether they should be connected or not:
- volunteer to volunteers’
<https://www.wikifunctions.org/view/en/Z19128> (in order to honor the
Volunteers’ Corner)
- kiss to kisses’ <https://www.wikifunctions.org/view/en/Z19131> (a
slightly more complex pluralization)
- dog to dogs’ <https://www.wikifunctions.org/view/en/Z19126>
- fish to fish’s <https://www.wikifunctions.org/view/en/Z19127> (not
connected, and it fails with the current implementation)
- Matrix to Matrices’ <https://www.wikifunctions.org/view/en/Z19130> (not
connected either)
One main point is to decide whether this Function should always return the
correct plural possessive, in which case the unconnected tests should be
connected, or just regular plural possessives – in which case these tests
shouldn’t be there.
As always, this was a fun exercise, and I want to thank the Volunteers who
showed up and helped us in building the Function.
Lojban is a very precise language. It can be used as an intermediate
language in machine translation tools, for example:
English <-> Lojban <-> Polish
Manual: https://lojban.org/publications/cll/cll_v1.1_xhtml-chapter-chunks/
Lojban has corpora (https://corpus.lojban.org/), but large parallel corpora
are still needed.
It would be good if Abstract Wikipedia could be integrated with Lojban as
one more language.
The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-10-02
--
Focus topic: food
<https://www.wikifunctions.org/wiki/File:Christmas_table_(Serbian_cuisine).j…>
As we discussed two weeks ago, we are introducing two focus topics
<https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-09-20>.
One focus topic will concern model articles
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-07>,
and one will be for bespoke articles
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-06-21>. We
are looking for your input around a focus topic for model articles, but
this time we want to discuss our chosen focus topic for bespoke articles:
*food*.
Why food? Articles about foods and beverages on the different language
editions of Wikipedia have an enormous variety of representation. Some
articles talk about culture, others about history; some articles talk about
nutrition, others about preparation.
<https://www.wikifunctions.org/wiki/File:Egyptian_food_Koshary.jpg>
Some have infoboxes; many do not. One might think that foods can easily
contain infoboxes about nutritional values, but many foods are prepared in
such different ways–and exhibit so many different varieties–that it's
difficult to express structured data about the food, such as nutritional
values. In short, creating a template for articles about food and beverages
is basically impossible.
Foods and beverages also have the interesting quality that, perhaps more
than any other topic area, they exhibit cultural differences in the
different language articles of Wikipedia. The few articles we checked and
compared across languages had sometimes vastly different content. And
whereas there are many well-known (and probably even more lesser-known)
contentious questions about food, the debates in this area are in general
less heated than those on topics such as politics, geography, history, or
religion.
<https://www.wikifunctions.org/wiki/File:Goblet_of_Fire_Cocktail.jpg>
Wikidata has items on about 31,000 foods and beverages, of which 26,000
have sitelinks. There are eleven topics with more than 200 sitelinks,
giving an indication of their global importance: coffee (243), milk (242),
tea (240), bread (239), food (228), beer (221), apple (220), wine (216),
rice (213), banana (209), and honey (203) (query <https://w.wiki/BDb7>).
The chart shows how many foods have how many sitelinks (note the log scale
on the y axis).
[image: A chart showing the number of foods with a certain number of
sitelinks on Wikimedia projects as of September 2024]
<https://www.wikifunctions.org/wiki/File:Log_Number_of_foods_vs._Sitelinks.s…>
326 language editions of Wikipedia have articles about foods and beverages (
query <https://w.wiki/BDcr>), showing the universal importance of food. 114
languages have an article about a dish or drink that no other language
edition has, showing how much of the knowledge is spread across the globe,
and not available across language borders. This includes two of our focus
languages, Bengali with six foods and Malayalam with eight (query
<https://w.wiki/BDdQ>).
You may assume that the coverage of food on English Wikipedia would be
very strong. However, 12,500 foods and beverages that have articles on
Wikipedia are not represented with an article on English Wikipedia (query
<https://w.wiki/BDdT>)–*i.e.*, close to half of the foods that have an
article on Wikipedia are not represented in English. All of our five focus
languages have articles about food which are not described on English
Wikipedia (query <https://w.wiki/BDdb>). And 225 language editions have
articles on foods and beverages that English Wikipedia does not cover (query
<https://w.wiki/BDdu>).
One note is that some of these gaps and missing articles might be due to
different ways in which foods and beverages are split into articles in
different languages: in one language we might have six articles about six
different types of a regional dish, which is covered by a single article in
a different language. But these are indeed some of the differences we are
curious to explore and uncover with Abstract Wikipedia over time.
I hope that you got a taste of the large variety in how food is being
represented on Wikipedia, and how much knowledge we may potentially unlock
by allowing everyone, across language barriers, to contribute to this
unique and amazing stone soup <https://en.wikipedia.org/wiki/Stone_Soup> that
Wikipedia is.
Volunteers’ Corner on October 7
Next week, on Monday, October 7th, 2024, at 17:30 UTC
<https://zonestamp.toolforge.org/1728322200>, we will have our monthly
Volunteers’ Corner. Unless you have many questions, we will follow our
usual agenda, of giving updates on the upcoming plans and recent
activities, having plenty of time and space for your questions, and
building a Function together. Looking forward to seeing you on Monday!
Function of the Week: product of list of natural numbers
Last week we were talking about multiplying numbers with each other
<https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-09-26#…>,
and how Wikifunctions beats large language models hands-down in this
particular task. This week we continue in this direction by picking a
function suggested by the community
<https://www.wikifunctions.org/wiki/Wikifunctions:Function_of_the_Week#Sugge…>:
product of list (natural number) (Z13558)
<https://www.wikifunctions.org/view/en/Z13558>.
A product <https://en.wikipedia.org/wiki/Product_(mathematics)> is the
result of a multiplication. The product of 2 and 3 is 6, *i.e.* 2
multiplied by 3 is 6. The function we look at this week can deal with an
arbitrary number of numbers. How does it do that? We can’t add and remove
arguments in Wikifunctions!
The trick is that it actually doesn’t take an arbitrary number of
arguments, but it takes a single argument: a list. To be more precise, a
typed list, a list of natural numbers. This means that when you get to the
function page, the *Try this function* section looks a bit funny: instead
of giving you a field or several fields to enter a value, it just shows the
name of the argument (List of natural numbers), and a big + button.
Now you have to click on the big + button. Once you do that, you get the
opportunity to enter a number. If you want to add another number, just
click again on the + button. If you want to remove a number from the list,
you can click on the three dots next to the text Item, followed by the
number, and then choose the “Delete item” option. By the way, some folks
call the three dots the meatballs menu icon.
Play around a bit with this feature, to enter more elements to a list and
remove them. It’s a good skill to have, because all lists in Wikifunctions
work with this flow.
This function has five tests and eight implementations. The tests really
nicely cover a number of interesting cases:
- The product of a single number
<https://www.wikifunctions.org/view/en/Z17709> such as 9 is the number
itself, *i.e.* 9.
- The product of the empty list
<https://www.wikifunctions.org/view/en/Z17710>, *i.e.* a list with no
numbers in it, is 1. You may ask why it is 1 and not, say, 0. Mathematicians
define the product of the empty list to be 1 because 1 is the identity
element of the multiplication operation
<https://en.wikipedia.org/wiki/Multiplication#Properties>: multiplying any
number by 1 leaves it unchanged. The sum of an
empty list of numbers <https://www.wikifunctions.org/view/en/Z14038>, by
the way, is not 1, but 0, again because that is the identity element of
the addition
<https://en.wikipedia.org/wiki/Addition#Identity_element> operation.
This can seem a bit confusing, but the important thing here is that the
tests tell you what to expect in the edge cases, and are a really good form
of documentation.
- The product of two numbers
<https://www.wikifunctions.org/view/en/Z17711> is the result of
multiplying the two numbers, *e.g.* 11×9 is 99.
- The next test checks for the product of a somewhat longer list
<https://www.wikifunctions.org/view/en/Z13566> of numbers, five numbers:
2, 3, 5, 7, and 11, and results in 2310.
- And the final test we currently have checks for the product of all
prime numbers below 30 <https://www.wikifunctions.org/view/en/Z18910>:
that’s ten numbers, 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29. The result is
6,469,693,230. We can see that the first five numbers are the same five
numbers in the previous test.
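The five tests above can be mirrored as plain assertions. The following is a minimal Python sketch, not the Wikifunctions test format. It uses the standard library's `math.prod` as a stand-in for the function, and the name `product_of_list` is a placeholder chosen for this illustration.

```python
import math


def product_of_list(numbers: list[int]) -> int:
    """Stand-in for the Wikifunctions function; the name is hypothetical."""
    # math.prod returns 1 for an empty iterable, the multiplicative identity.
    return math.prod(numbers)


# The five cases described in the tests above.
assert product_of_list([9]) == 9
assert product_of_list([]) == 1
assert product_of_list([11, 9]) == 99
assert product_of_list([2, 3, 5, 7, 11]) == 2310
assert product_of_list([2, 3, 5, 7, 11, 13, 17, 19, 23, 29]) == 6469693230

# The sum of an empty list is the additive identity instead.
assert sum([]) == 0
```

Note that the function takes a single list argument rather than a variable number of arguments, which matches the typed-list approach described earlier.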
We have eight implementations for this function, quite a few! I actually
found it quite instructive and interesting to compare the different
implementations, both within the same language and across languages.
- The first implementation in Python
<https://www.wikifunctions.org/view/en/Z13560> starts by setting a
variable to 1, then goes through each value in the argument, updating the
variable to be itself multiplied by that value, and finally returns the
variable.
- The first implementation in JavaScript
<https://www.wikifunctions.org/view/en/Z17399> does exactly the same,
but it displays some interesting variation in syntax when compared to the
Python implementation above.
- The first composition <https://www.wikifunctions.org/view/en/Z17400> uses
the reduce function <https://www.wikifunctions.org/view/en/Z876> on
the multiplication
function <https://www.wikifunctions.org/view/en/Z13539> and a starting
value of 1. The reduce function is the second half of the famous
MapReduce <https://en.wikipedia.org/wiki/MapReduce> programming pattern,
which we will devote a future Function of the Week to.
- The second composition is recursive
<https://www.wikifunctions.org/view/en/Z16828>, *i.e.* it calls itself
under certain conditions. If the argument list has one element
<https://www.wikifunctions.org/view/en/Z12755>, return that element. If it
has none <https://www.wikifunctions.org/view/en/Z813>, return 1.
Otherwise multiply <https://www.wikifunctions.org/view/en/Z13539> the first
number <https://www.wikifunctions.org/view/en/Z811> of the list by the
product of the rest of the list
<https://www.wikifunctions.org/view/en/Z812>, which is calculated by
calling the product function
<https://www.wikifunctions.org/view/en/Z13558> itself again. Because the
argument list gets shorter in every call, we know that the recursion will
end at some point.
- The third composition <https://www.wikifunctions.org/view/en/Z17401> is
a variation on the first composition above: it also calls the reduce
function <https://www.wikifunctions.org/view/en/Z876>, but instead of
using 1 as a starting value, it uses the first number
<https://www.wikifunctions.org/view/en/Z811> in the list as the starting
value, and then reduces the rest of the list
<https://www.wikifunctions.org/view/en/Z812> using multiplication
<https://www.wikifunctions.org/view/en/Z13539>. In order to be able to
do so it first needs to check whether the list is empty
<https://www.wikifunctions.org/view/en/Z813>, in which case it returns 1
directly.
- The fourth composition <https://www.wikifunctions.org/view/en/Z13567> (and
final one, for now) first checks for the empty list
<https://www.wikifunctions.org/view/en/Z813>, in which case it returns
1, and uses the right fold function
<https://www.wikifunctions.org/view/en/Z12753> with multiplication
<https://www.wikifunctions.org/view/en/Z13539> on the list otherwise.
Fold <https://en.wikipedia.org/wiki/Fold_(higher-order_function)> and
reduce <https://en.wikipedia.org/wiki/Reduction_operator> are two very
similar functions, sometimes even used synonymously. In Wikifunctions, the
right fold function is the same as the reduce function with the difference
that the reduce function gets a starting value as an argument, whereas
right fold starts with the first value of the argument list – which is why
it cannot deal with an empty list and requires handling of that beforehand.
- The second implementation in Python
<https://www.wikifunctions.org/view/en/Z19106> uses Python’s reduce
function
<https://docs.python.org/3/library/functools.html#functools.reduce>
and multiplication operator
<https://docs.python.org/3/library/operator.html#operator.mul>,
which is basically the same as the first composition, but in Python.
- The second implementation in JavaScript
<https://www.wikifunctions.org/view/en/Z19107> uses JavaScript Array’s
reduce method
<https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Ob…>,
but since the language has no built-in multiplication function, it uses a
lambda function
<https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions…>
to express the multiplication.
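The strategies described in the list above can be sketched side by side in Python. These are illustrative versions written for this comparison, not copies of the implementations on Wikifunctions, and all function names are placeholders.

```python
from functools import reduce
from operator import mul


def product_loop(numbers: list[int]) -> int:
    """Start at 1 and multiply in each value, one after the other."""
    result = 1
    for value in numbers:
        result *= value
    return result


def product_reduce(numbers: list[int]) -> int:
    """Reduce with multiplication and a starting value of 1."""
    return reduce(mul, numbers, 1)


def product_recursive(numbers: list[int]) -> int:
    """Multiply the first number by the product of the rest of the list."""
    if not numbers:
        return 1
    if len(numbers) == 1:
        return numbers[0]
    # The list gets shorter with every call, so the recursion terminates.
    return numbers[0] * product_recursive(numbers[1:])


def product_fold_without_start(numbers: list[int]) -> int:
    """Fold seeded from the list itself, so the empty list is handled first."""
    if not numbers:
        return 1
    # Without an initial value, functools.reduce raises TypeError on an empty list.
    return reduce(mul, numbers)
```

One caveat: `functools.reduce` folds from the left, whereas the fourth composition uses a right fold. For multiplication of natural numbers the direction does not change the result, so the sketch only illustrates the role of the starting value.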
We invite you to play around with the function, and particularly the flow
for typed lists. And thanks to 99of9
<https://www.wikifunctions.org/wiki/User:99of9> for suggesting the function
as a Function of the Week! If you want to make your own suggestions,
please feel free to nominate a function
<https://www.wikifunctions.org/wiki/Wikifunctions:Function_of_the_Week#Sugge…>
yourself.