The on-wiki version of this newsletter edition can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-12-19
--
Function of the Week: age
Last week we introduced the Gregorian date type, and as of the time of
writing, we have 23 functions using the new type. Thanks everyone for your
contributions!
<https://www.wikifunctions.org/wiki/File:Compleanno.JPG>
One of the flagship functions for Wikifunctions that we have mentioned in
presentations and essays before is the age function. This function takes
two dates as arguments and calculates the difference between them. The
first argument could be, for example, the date of birth of a person or the
date an organization was founded. The function would then calculate the age
in full years of the person or organization as of the date given in the
second argument.
For example, Wikipedia was founded on 15 January 2001. On the day of
publication of this newsletter, Wikipedia was 23 years old. The age
function would return that natural number as its answer.
Why did we choose it as a flagship function? Because more than 160
Wikimedia projects <https://www.wikidata.org/wiki/Q6236144> have a template
for this functionality, and more than 100 projects
<https://www.wikidata.org/wiki/Q12085840> have a module for this
functionality. But in many cases, these templates and modules are copied
and pasted from another project, underdocumented, not well tested, and
almost never updated when the original is improved. And, as often as these
templates and modules have been copied all around, there are still more
than 500 Wikimedia projects that don’t have access to that functionality.
One goal of Wikifunctions is to provide such functionality from a central
repository: all projects should have automatic access to this
functionality, in its most up-to-date form, well-tested, both through
explicit function tests and through usage across many projects. No more
copy-and-pasting from other wikis, no more content that the local community
barely understands and has difficulty maintaining.
And now, on Wikifunctions, we have the age function (Z20756)
<https://www.wikifunctions.org/view/en/Z20756>. It currently has four
implementations and five tests. Since we have not yet configured a parser
for dates, entering the arguments is a bit of a hassle. Nevertheless, I
chose this function as the last Function of the Week for this year, to use
the opportunity to highlight part of what Wikifunctions may mean for
Wikipedia in the future.
The five tests are:
- someone born on Christmas last year will be 1
<https://www.wikifunctions.org/view/en/Z20796> on Christmas this year
- someone born on Christmas 4 AD will be 2020
<https://www.wikifunctions.org/view/en/Z20797> on Christmas this year
- someone born on Christmas last year will still be 0
<https://www.wikifunctions.org/view/en/Z20798> on Christmas eve this year
- someone born on 23 January 12 BC was 10
<https://www.wikifunctions.org/view/en/Z20799> on 1 March 2 BC
- someone born on 1 January 5 BC was 37
<https://www.wikifunctions.org/view/en/Z20800> on 3 April 33 AD
The tests cover a good range. In particular, the tests across the era
change are important. It would be interesting to have agreements and tests
for what happens when the first date is after the second, and tests for
dates outside of JavaScript’s date range (the far future or past, more than
300,000 years away), but, other than that, the test coverage seems good.
The four current implementations are:
- A composition that counts the days
<https://www.wikifunctions.org/view/en/Z17578> between the two dates,
and divides them by <https://www.wikifunctions.org/view/en/Z13546> 365
(which fails to account for leap years)
- Another composition that subtracts
<https://www.wikifunctions.org/view/en/Z13569> the first year from the
second, and subtracts <https://www.wikifunctions.org/view/en/Z13569> one
more number in case the second date is earlier in the year
<https://www.wikifunctions.org/view/en/Z20406> than the first. It does
so by cleverly using a conversion of the condition to a number
<https://www.wikifunctions.org/view/en/Z17065>. This composition fails
at time of writing, though.
- A third composition <https://www.wikifunctions.org/view/en/Z20764> that
does the same, but using integers instead of natural numbers. This one
works, but could benefit from using a few more high-level functions.
- An implementation in JavaScript
<https://www.wikifunctions.org/view/en/Z20804> that does basically the
same: it subtracts the years, and if the second date is earlier in the
year than the first, it subtracts one more.
*(Note: the first two compositions have been deleted since this text was
written)*
The implementations are interesting (also because of the first one, which
is intended to fail in order to highlight the relevance of some of the test
cases).
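To make the two approaches concrete, here is a minimal Python sketch. It is not the code of any of the implementations on Wikifunctions; the function names and the representation of dates as (year, month, day) tuples with negative years for BC are assumptions made for illustration. It does not decide what should happen when the first date is after the second.

```python
from datetime import date


def astronomical_year(year: int) -> int:
    """Map an era-based year (negative = BC, no year 0) to astronomical numbering."""
    if year == 0:
        raise ValueError("there is no year 0 between 1 BC and 1 AD")
    return year if year > 0 else year + 1


def age_in_full_years(born, on):
    """Age in full years; born and on are (year, month, day) tuples, BC years negative."""
    born_year, born_month, born_day = born
    on_year, on_month, on_day = on
    years = astronomical_year(on_year) - astronomical_year(born_year)
    # Subtract one if the second date is earlier in the year than the first.
    if (on_month, on_day) < (born_month, born_day):
        years -= 1
    return years


def naive_age(born: date, on: date) -> int:
    """The days-divided-by-365 approach, which ignores leap days."""
    return (on - born).days // 365


print(age_in_full_years((-5, 1, 1), (33, 4, 3)))        # 37, across the era change
print(age_in_full_years((4, 12, 25), (2024, 12, 25)))   # 2020
print(naive_age(date(4, 12, 25), date(2024, 12, 25)))   # 2021, off by one
```

The era conversion is what makes the tests across the BC/AD boundary pass: because there is no year 0, 5 BC and 33 AD are 37 years apart, not 38.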
Call for Functions: Intros for year articles
The main goal of Wikifunctions is to support Abstract Wikipedia –
meaningful, Wikipedia-style paragraphs and articles generated from data and
abstract content. For that, we need to be able to create high-quality prose
content for articles in many languages.
Many Wikipedias have a set of articles about individual years — for
example, here is the article for the year 2023 in English Wikipedia
<https://en.wikipedia.org/wiki/2023>. In most languages, the article starts
with a few sentences with very similar content to what the English
Wikipedia offers:
“*2023* (MMXXIII <https://en.wikipedia.org/wiki/Roman_numerals>) was a common
year starting on Sunday
<https://en.wikipedia.org/wiki/common_year_starting_on_Sunday> of the Gregorian
calendar <https://en.wikipedia.org/wiki/Gregorian_calendar>, the 2023rd
year of the Common Era <https://en.wikipedia.org/wiki/Common_Era> (CE)
and *Anno
Domini* <https://en.wikipedia.org/wiki/Anno_Domini> (AD) designations, the
23rd year of the 3rd millennium
<https://en.wikipedia.org/wiki/3rd_millennium> and the 21st century
<https://en.wikipedia.org/wiki/21st_century>, and the 4th year of the 2020s
<https://en.wikipedia.org/wiki/2020s> decade.”
There is now a function that can create a text very similar to this one,
but without the links and formatting: Intro for year in English
<https://www.wikifunctions.org/view/en/Z20597>, which creates the following
text:
“2023 (MMXXIII) was a common year starting on Sunday of the Gregorian
calendar, the 2023rd year of the Common Era (CE) and Anno Domini (AD)
designations, the 23rd year of the 3rd millennium, the 23rd year of the
21st century, and the 4th year of the 2020s decade.”
This is currently only available in English, and it has fewer features than
what English Wikipedia uses (for example, it doesn’t switch to the Julian
calendar, it doesn’t unify the counting of the years in a century and a
millennium if it is the same, *etc.*). It would be interesting to create
similar functions for other languages as well, and so we are calling for
functions to be written over the holidays, and will take stock at the
beginning of the Gregorian calendar year 2025.
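As a starting point for similar functions, the arithmetic behind such a sentence can be sketched in Python. This is an illustration only, not the implementation of Z20597; all names are assumptions, the past tense is hard-coded, and the sketch only covers years 1 to 9999 of the Gregorian calendar.

```python
import calendar
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def roman(n: int) -> str:
    """Convert a positive integer to Roman numerals."""
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def ordinal(n: int) -> str:
    """English ordinal: 1st, 2nd, 3rd, 4th, 11th, 23rd, ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def intro_for_year(year: int) -> str:
    kind = "leap" if calendar.isleap(year) else "common"
    weekday = WEEKDAYS[date(year, 1, 1).weekday()]
    # Centuries and millennia start with a year ending in 1 (2001, not 2000).
    millennium, year_in_millennium = divmod(year - 1, 1000)
    century, year_in_century = divmod(year - 1, 100)
    decade = year // 10 * 10
    return (
        f"{year} ({roman(year)}) was a {kind} year starting on {weekday} "
        f"of the Gregorian calendar, the {ordinal(year)} year of the "
        f"Common Era (CE) and Anno Domini (AD) designations, "
        f"the {ordinal(year_in_millennium + 1)} year of the "
        f"{ordinal(millennium + 1)} millennium, "
        f"the {ordinal(year_in_century + 1)} year of the "
        f"{ordinal(century + 1)} century, "
        f"and the {ordinal(year % 10 + 1)} year of the {decade}s decade."
    )


print(intro_for_year(2023))
```

Nearly everything in this sketch besides the arithmetic is specific to English (ordinal suffixes, word order, weekday names), which is why a version for another language needs more than a translation of the string.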
Another interesting, medium-term task would be to work on a function that
is abstract, i.e. one that finds the right words for the given language
itself, rather than the Wikifunctions community having to hard-code them
for each language. This would currently still be difficult, but by the end
of the next quarter we should be able to get “decade” or “century” from
Wikidata in many different languages, which will help us get there.
There are also three main caveats for the current work:
1. The Wikifunctions system still times out on larger compositions,
although we have improved its performance.
2. There is a missing feature
<https://phabricator.wikimedia.org/T366459> that
keeps us from getting the label of an object.
3. Admittedly, even if we had the label, that would not be sufficient in
many languages, as we would rather need the Lexeme in order to get the
appropriate inflection.
So as we can see, we are very close to being able to build functions that
can generate texts for years, but there are a few blockers on our way. We
will use these blockers in the coming months to make progress visible and
to focus our development in order to enable these to work – and not only on
Wikifunctions, but on Wikipedia as well.
I hope these thoughts serve both as a reflection on what we achieved this
year and as an outlook on where we want to go next year.
Recent Changes in the software
This week is the last production release before the end of 2024, as
Wikimedia has an End-of-Year release freeze, so that we don't deploy code
when lots of people are away and unavailable. The next production release
after this will be around 15 January 2025.
We've fixed the database code behind some of our special pages to not list
Objects with talk pages twice – thanks Feeglgeef
<https://www.wikifunctions.org/wiki/User:Feeglgeef> for reporting this (
T381003 <https://phabricator.wikimedia.org/T381003>). We've landed some
preparatory database work that will in future allow us to list Functions
that use particular Types, so you can find examples of how others have used
them; expect this some time next calendar year (T301712
<https://phabricator.wikimedia.org/T301712>).
We've adjusted the logic when loading content from the database so that it
throws a clearer, more MediaWiki-standard error when somehow something
invalid has been saved into the wiki (T381115
<https://phabricator.wikimedia.org/T381115>). We've also added some better
testing for invalid Z2K1 values, stopped the API hiding such invalid items,
and fixed a couple of issues that meant this kind of broken content was
challenging to fix (T381972 <https://phabricator.wikimedia.org/T381972>).
In another area, we've guarded against odd errors triggered from invalid
content when pages are re-rendered, to avoid filling up production logs
with confusing warnings going to the wrong people (T380446
<https://phabricator.wikimedia.org/T380446>).
We've split one re-used i18n message so that it's possible to translate it
properly (T373745 <https://phabricator.wikimedia.org/T373745>), and deleted
two old, now un-used ones to avoid wasted translator effort – sorry for
that!
On the developer side, we've upgraded the version of JSDoc used to generate
our (rather limited) front-end JS docs
<https://doc.wikimedia.org/WikiLambda/master/js/js/index.html>, and the
phan static analyser of our PHP, alongside all Wikimedia repos switching to
the newer versions. We've also made an error from one of our test tools
clearer, as part of preparing for updating our tests to cover a more
modern version of our front-end framework.
We have added support for the Z1956/fvr
<https://www.wikifunctions.org/view/en/Z1956> language to Wikifunctions, as
part of it being added to MediaWiki (T381894
<https://phabricator.wikimedia.org/T381894>).
As always, please alert us if you run into any issues.
News in Types: Double-precision floating-point numbers
This week we are introducing the double-precision floating-point number
<https://www.wikifunctions.org/view/en/Z20838> type, also known as
"float64" among friends, or simply "float". Unlike the other number types
that we already have – natural numbers
<https://www.wikifunctions.org/view/en/Z13518>, integers
<https://www.wikifunctions.org/view/en/Z16683>, and rational numbers
<https://www.wikifunctions.org/view/en/Z19677> – floats are not necessarily
precise. Instead, they are a compromise between precision, feasibility and
efficiency, which was codified in a standard
<https://en.wikipedia.org/wiki/IEEE_754> almost forty years ago.
The 64 in float64 indicates that a floating-point number needs 64 bits in
most programming languages. This is called the double-precision
floating-point number: single precision takes 32 bits, and half-precision
16. I am tempted to tell you so much more about floating point numbers, and
all the cool features they have, but instead I will just point to the English
Wikipedia article
<https://en.wikipedia.org/wiki/Double-precision_floating-point_format> as a
starting point.
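The bit widths can be checked with a small Python sketch using the standard struct module, which packs values in the IEEE 754 formats. The variable names are only for illustration.

```python
import struct

x = 0.1

double_bytes = struct.pack(">d", x)  # double precision (binary64)
single_bytes = struct.pack(">f", x)  # single precision (binary32)
half_bytes = struct.pack(">e", x)    # half precision (binary16)

# Number of bits used by each format.
print(len(double_bytes) * 8, len(single_bytes) * 8, len(half_bytes) * 8)

# Reading the value back from fewer bits gives a coarser approximation of 0.1.
as_single = struct.unpack(">f", single_bytes)[0]
as_half = struct.unpack(">e", half_bytes)[0]
print(as_single, as_half)
```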
What are floating-point numbers good for in Wikifunctions? We will need to
see how the different number types work out. For many calculations, I
expect us to prefer the precision that rational numbers offer. But the most
pragmatic approach to dealing with irrational numbers is to use the
approximation that floating-point numbers offer. Whether we’re dealing with
roots, circles, sine waves, or logarithms, they will often be difficult or
impossible to calculate with the numbers we already have, and a new type
that balances approximation with precision is now available to deal with
that issue.
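The trade-off can be illustrated in Python, with the standard fractions module standing in for a rational number type. This is only an analogy for the Wikifunctions types, not their implementation.

```python
import math
from fractions import Fraction

# Rational arithmetic is exact: three thirds are exactly one.
third = Fraction(1, 3)
exact_sum = third + third + third
print(exact_sum == 1)

# The square root of 2 is irrational, so a float can only approximate it.
root = math.sqrt(2)
squared_back = root * root
print(squared_back)        # very close to 2, but not exactly 2
print(squared_back == 2.0)

# The same holds for anything involving pi, such as a circle's circumference.
circumference = 2 * math.pi * 1.0
print(circumference)
```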
One huge advantage of floating-point numbers is that they are standardized
and the standard is widely implemented in hardware and available in many
programming languages.
Floating-point numbers will potentially nudge us to find new patterns in
how to write tests. Often exact precision is counter-productive (one
classic example is that in floating-point arithmetic, 0.1 + 0.2 is not
equal to 0.3), and for some functions we might want to write tests that
don’t rely on exact equality, but rather on checking that the result is
close enough to the expected value. And that will be a pattern that will be
useful to have for the more complex and interesting types that await us in
the future.
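A tolerance-based check of this kind might look as follows in Python. The helper name and the tolerance values are assumptions chosen for illustration, not an existing Wikifunctions function.

```python
import math


def approximately_equal(actual, expected, rel_tol=1e-9, abs_tol=1e-12):
    """True if actual is close enough to expected for a test to pass."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)


total = 0.1 + 0.2
exact_match = total == 0.3                      # False
close_match = approximately_equal(total, 0.3)   # True
print(exact_match, close_match)
```

The absolute tolerance matters for expected values at or near zero, where a purely relative tolerance would never accept anything but an exact match.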
Newsletter taking a break
Over the next few days, most of the team will be taking time off for the
holiday season at the end of the year. Expect the next update in the week
of 15 January 2025. The first Volunteers' Corner of the next year will be
on 13 January 2025. We wish everyone peaceful days and see you in the new
Gregorian calendar year!
The on-wiki version of this newsletter edition can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-12-12
--
Sketching a path to Abstract Wikipedia
The main goal of Wikifunctions is to support Abstract Wikipedia: a source
of multi-lingual Wikipedia content where we can create and maintain the
content only once, but have it available across many different languages to
fill some of the gaps that currently exist in some Wikipedias.
Today, I would like to sketch out how the natural language generation for
Abstract Wikipedia might develop. As an example goal, let’s take the
following sentence (based on the English Wikipedia article about Waakye
<https://en.wikipedia.org/wiki/Waakye>):
English: *“Waakye is a Ghanaian dish of cooked rice and beans.”*
French: *“Le waakye est un mets ghanéen de riz et de haricots cuits.”*
German: *“Waakye ist ein ghanaisches Gericht aus gekochten Reis und
Bohnen.”*
We look at four stages to work towards this text.
Stage 1: String-based substitution
In Stage 1, we use simple string substitution, in the style of Mad Libs
<https://en.wikipedia.org/wiki/Mad_Libs>. This approach requires the user
to carefully select the right strings, which is quite simple in English,
but gets more complicated in French or German.
So we could have the following function calls:
Instance with origin string-based English(“Waakye”, “dish”, “Ghanaian”)
→ *“Waakye is a Ghanaian dish.”*
Instance with origin string-based French(“Le waakye”, “un mets”, “ghanéen”)
→ *“Le waakye est un mets ghanéen.”*
Instance with origin string-based German(“Waakye”, “ein Gericht”,
“ghanaisches”)
→ *“Waakye ist ein ghanaisches Gericht.”*
This is possible right now. It requires quite detailed grammatical
knowledge by the function caller, as they need to enter the right form
manually. The benefit of this method is difficult to see in this example.
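In Python, the Stage 1 functions could be sketched like this. The function names mirror the calls above but are otherwise assumptions. Note how the German version already has to take the caller's noun phrase apart in order to place the adjective.

```python
def instance_with_origin_english(subject, noun, adjective):
    # The article "a" is hard-coded; the caller must avoid words needing "an".
    return f"{subject} is a {adjective} {noun}."


def instance_with_origin_french(subject, noun_phrase, adjective):
    # The adjective follows the noun, so the noun phrase can be used whole.
    return f"{subject} est {noun_phrase} {adjective}."


def instance_with_origin_german(subject, noun_phrase, adjective):
    # The adjective goes between article and noun: "ein" + adjective + "Gericht".
    article, noun = noun_phrase.split(" ", 1)
    return f"{subject} ist {article} {adjective} {noun}."


print(instance_with_origin_english("Waakye", "dish", "Ghanaian"))
print(instance_with_origin_french("Le waakye", "un mets", "ghanéen"))
print(instance_with_origin_german("Waakye", "ein Gericht", "ghanaisches"))
```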
Stage 2: Lexeme-based generation
In Stage 2, instead of using strings, we use Wikidata Lexemes, which has
become possible in the past few months. This allows for a version of the
function where the
function caller does not have to worry about agreement and entering the
right form manually, but the function implementer needs to select the right
form from the Lexeme instead. This shifts some of the burden from the
function user to the function author.
This makes the calling much simpler: we don’t have to know whether
*“waakye”* in French will be *“Le waakye”* or *“La waakye”*, we don’t have
to select the agreeing adjective in German (*“ghanaisches Gericht”* or
*“ghanaischer Gericht”*), etc. The correct form will be chosen by the
Function.
Now we would have the following function calls:
Instance with origin Lexeme-based English(Lxxx/Waakye, L3964/dish,
Lxxx/Ghanaian)
→ *“Waakye is a Ghanaian dish.”*
Zxxx/Instance with origin Lexeme-based French(Lxxx/waakye, L24812/mets,
Lxxx/ghanéen)
→ *“Le waakye est un mets ghanéen.”*
Zxxx/Instance with origin Lexeme-based German(Lxxx/Waakye, L500931/Gericht,
Lxxx/ghanaisch)
→ *“Waakye ist ein ghanaisches Gericht.”*
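The shift of work from caller to implementer can be sketched for German as follows. The dictionaries are hand-written stand-ins for Lexemes and do not follow the Wikidata data model; all names are assumptions for illustration.

```python
# Hypothetical, simplified stand-ins for Lexemes (not the Wikidata format).
WAAKYE = {"lemma": "Waakye"}
GERICHT = {"lemma": "Gericht", "gender": "neuter"}
SALAT = {"lemma": "Salat", "gender": "masculine"}
GHANAISCH = {
    "lemma": "ghanaisch",
    # Nominative singular forms after an indefinite article, by noun gender.
    "forms": {
        "masculine": "ghanaischer",
        "feminine": "ghanaische",
        "neuter": "ghanaisches",
    },
}
INDEFINITE_ARTICLE = {"masculine": "ein", "feminine": "eine", "neuter": "ein"}


def instance_with_origin_german(subject, noun, adjective):
    # The implementer, not the caller, picks the agreeing article and form.
    gender = noun["gender"]
    article = INDEFINITE_ARTICLE[gender]
    agreeing_adjective = adjective["forms"][gender]
    return f"{subject['lemma']} ist {article} {agreeing_adjective} {noun['lemma']}."


print(instance_with_origin_german(WAAKYE, GERICHT, GHANAISCH))
```

The caller passes the same adjective Lexeme regardless of the noun; swapping in a masculine noun such as SALAT changes the chosen form to "ghanaischer" without any change on the caller's side.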
You will also find that a lot of Lexemes are missing for this particular
example, such as the French Lexeme for something from Ghana. We in the
Wikimedia movement need to think about how to approach this gap between
what is, and what should be, in Wikidata's Lexemes.
We were hoping that this would be possible right now, and we created a
number of functions during our offsite to test these capabilities.
Unfortunately, we learned that the system is currently failing to evaluate
most such function calls, and accordingly we decided to put a big focus in
the upcoming Quarter on getting these functions to run.
Stage 3: Item-based generation
In the third stage, we would use Wikidata items to help us select Lexemes
from a given language that have comparable meanings. The function caller
does not have to know or look up the right Lexeme in all the languages they
want to generate the text in. They can just put in the relevant Wikidata
items, and the function developer can implement the relevant lookups.
This means that whether or not the function caller knows that the concept
*“dish”* is called *“mets”* in French or *“Gericht”* in German, they will
still be able to create perfectly fluid and correct sentences in those
languages.
This allows us to make the following calls (note that all three calls use *the
same function* here, and the caller does not have to know the languages at
all):
Instance with origin(Q14783691/Waakye, Q746549/dish, Q117/Ghana,
Z1002/English)
→ *“Waakye is a Ghanaian dish.”*
Instance with origin(Q14783691/Waakye, Q746549/dish, Q117/Ghana,
Z1004/French)
→ *“Le waakye est un mets ghanéen.”*
Instance with origin(Q14783691/Waakye, Q746549/dish, Q117/Ghana,
Z1430/German)
→ *“Waakye ist ein ghanaisches Gericht.”*
Note that the function will in most cases just route to the language
specific functions developed for the previous stage, but that happens
behind the scenes and transparently for the function caller.
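The routing idea can be sketched in Python with a mock lookup table. The table, the language keys "en", "fr" and "de", and the pre-inflected words are all assumptions that keep the sketch short; a real implementation would find Lexemes for each Item and select forms from them.

```python
# Hypothetical lookup from (language, Wikidata Item) to an already inflected word.
WORDS = {
    "en": {"Q14783691": "Waakye", "Q746549": "dish", "Q117": "Ghanaian"},
    "fr": {"Q14783691": "Le waakye", "Q746549": "mets", "Q117": "ghanéen"},
    "de": {"Q14783691": "Waakye", "Q746549": "Gericht", "Q117": "ghanaisches"},
}

# Language-specific renderers, standing in for the functions of the earlier stages.
RENDERERS = {
    "en": lambda subject, noun, adjective: f"{subject} is a {adjective} {noun}.",
    "fr": lambda subject, noun, adjective: f"{subject} est un {noun} {adjective}.",
    "de": lambda subject, noun, adjective: f"{subject} ist ein {adjective} {noun}.",
}


def instance_with_origin(subject_item, class_item, origin_item, language):
    """One function for all languages: look up the words, then route."""
    words = WORDS[language]  # raises KeyError for unsupported languages
    render = RENDERERS[language]
    return render(words[subject_item], words[class_item], words[origin_item])


for language in ("en", "fr", "de"):
    print(instance_with_origin("Q14783691", "Q746549", "Q117", language))
```

A missing table entry raises a KeyError here, which mirrors the Lexeme gap: the function can only generate a sentence in a language for which the words for all three Items are available.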
This is currently not possible to implement on Wikifunctions — we still
need to add a function that allows us to find the Lexemes connected to a
given Item. We will work on that in the coming Quarter, and are thankful to
the Search and Wikidata teams for the necessary pre-work they have recently
performed to unlock the possibility.
Stage 4: Item-based content
The final stage we want to discuss today is based on using the knowledge in
Wikidata to create text. We can pull from Wikidata that Q14783691/Waakye
<https://www.wikidata.org/wiki/Q14783691> is a dish from Q117/Ghana
<https://www.wikidata.org/wiki/Q117>, we can look up the ingredients and
their Lexemes, etc. Given the current knowledge about Waakye in Wikidata,
this could then generate the following sentences:
Food with origin and ingredients(Q14783691/Waakye, Z1002/English)
→ *“Waakye is a Ghanaian dish with bean, rice, water, and salt.”*
Food with origin and ingredients(Q14783691/Waakye, Z1004/French)
→ *“Le waakye est un plat ghanéen composé de haricots, de riz, d'eau et de
sel.”*
Food with origin and ingredients(Q14783691/Waakye, Z1430/German)
→ *“Waakye ist ein ghanaisches Gericht aus Bohnen, Reis, Wasser und Salz.”*
This further simplifies writing the function calls: all we need to select
is the dish and the language, and we get a whole sentence that can, in many
cases, make a good opening sentence for the Wikipedia article about the
given dish, or as an entry or short description in various places.
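A rough Python sketch of this stage, using a mock of the relevant Wikidata statements: the data, the ingredient keys, the language keys and all function names are assumptions for illustration, and the French elision rule is deliberately simplistic.

```python
# Hypothetical extract of statements about the dish; ingredient keys are placeholders.
FACTS = {
    "Q14783691": {"origin": "Q117", "ingredients": ["bean", "rice", "water", "salt"]},
}
WORDS = {
    "en": {"Q14783691": "Waakye", "Q117": "Ghanaian",
           "bean": "bean", "rice": "rice", "water": "water", "salt": "salt"},
    "fr": {"Q14783691": "Le waakye", "Q117": "ghanéen",
           "bean": "haricots", "rice": "riz", "water": "eau", "salt": "sel"},
    "de": {"Q14783691": "Waakye", "Q117": "ghanaisches",
           "bean": "Bohnen", "rice": "Reis", "water": "Wasser", "salt": "Salz"},
}


def join_list(items, last_separator):
    """Join items with commas, using last_separator before the final item."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + last_separator + items[-1]


def french_de(word):
    """Prefix 'de', elided to d' before a vowel (ignores the mute h)."""
    return "d'" + word if word[0].lower() in "aeiouéèêâîôû" else "de " + word


def food_with_origin_and_ingredients(item, language):
    facts = FACTS[item]
    words = WORDS[language]
    name, origin = words[item], words[facts["origin"]]
    ingredients = [words[key] for key in facts["ingredients"]]
    if language == "en":
        separator = ", and " if len(ingredients) > 2 else " and "
        return f"{name} is a {origin} dish with {join_list(ingredients, separator)}."
    if language == "fr":
        parts = [french_de(word) for word in ingredients]
        return f"{name} est un plat {origin} composé {join_list(parts, ' et ')}."
    if language == "de":
        return f"{name} ist ein {origin} Gericht aus {join_list(ingredients, ' und ')}."
    raise ValueError(f"unsupported language: {language}")


print(food_with_origin_and_ingredients("Q14783691", "fr"))
```

Even list formatting differs per language: English uses a serial comma here, German does not, and French repeats the preposition before every ingredient.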
I hope that this gives a good overview of our next few planned steps with
regard to natural language generation, and how Wikifunctions can support
bringing together our different language communities.
Team offsite in Lisbon
<https://www.wikifunctions.org/wiki/File:Abstract_Wikipedia_team_Lisbon_2024…>
Abstract
Wikipedia team at the offsite in Lisbon 2024. From left to right, front
row: Cory Massaro, Grace Choi, Genoveva Galarza Heredero, Daphne Smit. Back
row: James Forrester, Denny Vrandečić, David Martin, Sharvani Haran. Not in
picture: Amy Tsay, Amin Al Hazwani, Luca Martinelli, Elena Tonkovidova,
Vaughn Walters.
Last week, the team met for its annual meeting in Lisbon, Portugal. What a
beautiful city! We enjoyed walking through the city, and had very
productive meetings, discussing our plans, team procedures, and using the
time for bonding and social cohesion – very difficult and important to
achieve in a team that is fully remote.
The most tangible outcome is the planning for the next Quarter; we had very
lively discussions to find a consensus, which we still need to write up. We
will report on the plan in one of the next two updates.
New tool for querying Wikifunctions
User:Feeglgeef <https://www.wikifunctions.org/wiki/User:Feeglgeef> created
a new tool that allows you to query Wikifunctions in a very flexible way.
You can search for functions with implementations in Python, types that use
numbers on keys, functions that take three arguments, or return booleans.
The tool is available on Replit (note that this is outside of Wikimedia
servers), and examples and documentation of the query language are linked
from the front page of the tool: wf-query.replit.app
User:Hogü-456 <https://www.wikifunctions.org/wiki/User:Hog%C3%BC-456> created
an overview of existing tools. If you are aware of more tools, feel free to
add them: Wikifunctions:Tools
<https://www.wikifunctions.org/wiki/Wikifunctions:Tools>
Recent Changes in the software
There's no release of MediaWiki software this week due to the End-of-Year
release freeze, so nothing new to update. As always, please alert us if you
run into any issues.
News in Types: Gregorian calendar date, Byte, Unicode code point
We finally have a Type for Gregorian calendar dates
<https://www.wikifunctions.org/view/en/Z20420>. We have been working a
while towards it, having created a Type for the relevant months
<https://www.wikifunctions.org/view/en/Z16098>, for years
<https://www.wikifunctions.org/view/en/Z20159>, *etc.* The discussion
<https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Gregorian_c…>
was
lengthy and didn’t lead to a full consensus. A rationale for the decisions
<https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Gregorian_c…>
on
the design of the Type is provided. We invite you to create functions using
the Type!
This is by far the most complex Type we have provided so far.
We would like to create Types for other, non-Gregorian calendars, like the
Chinese, Ethiopian, Japanese, Hebrew, and other calendars. If you know any
of these calendars well, please reach out so that we can create the
respective Types.
In other type related work, proposals for fixing the Byte
<https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Byte> type
and the Unicode code point
<https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Unicode_cod…>
type
(previously character type) have been made. Input and discussion are
very welcome.
Recordings of December’s Volunteers’ Corner
We had a Volunteers’ Corner this Monday, December 9. It was lively with
many good questions. A recording of the Corner is available on Commons
<https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…>
.
The function we built together is featured below as the Function of the
Week.
Recording of Denny’s SWIB24 keynote
Denny Vrandečić gave a keynote address at the Semantic Web in Libraries
2024 conference. The topic was the role of knowledge representations in
a world of large language models. The recording is available on YouTube
<https://www.youtube.com/watch?v=NmCbTOZ4Yos>.
Function of the Week: how many days between two days in the Roman year
The last newsletter introduced the days of the Roman year
<https://www.wikifunctions.org/view/en/Z20342> as a new Type. As of now, we
have 18 new functions using the Type. Also, this week’s Volunteers’ Corner
created such a function, so we will take a look at the resulting function.
How many days are there between two days? Function Z20733
<https://www.wikifunctions.org/view/en/Z20733> can answer that question.
The function has three arguments: the two days
<https://www.wikifunctions.org/view/en/Z20342>, and a Boolean
<https://www.wikifunctions.org/view/en/Z40> which tells us whether the days
are in a leap year or not. It returns a natural number stating how many
days are between the two given days.
It might be easiest to clarify what the function does by looking at the
tests:
- From 1 January to 15 January
<https://www.wikifunctions.org/view/en/Z20735>, that’s 14 days
- From 1 January to 31 December
<https://www.wikifunctions.org/view/en/Z20737>, that’s 364 days in a
common year
- From 28 February to 1 March
<https://www.wikifunctions.org/view/en/Z20736>, it’s one day in a common
year
- But two days <https://www.wikifunctions.org/view/en/Z20734> in a leap
year
The tests are incomplete. The most notable omission is a test where the
first day is after the second, along with a decision on what exactly that
means with regard to the leap year.
There is currently only one implementation for this function, partly
because we didn’t have much time left in the Volunteers’ Corner: we only
wrote a composition, as we found that to be the easiest way to implement
the function.
The core of the composition is to turn both days into
<https://www.wikifunctions.org/view/en/Z20357> a number, counting which day
of the year it is (i.e. 1 January is the first day, 2 January the second, 1
February the 32nd, etc.), and then subtract
<https://www.wikifunctions.org/view/en/Z17315> the first number from the
second. The result is then turned from an integer to a natural number
<https://www.wikifunctions.org/view/en/Z17144>, in order to avoid negative
numbers.
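The same idea as a minimal Python sketch: days are represented as (month, day) tuples, and the conversion from integer to natural number is assumed here to be an absolute value. Neither choice is taken from the actual composition.

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def day_of_year(month, day, leap_year):
    """1 January is day 1, 1 February is day 32, and so on."""
    days_before = sum(DAYS_IN_MONTH[: month - 1])
    if leap_year and month > 2:
        days_before += 1  # 29 February has already passed
    return days_before + day


def days_between(first, second, leap_year):
    """Days between two (month, day) tuples within the same year."""
    difference = day_of_year(*second, leap_year) - day_of_year(*first, leap_year)
    # Assumption: taking the absolute value stands in for the conversion
    # from integer to natural number.
    return abs(difference)


print(days_between((1, 1), (1, 15), False))   # 14
print(days_between((2, 28), (3, 1), False))   # 1
print(days_between((2, 28), (3, 1), True))    # 2
```

With the absolute value, swapping the two days gives the same result; it does not model counting forward from a later day into the following year, which is the case the tests leave open.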