The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-11-27
--
WordGraph: (almost) a million forms for describing people
<https://www.wikifunctions.org/wiki/File:Dicti_indent.jpg>
A belated present for Wikidata’s 12th birthday: a team at Google Zurich
released the WordGraph dataset, almost a million word-forms in a structured
representation that is easy to upload to Wikidata. According to its
self-description, *“[t]he WordGraph dataset contains multilingual lexicon
entries linked to Wikipedia entities, focusing on human-denoting nouns and
demonym adjectives. Each lexicon entry contains inflected word-forms and
morphological information for all locales.”*
The dataset contains 968,153 forms in 39 languages. The dataset is
available on GitHub <https://github.com/google-research-datasets/WordGraph> and
published under CC0, making it compatible with Wikidata. We created an
overview with some statistics about the dataset
<https://www.wikidata.org/wiki/Wikidata:WordGraph>, compared with Wikidata.
The senses are already mapped to Wikidata QIDs, and so are the grammatical
features, which makes adding them to Wikidata particularly easy.
With the selection of human-denoting nouns and demonyms, this dataset is
particularly useful for abstract descriptions for people in Wikidata – and
people are, after all, the largest type of items that have Wikipedia
articles. These lexemes will help us with creating such descriptions as
“Irish rugby player”, “Ghanaian singer,” or “Indian mathematician” in many
languages.
We want to thank Bruno Cartoni, Saran Lertpradit, Seungmin Back, Daniel
Calvelo Aros, Kuang-Yu Samuel Chang and Abdelrahman Nabil at Google for
this beautiful gift. We invite everyone to work on enriching Wikidata with
this lexicographic data.
New special page: list of functions filtered by their tests
This week we are happy to introduce a new special page: list of functions
filtered by their tests
<https://www.wikifunctions.org/wiki/Special:ListFunctionsByTests>. The page
allows you to list all functions that have fewer than a certain number of
tests (e.g., fewer than two tests
<https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=&max=1&stat…>),
or it can help to find functions that have passing tests that are not
connected yet
<https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=1&max=&stat…>.
Conversely, it can find functions with failing tests that are still connected
<https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=1&max=&stat…>.
We can look for functions that have no tests at all
<https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=&max=0&wpFo…>,
or that have no connected tests
<https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=&max=0&stat…>,
or for functions with more than a dozen tests
<https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=13&max=&wpF…>
.
This special page is expected to be particularly useful for functioneers
looking for tests and implementations to connect.
On the page, you can enter:
- a range of numbers, given as a lower limit and an upper limit (both
inclusive) to limit the number of tests that should match the test
characteristics specified below;
- whether to count connected tests or tests not connected yet
(or both, in which case you leave both checkboxes empty); and
- whether to count only tests that pass all connected
implementations, or tests that fail for some of the connected
implementations (or both, in which case you leave both checkboxes empty).
Your resulting page can be shared by its URL.
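To make the filter semantics concrete, here is a minimal Python sketch of the selection logic described above. It is not the code behind the special page: the Tester and Function classes, their fields, and list_functions_by_tests are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tester:
    connected: bool  # is the test connected to its function?
    passing: bool    # does it pass all connected implementations?


@dataclass
class Function:
    name: str
    tests: list = field(default_factory=list)


def count_matching(function, connected=None, passing=None):
    """Count the tests that match the chosen characteristics.

    None means "both", like leaving both checkboxes empty.
    """
    return sum(
        1
        for t in function.tests
        if (connected is None or t.connected == connected)
        and (passing is None or t.passing == passing)
    )


def list_functions_by_tests(functions, minimum=None, maximum=None,
                            connected=None, passing=None):
    """Return the names of functions whose matching-test count is in range.

    Both limits are inclusive; None means no limit on that side.
    """
    result = []
    for f in functions:
        n = count_matching(f, connected, passing)
        if (minimum is None or n >= minimum) and (maximum is None or n <= maximum):
            result.append(f.name)
    return result


functions = [
    Function("no tests"),
    Function("one connected passing test", [Tester(True, True)]),
    Function("passing but unconnected test", [Tester(False, True)]),
    Function("failing connected test", [Tester(True, True), Tester(True, False)]),
]

# Functions with no tests at all.
print(list_functions_by_tests(functions, maximum=0))
# Functions with at least one passing test that is not connected yet.
print(list_functions_by_tests(functions, minimum=1, connected=False, passing=True))
```

Treating an empty limit as "no limit" mirrors how the example links above leave either min or max blank.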
We hope that this new page will be helpful for you to maintain
Wikifunctions!
More statements!
The claims sections of Wikidata lexemes, lexeme forms, and lexeme senses
received a major upgrade last week. Each claims section contains a
list of Wikidata statements. Previously only statements with String values
were included. This has been expanded to include statements with all the
following types of values:
- String
- Lexeme reference
- Lexeme form reference
- Lexeme sense reference
- Item reference
- Monolingual text
All statements now also include a rank, alongside their
subject, predicate, and value. Additional details may be found in
Wikifunctions:Support for Wikidata content
<https://www.wikifunctions.org/wiki/Wikifunctions:Support_for_Wikidata_conte…>.
To support ranks, we added a new key to the Wikidata statement
<https://www.wikifunctions.org/view/en/Z6003> last week, representing the
rank <https://www.wikifunctions.org/view/en/Z6040>. Big thanks to the
community for organizing a marvelous and diligent clean-up job
<https://www.wikifunctions.org/wiki/Talk:Z6003>!
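As a rough illustration of what a rank is good for, the following Python sketch models a statement with a subject, predicate, value, and rank, and picks the "best" statements the way Wikidata ranks are commonly used: preferred ones if there are any, otherwise the normal ones, and never the deprecated ones. The Statement class, the best_statements helper, and the placeholder IDs are assumptions for this sketch and do not mirror the actual keys of the Wikidata statement type.

```python
from dataclasses import dataclass
from enum import Enum


class Rank(Enum):
    PREFERRED = "preferred"
    NORMAL = "normal"
    DEPRECATED = "deprecated"


@dataclass(frozen=True)
class Statement:
    subject: str     # placeholder ID of the lexeme, form, or sense
    predicate: str   # placeholder ID of the property
    value: object    # a string, an item reference, a lexeme reference, ...
    rank: Rank = Rank.NORMAL


def best_statements(statements):
    """Return the preferred statements, falling back to the normal ones.

    Deprecated statements are never returned.
    """
    preferred = [s for s in statements if s.rank is Rank.PREFERRED]
    if preferred:
        return preferred
    return [s for s in statements if s.rank is Rank.NORMAL]


claims = [
    Statement("L-example", "P-example", "old value", Rank.DEPRECATED),
    Statement("L-example", "P-example", "usual value"),
    Statement("L-example", "P-example", "best value", Rank.PREFERRED),
]
print([s.value for s in best_statements(claims)])  # ['best value']
```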
New type: day of Roman year
This week we introduce a new type: the day of Roman year
<https://www.wikifunctions.org/view/en/Z20342> allows us to identify a
particular day in a year, e.g. November 27, the day when this newsletter is
coming out. A day is represented by a natural number for the day of the
month and a Gregorian month.
We were also planning to release the Gregorian date type
<https://www.wikifunctions.org/view/en/Z20420>. But while implementing the
converters for the type and doing the first function
<https://www.wikifunctions.org/view/en/Z20440> returning the new type, we
noticed that the type felt rather difficult to work with, and community
feedback raised concerns. Because of that, we marked the type as
“do not use” again and are asking for more feedback and discussion on the type
proposal page
<https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Gregorian_c…>
.
A Gregorian calendar date is represented by a day of the year and a Gregorian
year. This eventually allows us to identify a day according to the
proleptic Gregorian calendar, e.g. 15 January 2001, the day Wikipedia was
founded, or 15 October 1582, the day the Gregorian calendar was introduced.
Note that the Gregorian date type is not yet the same as the point in time
data type in Wikidata <https://www.wikidata.org/wiki/Help:Data_type#time>,
but it is a necessary step on the path to it.
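The nesting of the two types can be pictured with a small Python sketch. This is only an approximation under stated assumptions: the class and field names are made up, the month is a plain integer rather than a dedicated month type, and the year is a plain AD year number.

```python
from dataclasses import dataclass

# Longest possible length of each month. February allows 29 because a
# day of Roman year carries no year that could decide about leap years.
MAX_DAYS = (31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


@dataclass(frozen=True)
class DayOfRomanYear:
    month: int  # 1 = January ... 12 = December (a plain int in this sketch)
    day: int    # day of the month, a natural number

    def __post_init__(self):
        if not 1 <= self.month <= 12:
            raise ValueError(f"invalid month: {self.month}")
        if not 1 <= self.day <= MAX_DAYS[self.month - 1]:
            raise ValueError(f"invalid day: {self.day}")


@dataclass(frozen=True)
class GregorianDate:
    day_of_year: DayOfRomanYear
    year: int  # AD year number in this sketch


newsletter_day = DayOfRomanYear(month=11, day=27)
wikipedia_founded = GregorianDate(DayOfRomanYear(month=1, day=15), year=2001)
print(wikipedia_founded.day_of_year.day, wikipedia_founded.year)  # 15 2001
```

Note that this sketch does not reject 29 February in a non-leap year; that check would need the year as well.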
Recent Changes in the software
Last week, we unveiled the new special page, Special:ListMissingLabels
<https://www.wikifunctions.org/wiki/Special:ListMissingLabels>, to find
Functions and other Objects that were missing a label in a language. Today,
we have completed the planned work in this area with
Special:ListFunctionsByTests
<https://www.wikifunctions.org/wiki/Special:ListFunctionsByTests>,
announced above. We hope this page will help the Wikifunctions community
more easily hunt down work that needs to be done (T377909
<https://phabricator.wikimedia.org/T377909> and T377910
<https://phabricator.wikimedia.org/T377910>). We have also changed
Special:ListObjectsByType
<https://www.wikifunctions.org/wiki/Special:ListObjectsByType> to use a
drop-down to select the target Type, to be like the other special pages (
T296315 <https://phabricator.wikimedia.org/T296315>), and to let you sort
the results not just alphabetically but also by newness, either ascending or
descending (T343633 <https://phabricator.wikimedia.org/T343633>).
We have dropped a large part of the validation code we built that runs
inside the MediaWiki side of the Wikifunctions ecosystem, as it was
complex, slow, and buggy, causing at least one partial site outage (T374241
<https://phabricator.wikimedia.org/T374241>). The validation of
saved and unsaved Objects will mostly still take place, but in fewer bits
of the code. This should make the site a little faster when you use it, but
more importantly, avoid the risk of crashes (at least, from this area).
We have also tweaked the PHP-side acceptance code to only allow strings as
Z2K1 values, where we were previously lax mostly for testing purposes (
T296724 <https://phabricator.wikimedia.org/T296724>). We don't think this
change should have any user-visible impacts. Finally on the validation side
for this week, we've corrected the PHP code to not try to inspect the
validity of items inside Z99/Quote objects, as they can be invalid, such as
when processing an error complaining that input was invalid (T380386
<https://phabricator.wikimedia.org/T380386>).
Finally, we have added support for the Z1952/bax-bamu
<https://www.wikifunctions.org/view/en/Z1952> (T379870
<https://phabricator.wikimedia.org/T379870>), Z1953/xon
<https://www.wikifunctions.org/view/en/Z1953> (T380246
<https://phabricator.wikimedia.org/T380246>), and Z1954/cdo-hant
<https://www.wikifunctions.org/view/en/Z1954> & Z1955/cdo-latn
<https://www.wikifunctions.org/view/en/Z1955> (T139010
<https://phabricator.wikimedia.org/T139010>, T379829
<https://phabricator.wikimedia.org/T379829>, and T380046
<https://phabricator.wikimedia.org/T380046>) languages to Wikifunctions, as
part of them being added to MediaWiki.
Next volunteers’ corner on December 9
Due to our team offsite next week, we have to move the next volunteers’
corner (and the last one of the year) one week later, to December 9 at
15:30 UTC <https://zonestamp.toolforge.org/1733758200> at the usual place
<https://meet.google.com/xuy-njxh-rkw>. The January volunteers’ corner will
also be moved by a week to January 13.
No update next week
Due to the same team offsite next week, we will also skip next week’s
update. See you again in two weeks!
Function of the week: is leap year
Since it’s Thanksgiving this week in North America, I wanted to give a
thank you to the awesome contributor community we have at Wikifunctions! At
the beginning of this year, I started the “Function of the week” rubric in
this newsletter, and I wanted to highlight some of the great work done by
the community and use it as a vehicle to explain some of the concepts that
Wikifunctions works on.
When the year started, I was genuinely worried whether we would have a
function to present every week. But you exceeded my expectations entirely
and proved my worries wonderfully wrong. Not only was there more than
enough material to present a function of the week, but you have created
more than enough functions to have a function of the day a few times over.
This is utterly amazing, and I want to say thank you, thank you all!
This week we’re coming to a function I have been waiting on for a while, and
now that we introduced the Gregorian year
<https://www.wikifunctions.org/view/en/Z20159> type last week, it could
finally be implemented: is leap year
<https://www.wikifunctions.org/view/en/Z20181> (Z20181).
Is leap year takes a single argument, a Gregorian year
<https://www.wikifunctions.org/view/en/Z20159>, and returns a simple Boolean
<https://www.wikifunctions.org/view/en/Z40>: it returns true if the given
year is a leap year, and false otherwise.
Leap years <https://en.wikipedia.org/wiki/Leap_year> were introduced many
years ago, when folks noticed that their calendar years and the seasons and
the skies were not aligning perfectly. In old Rome, a role was introduced,
the *pontifex maximus* <https://en.wikipedia.org/wiki/Pontifex_maximus>,
the chief bridge builder between our world and the world in the heavens,
and, among other things, their job was to keep the human calendar counting
aligned with the actual seasons and other heavenly events. Originally,
the *pontifex
maximus* simply decided, year by year, how long the year should be. Julius
Caesar <https://en.wikipedia.org/wiki/Julius_Caesar> became *pontifex
maximus* in 63 BC, but instead of deciding year by year, he reformed the
calendar and set up predictable rules: every year would have 365 days, but
every fourth year would be a leap year with 366 days. This rule
remained in use for some sixteen centuries.
Later the role of the *pontifex maximus* was picked up by the Catholic
pope. The calendar was again drifting out of sync with the seasons,
and so pope Gregory XIII <https://en.wikipedia.org/wiki/Pope_Gregory_XIII>,
as *pontifex maximus*, issued a bull
<https://en.wikipedia.org/wiki/Inter_gravissimas> introducing the Gregorian
calendar in 1582. The bull had two main effects: first, it dropped ten days
off the calendar, to bring the calendar back in alignment with the seasons,
and second, it modified the rules to keep the two from drifting apart
so quickly again. Every fourth year would still be a leap year, but
there was an exception: every hundredth year, the leap year would be
skipped. But there’s also an exception to that exception: every 400 years
we skip skipping the leap year. So, 1900 had and 2100 will have 365 days,
but 2000 had 366.
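Written out as code, the two sets of rules look roughly like this. This is a plain Python sketch of the rules as described above, not one of the implementations on Wikifunctions, and the function names are made up for the example.

```python
def is_julian_leap_year(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0


def is_gregorian_leap_year(year: int) -> bool:
    """Gregorian rule: every fourth year, except every hundredth year,
    except (again) every four-hundredth year."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_gregorian_year(year: int) -> int:
    return 366 if is_gregorian_leap_year(year) else 365


for year in (1900, 2000, 2100):
    print(year, days_in_gregorian_year(year))
# 1900 365
# 2000 366
# 2100 365
```

The two rules disagree only on years divisible by 100 but not by 400, such as 1900 and 2100.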
Whereas most people are aware of the four-year rule of the Julian calendar,
fewer people know the additional rules of the Gregorian calendar (given how
rarely they apply, that’s not exactly a surprise). So it is to be expected that
there are many wrong implementations of this function. When searching for
implementations of the leap year rule on GitHub, it is easy to find dozens
of implementations that apply the leap year rule partially or incorrectly.
One more example of why having a large library of functions is a good idea
in general!
The function has a solid set of tests:
- this year, 2024, is a <https://www.wikifunctions.org/view/en/Z20183> leap
year
- next year, 2025, is not <https://www.wikifunctions.org/view/en/Z20254>
- 2000 was a <https://www.wikifunctions.org/view/en/Z20184> leap year,
the last time the exception to the exception applied
- 1900 was not a <https://www.wikifunctions.org/view/en/Z20248> leap
year, the last time the hundredth-year exception applied
- 1582 was not <https://www.wikifunctions.org/view/en/Z20256> a leap
year either
- 1 BC was a <https://www.wikifunctions.org/view/en/Z20252> leap year
- 5 BC was a <https://www.wikifunctions.org/view/en/Z20249> leap year,
because it was four years before 1 BC
- 2025 BC was a <https://www.wikifunctions.org/view/en/Z20255> leap
year, too
- 1300 was a Julian leap year, but not one
<https://www.wikifunctions.org/view/en/Z20381> in the proleptic
Gregorian calendar
- 4000 AD will be <https://www.wikifunctions.org/view/en/Z20382> a
leap year in the Gregorian calendar, but would not be in Herschel's proposed
modification
Note that the people living in 2025 BC obviously knew neither that they were
living in 2025 BC nor that they were living in a leap year. That’s the
meaning of proleptic: it is anachronistically applied back in time.
The function currently has the following implementations:
- one in Python <https://www.wikifunctions.org/view/en/Z20182>,
representing the usual rules directly: if the year number is divisible by
4, and is either not divisible by 100 or is divisible by 400, then it is a
leap year.
- one in JavaScript <https://www.wikifunctions.org/view/en/Z20251>,
which, according to a detailed StackOverflow answer
<https://stackoverflow.com/questions/3220163/how-to-find-leap-year-programma…>,
is the fastest possible check (but probably not in our implementation,
given that we are using BigInt)
- a composition <https://www.wikifunctions.org/view/en/Z20275> which
converts the year number to the ISO 8601 year
<https://www.wikifunctions.org/view/en/Z20257> (thus turning 1 BC to 0,
2 BC to -1, etc.), and then uses a series of ifs
<https://www.wikifunctions.org/view/en/Z802>: if it is divisible by
<https://www.wikifunctions.org/view/en/Z20266> 400, then true, else if
it is divisible by <https://www.wikifunctions.org/view/en/Z20266> 100,
then false, else whether it is divisible by
<https://www.wikifunctions.org/view/en/Z20266> 4.
- and a quite charming composition
<https://www.wikifunctions.org/view/en/Z20304>, that checks that the day
of the week of the last day of the year
<https://www.wikifunctions.org/view/en/Z20302> is the same as
<https://www.wikifunctions.org/view/en/Z17414> the day of the week
following <https://www.wikifunctions.org/view/en/Z17420> the day of the
week of the first day of the year
<https://www.wikifunctions.org/view/en/Z20290>.
The code implementations benefit from negative years being represented
through an implicit ISO 8601 conversion, and so the usual rules can be
directly applied.
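The ISO 8601 conversion and the chain of ifs from the first composition can be sketched in Python as follows. How the Gregorian year type actually stores the era is not shown here; the (year number, is_bc) pair and both function names are assumptions for this sketch.

```python
def iso_year(year_number: int, is_bc: bool) -> int:
    """Convert to ISO 8601 numbering: 1 BC -> 0, 2 BC -> -1, and so on.

    AD years keep their number.
    """
    return 1 - year_number if is_bc else year_number


def is_leap_year_iso(year: int) -> bool:
    """Chain of ifs on an ISO 8601 year, mirroring the composition."""
    if year % 400 == 0:
        return True
    elif year % 100 == 0:
        return False
    else:
        return year % 4 == 0


# Some of the test cases listed above, as (year number, is BC?) pairs.
for number, is_bc in [(1, True), (5, True), (2025, True), (1582, False)]:
    year = iso_year(number, is_bc)
    print(number, "BC" if is_bc else "AD", "->", year, is_leap_year_iso(year))
# 1 BC -> 0 True
# 5 BC -> -4 True
# 2025 BC -> -2024 True
# 1582 AD -> 1582 False
```

In Python, the remainder of a negative number divided by a positive one is never negative, so the divisibility checks work unchanged for years before 1 BC.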
I don’t find it obvious at all that the given implementations would always
have the same result. But given the passing tests, I am quite confident
that they are indeed interchangeable.
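For the day-of-the-week approach, the equivalence can at least be checked mechanically over a range of years. The sketch below uses Python's datetime module, which follows the proleptic Gregorian calendar but only covers the years 1 to 9999. It is an independent cross-check with made-up function names, not a test of the Wikifunctions implementations themselves.

```python
from datetime import date


def is_leap_year_arithmetic(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_year_by_weekday(year: int) -> bool:
    """A year of 365 days ends on the weekday it started on; a year of
    366 days ends one weekday later."""
    first = date(year, 1, 1).weekday()
    last = date(year, 12, 31).weekday()
    return last == (first + 1) % 7


mismatches = [
    year
    for year in range(date.min.year, date.max.year + 1)
    if is_leap_year_arithmetic(year) != is_leap_year_by_weekday(year)
]
print(len(mismatches))  # 0
```

The check works because 365 is one more than a multiple of seven, so the last day of a common year is exactly 52 weeks after the first.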
The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-11-01
--
Rewriting the backend
<https://www.wikifunctions.org/wiki/File:Rust_programming_language_black_log…>
The Abstract Wikipedia team is working toward a rewrite of our backend
services in a different programming language, likely Rust. Node/JS has
served us well, but we have run up against some limits that would be best
dealt with by switching to a different ecosystem. The immediate work
concerns how we might better interact with WebAssembly (WASM). Almost
precisely one year ago, we announced
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-25> that
we would begin running Python and JS code in WASM for its sandboxing
characteristics. Since then, we have been interacting with WASM via the
WebAssembly System Interface (WASI), which allows WASM commands selective
access to the underlying operating system.
While WASI has made our code executors more secure, the use of this tool
has caused some bumps. The Node tooling around WASI is not particularly
rich. At the time of adoption, we found that our best option was to use
WASI command-line interfaces. We decided to run these interfaces in
subprocesses. The use of subprocesses has resulted in system stability
issues, mainly related to the impossibility of cleaning up subprocesses
under certain conditions.
The WASI ecosystem in Rust is much more advanced. Several WASI runtimes
provide tools which offer fine levels of control over the binding of WASM
commands to syscalls. With these tools, we can streamline our use of WASI.
We can run our code executors directly inside of the host Rust program,
eliminate subprocesses, and thereby avoid several sources of system
instability. As an added bonus, this level of control means that we can
also improve our sandboxing–a security win–and, in the far (or not so far?)
future, co-opt certain system calls to implement new features for our code
executors.
Upcoming Volunteers’ Corner on November 4
Next week, on Monday, 4 November 2024, at 18:30 UTC
<https://zonestamp.toolforge.org/1730745000>, we will have our monthly
Volunteers’ Corner. Unless you have many questions, we will follow our
usual agenda of giving updates on the upcoming plans and recent
activities, having plenty of time and space for your questions, and
building a Function together. Looking forward to seeing you on Monday!
Function of the Week: language of lexeme
Function Z19295 is <https://www.wikifunctions.org/view/en/Z19295> a simple
function: it takes a single argument, a lexeme
<https://www.wikifunctions.org/view/en/Z6005>, and returns the (natural)
language <https://www.wikifunctions.org/view/en/Z60> of the lexeme. Each
Lexeme in Wikidata belongs to exactly one language, e.g. if we take a look
at the lexeme L610505 <https://www.wikidata.org/wiki/L610505>,
*“mkpụrụokwu”*, it tells us that it is a word in the Igbo language
<https://en.wikipedia.org/wiki/Igbo_language>.
Accordingly, if we choose the Lexeme *“mkpụrụokwu”* on the function page
for language of lexeme <https://www.wikifunctions.org/view/en/Z19295>, it
returns the language object, displaying the language code ig (which
stands for Igbo).
The function has one tester <https://www.wikifunctions.org/view/en/Z19297>,
which takes the Lexeme for the English noun dog
<https://www.wikidata.org/wiki/Lexeme:L1122> and returns the object representing the
English language <https://www.wikifunctions.org/view/en/Z1002>.
There is one implementation <https://www.wikifunctions.org/view/en/Z19296>,
a composition: it uses the value by key function
<https://www.wikifunctions.org/view/en/Z803>, a foundational function to
work with objects in Wikifunctions. Every lexeme
<https://www.wikifunctions.org/view/en/Z6005> consists of seven keys, as
defined on the type page for lexemes
<https://www.wikifunctions.org/view/en/Z6005>. In order to get the value of
a specific key, we can use the value by key function
<https://www.wikifunctions.org/view/en/Z803> with two arguments: the key
<https://www.wikifunctions.org/view/en/Z39> we want to look up, and the
object itself on which we perform the key lookup.
It makes sense for most types to have a function for each key to decompose
objects of that type. Language of Lexeme is such a function: it simply
picks one of the keys of the Lexeme object and returns that.
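The pattern can be sketched in Python, with a dictionary standing in for a Wikifunctions object. The key names and the lexeme contents below are simplified placeholders rather than the real keys of the lexeme type, and only some of the seven keys are shown.

```python
def value_by_key(key, obj):
    """Return the value stored under `key` in `obj`.

    The argument order follows the description above: key first, then object.
    """
    return obj[key]


def language_of_lexeme(lexeme):
    """A per-key accessor: it picks a single key out of the lexeme."""
    return value_by_key("language", lexeme)


# A simplified stand-in for the lexeme "mkpụrụokwu".
lexeme = {
    "identity": "L610505",
    "lemmas": ["mkpụrụokwu"],
    "language": "ig",
    "forms": [],
    "senses": [],
}
print(language_of_lexeme(lexeme))  # ig
```

An accessor for any other key would look the same, with a different key passed to value_by_key.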