The on-wiki version of this newsletter can be found here: https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-11-27
WordGraph: (almost) a million forms for describing people
https://www.wikifunctions.org/wiki/File:Dicti_indent.jpg
A belated present for Wikidata’s 12th birthday: a team at Google Zurich released the WordGraph dataset, almost a million word-forms in a structured representation that is easy to upload to Wikidata. According to its self-description, *“[t]he WordGraph dataset contains multilingual lexicon entries linked to Wikipedia entities, focusing on human-denoting nouns and demonym adjectives. Each lexicon entry contains inflected word-forms and morphological information for all locales.”*
The dataset contains 968,153 forms in 39 languages. The dataset is available on GitHub https://github.com/google-research-datasets/WordGraph and published under CC0, making it compatible with Wikidata. We created an overview with some statistics about the dataset https://www.wikidata.org/wiki/Wikidata:WordGraph, compared with Wikidata. The senses are already mapped to Wikidata QIDs, and so are the grammatical features, which makes adding them to Wikidata particularly easy.
With the selection of human-denoting nouns and demonyms, this dataset is particularly useful for abstract descriptions for people in Wikidata – and people are, after all, the largest type of items that have Wikipedia articles. These lexemes will help us with creating such descriptions as “Irish rugby player”, “Ghanaian singer,” or “Indian mathematician” in many languages.
We want to thank Bruno Cartoni, Saran Lertpradit, Seungmin Back, Daniel Calvelo Aros, Kuang-Yu Samuel Chang and Abdelrahman Nabil at Google for this beautiful gift. We invite everyone to work on enriching Wikidata with this lexicographic data.
New special page: list of functions filtered by their tests
This week we are happy to introduce a new special page: list of functions filtered by their tests https://www.wikifunctions.org/wiki/Special:ListFunctionsByTests. The page lets you list all functions that have fewer than a certain number of tests (e.g., fewer than two tests https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=&max=1&status%5B%5D=connected&wpFormIdentifier=testfilters), find functions that have passing tests that are not connected yet https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=1&max=&status%5B%5D=pending&result%5B%5D=pass&wpFormIdentifier=testfilters, or, conversely, find functions with failing tests that are still connected https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=1&max=&status%5B%5D=connected&result%5B%5D=fail&wpFormIdentifier=testfilters. You can also look for functions that have no tests at all https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=&max=0&wpFormIdentifier=testfilters, that have no connected tests https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=&max=0&status%5B%5D=connected&wpFormIdentifier=testfilters, or that have more than a dozen tests https://wikifunctions.org/wiki/Special:ListFunctionsByTests?min=13&max=&wpFormIdentifier=testfilters .
This special page is expected to be particularly useful for functioneers looking for tests and implementations to connect.
On the page, you can enter:
- a range of numbers, given as a lower limit and an upper limit (both inclusive) to limit the number of tests that should match the test characteristics specified below;
- whether we want to count connected tests or tests not connected yet (or both, in which case you leave both checkboxes empty); and
- whether we want to count only tests that pass all connected implementations, or tests that fail for some of the connected implementations (or both, in which case you leave both checkboxes empty).
Your resulting page can be shared by its URL.
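Since the filters are carried in the URL, such links can also be put together by a script. The following is a minimal sketch that only uses the query parameters visible in the example links above (`min`, `max`, `status[]`, `result[]`, and `wpFormIdentifier`). The helper name `tests_filter_url` is made up for illustration, and the page may accept parameters or values not covered here.

```python
from urllib.parse import urlencode

# Base URL of the special page, as linked in this newsletter.
BASE_URL = "https://www.wikifunctions.org/wiki/Special:ListFunctionsByTests"


def tests_filter_url(min_tests=None, max_tests=None, status=(), result=()):
    """Build a shareable filter URL (hypothetical helper, not part of Wikifunctions).

    min_tests / max_tests: inclusive limits; None leaves the field empty.
    status: values seen in the example links are "connected" and "pending".
    result: values seen in the example links are "pass" and "fail".
    """
    params = [
        ("min", "" if min_tests is None else str(min_tests)),
        ("max", "" if max_tests is None else str(max_tests)),
    ]
    # Checkbox groups are sent as repeated "status[]" / "result[]" keys.
    params += [("status[]", value) for value in status]
    params += [("result[]", value) for value in result]
    params.append(("wpFormIdentifier", "testfilters"))
    return f"{BASE_URL}?{urlencode(params)}"


# Functions with fewer than two connected tests:
print(tests_filter_url(max_tests=1, status=["connected"]))
```

Leaving `status` and `result` empty corresponds to leaving the checkboxes empty on the page.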
We hope that this new page will be helpful for you to maintain Wikifunctions!
More statements!
The claims sections of Wikidata lexemes, lexeme forms, and lexeme senses received a major upgrade last week. Each claims section contains a list of Wikidata statements. Previously, only statements with String values were included. This has been expanded to include statements with all of the following types of values:
- String
- Lexeme reference
- Lexeme form reference
- Lexeme sense reference
- Item reference
- Monolingual text
All statements now also include a rank, in addition to their subject, predicate, and value. Additional details may be found in Wikifunctions: Support for Wikidata content https://www.wikifunctions.org/wiki/Wikifunctions:Support_for_Wikidata_content .
In order to do so, we added a new key to the Wikidata statement https://www.wikifunctions.org/view/en/Z6003 last week, representing the rank https://www.wikifunctions.org/view/en/Z6040. Big thanks to the community for organizing a marvelous and diligent clean-up job https://www.wikifunctions.org/wiki/Talk:Z6003!
New type: day of Roman year
This week we introduce a new type: the day of Roman year https://www.wikifunctions.org/view/en/Z20342 lets us specify a particular day in a year, e.g. November 27, the day when this newsletter is coming out. A day is represented by a natural number for the day of the month and a Gregorian month.
We were also planning to release the Gregorian date type https://www.wikifunctions.org/view/en/Z20420. But while implementing the converters for the type and doing the first function https://www.wikifunctions.org/view/en/Z20440 returning the new type, we noticed that the type felt rather difficult to work with, and community feedback came up raising concerns. Because of that we marked the type as “do not use” again and are asking for more feedback and discussion on the type proposal page https://www.wikifunctions.org/wiki/Wikifunctions:Type_proposals/Gregorian_calendar_date .
A Gregorian calendar date is represented by a day of the year and a Gregorian year. This eventually allows us to identify a day according to the proleptic Gregorian calendar, e.g. 15 January 2001, the day Wikipedia was founded, or 15 October 1582, the day the Gregorian calendar was introduced.
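To make the nesting of these types a bit more concrete, here is a rough sketch as plain Python data classes. This is only an illustration and not the actual structure of the types on Wikifunctions: the class and field names are invented, the month is simplified to a number, the era flag on the year is an assumption, and "day of the year" is assumed to mean the day of Roman year type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DayOfRomanYear:
    """A day within a year, e.g. November 27 (illustrative stand-in only)."""
    month: int  # 1 to 12; simplified stand-in for a Gregorian month object
    day: int    # day of the month, a natural number


@dataclass(frozen=True)
class GregorianYear:
    """A year number plus an era flag (the flag is an assumption of this sketch)."""
    number: int
    is_bc: bool = False


@dataclass(frozen=True)
class GregorianCalendarDate:
    """A date, composed of a day of the year and a Gregorian year."""
    day: DayOfRomanYear
    year: GregorianYear


# The two example dates mentioned in the text:
wikipedia_founded = GregorianCalendarDate(DayOfRomanYear(1, 15), GregorianYear(2001))
calendar_introduced = GregorianCalendarDate(DayOfRomanYear(10, 15), GregorianYear(1582))
```

Composing the date from two smaller objects means that functions written for a day of the year or for a year can be reused on the parts of a date.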
Note that the Gregorian date type is not yet the same as the point in time data type in Wikidata https://www.wikidata.org/wiki/Help:Data_type#time, but it is a necessary step on the path to it.
Recent Changes in the software
Last week, we unveiled the new special page, Special:ListMissingLabels https://www.wikifunctions.org/wiki/Special:ListMissingLabels, to find Functions and other Objects that were missing a label in a language. Today, we have completed the planned work in this area with Special:ListFunctionsByTests https://www.wikifunctions.org/wiki/Special:ListFunctionsByTests, announced above. We hope this page will help the Wikifunctions community more easily hunt down work that needs to be done (T377909 https://phabricator.wikimedia.org/T377909 and T377910 https://phabricator.wikimedia.org/T377910). We have also changed Special:ListObjectsByType https://www.wikifunctions.org/wiki/Special:ListObjectsByType to use a drop-down to select the target Type, to be like the other special pages (T296315 https://phabricator.wikimedia.org/T296315), and to let you sort the results not just alphabetically but also by newness, either ascending or descending (T343633 https://phabricator.wikimedia.org/T343633).
We have dropped a large part of the validation code we built that runs inside the MediaWiki side of the Wikifunctions ecosystem, as it was complex, slow, and buggy, causing at least one partial site outage (T374241 https://phabricator.wikimedia.org/T374241). The validation of saved and unsaved Objects will mostly still take place, but in fewer parts of the code. This should make the site a little faster when you use it and, more importantly, avoid the risk of crashes (at least, from this area).
We have also tweaked the PHP-side acceptance code to only allow strings as Z2K1 values, where we were previously lax mostly for testing purposes (T296724 https://phabricator.wikimedia.org/T296724). We don't think this change should have any user-visible impact. Also on the validation side this week, we've corrected the PHP code to not try to inspect the validity of items inside Z99/Quote objects, as they can be invalid, such as when processing an error complaining that input was invalid (T380386 https://phabricator.wikimedia.org/T380386).
Finally, we have added support for the Z1952/bax-bamu https://www.wikifunctions.org/view/en/Z1952 (T379870 https://phabricator.wikimedia.org/T379870), Z1953/xon https://www.wikifunctions.org/view/en/Z1953 (T380246 https://phabricator.wikimedia.org/T380246), and Z1954/cdo-hant https://www.wikifunctions.org/view/en/Z1954 & Z1955/cdo-latn https://www.wikifunctions.org/view/en/Z1955 (T139010 https://phabricator.wikimedia.org/T139010, T379829 https://phabricator.wikimedia.org/T379829, and T380046 https://phabricator.wikimedia.org/T380046) languages to Wikifunctions, as part of them being added to MediaWiki.
Next volunteers’ corner on December 9
Due to our team offsite next week, we have to move the next volunteers’ corner (and the last one of the year) one week later, to December 9 at 15:30 UTC https://zonestamp.toolforge.org/1733758200 at the usual place https://meet.google.com/xuy-njxh-rkw. The January volunteers’ corner will also be moved by a week to January 13.
No update next week
Due to the same team offsite next week, we will also skip next week’s update. See you again in two weeks!
Function of the week: is leap year
Since it’s Thanksgiving this week in North America, I wanted to give a thank you to the awesome contributor community we have at Wikifunctions! At the beginning of this year, I started the “Function of the week” rubric in this newsletter, because I wanted to highlight some of the great work done by the community and use it as a vehicle to explain some of the concepts that Wikifunctions works on.
When the year started, I was genuinely worried whether we would have a function to present every week. But you exceeded my expectations entirely and proved my worries wonderfully wrong. Not only was there more than enough material to present a function of the week, but you have created more than enough functions to have a function of a day a few times over. This is utterly amazing, and I want to say thank you, thank you all!
This week we’re coming to a function I have been waiting on for a while, and now that we introduced the Gregorian year https://www.wikifunctions.org/view/en/Z20159 type last week, it could finally be implemented: is leap year https://www.wikifunctions.org/view/en/Z20181 (Z20181).
Is leap year takes a single argument, a Gregorian year https://www.wikifunctions.org/view/en/Z20159, and returns a simple Boolean https://www.wikifunctions.org/view/en/Z40: it returns true if the given year is a leap year, and false otherwise.
Leap years https://en.wikipedia.org/wiki/Leap_year were introduced many years ago, when folks noticed that their calendar years and the seasons and the skies were not aligning perfectly. In ancient Rome, a role was introduced, the *pontifex maximus* https://en.wikipedia.org/wiki/Pontifex_maximus, the chief bridge builder between our world and the world in the heavens, and, among other things, their job was to keep the human calendar aligned with the actual seasons and other heavenly events. Originally, the *pontifex maximus* simply decided, year by year, how long the year should be. Julius Caesar https://en.wikipedia.org/wiki/Julius_Caesar became *pontifex maximus* in 63 BC, but instead of deciding year by year, he reformed the calendar and set up predictable rules: every year would have 365 days, except that every fourth year would be a leap year with 366 days. This rule kept going for many centuries.
Later the role of the *pontifex maximus* was picked up by the Catholic pope. The calendar was again drifting out of sync with reality, and so Pope Gregory XIII https://en.wikipedia.org/wiki/Pope_Gregory_XIII, as *pontifex maximus*, issued a bull https://en.wikipedia.org/wiki/Inter_gravissimas introducing the Gregorian calendar in 1582. The bull had two main effects: first, it dropped ten days from the calendar, to bring the calendar back into alignment with the seasons, and second, it modified the rules to keep the two from drifting apart again so quickly. Every fourth year would still be a leap year, but there was an exception: every hundredth year, the leap year would be skipped. But there’s also an exception to that exception: every 400 years we skip skipping the leap year. So, 1900 had and 2100 will have 365 days, but 2000 had 366.
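Written out as code, the two rule sets look roughly like this. This is a small sketch for years AD only; the function names are chosen for illustration and this is not the code of the implementations on Wikifunctions.

```python
def is_julian_leap_year(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0


def is_gregorian_leap_year(year: int) -> bool:
    """Gregorian rule: the four-year rule with its two exceptions."""
    if year % 400 == 0:
        return True   # exception to the exception: 1600, 2000, 2400, ...
    if year % 100 == 0:
        return False  # exception: 1700, 1800, 1900, 2100, ...
    return year % 4 == 0


for year in (1900, 2000, 2024, 2025, 2100):
    print(year, is_julian_leap_year(year), is_gregorian_leap_year(year))
```

The two functions only disagree on century years that are not divisible by 400, such as 1900 and 2100.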
Whereas most people are aware of the four-year rule of the Julian calendar, fewer people know the additional rules of the Gregorian calendar (given how rarely the exceptions occur, that’s not exactly a surprise). And so it is unsurprising that there are many wrong implementations of this function. When searching for implementations of the leap year rule on GitHub, it is easy to find dozens of implementations that apply the leap year rule partially or incorrectly. One more example of why having a large library of functions is a good idea in general!
The function has a solid set of tests:
- this year, 2024, is a https://www.wikifunctions.org/view/en/Z20183 leap year
- next year, 2025, is not https://www.wikifunctions.org/view/en/Z20254
- 2000 was a https://www.wikifunctions.org/view/en/Z20184 leap year, the last occurrence of the “skipping the skipping of the leap year” rule
- 1900 was not a https://www.wikifunctions.org/view/en/Z20248 leap year, the last occurrence of the “skipping the leap year” rule
- 1582 was not https://www.wikifunctions.org/view/en/Z20256 a leap year either
- 1 BC was a https://www.wikifunctions.org/view/en/Z20252 leap year
- 5 BC was a https://www.wikifunctions.org/view/en/Z20249 leap year, because it was four years before 1 BC
- 2025 BC was a https://www.wikifunctions.org/view/en/Z20255 leap year, too
- 1300 was a Julian leap year, but not one https://www.wikifunctions.org/view/en/Z20381 in the proleptic Gregorian calendar
- 4000 AD will be https://www.wikifunctions.org/view/en/Z20382 a leap year in the Gregorian calendar, but would not be in Herschel's proposed modification
Note that the people living in 2025 BC obviously neither knew that they were living in 2025 BC nor that they were living in a leap year. That’s the meaning of proleptic: the calendar is anachronistically applied back in time.
The function currently has the following implementations:
- one in Python https://www.wikifunctions.org/view/en/Z20182, representing the usual rules directly: if the year number is divisible by 4, and either not divisible by 100 or divisible by 400, then it is a leap year.
- one in JavaScript https://www.wikifunctions.org/view/en/Z20251, which, according to a detailed StackOverflow answer https://stackoverflow.com/questions/3220163/how-to-find-leap-year-programmatically-in-c/11595914#11595914, is the fastest possible check (but probably not in our implementation, given that we are using BigInt)
- a composition https://www.wikifunctions.org/view/en/Z20275 which converts the year number to the ISO 8601 year https://www.wikifunctions.org/view/en/Z20257 (thus turning 1 BC to 0, 2 BC to -1, etc.), and then uses a series of ifs https://www.wikifunctions.org/view/en/Z802: if it is divisible by https://www.wikifunctions.org/view/en/Z20266 400, then true, else if it is divisible by https://www.wikifunctions.org/view/en/Z20266 100, then false, else whether it is divisible by https://www.wikifunctions.org/view/en/Z20266 4.
- and a quite charming composition https://www.wikifunctions.org/view/en/Z20304, that checks that the day of the week of the last day of the year https://www.wikifunctions.org/view/en/Z20302 is the same as https://www.wikifunctions.org/view/en/Z17414 the day of the week following https://www.wikifunctions.org/view/en/Z17420 the day of the week of the first day of the year https://www.wikifunctions.org/view/en/Z20290.
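The idea behind the last composition can be tried out with Python's standard library. The sketch below is not the Wikifunctions composition itself: it leans on `datetime.date`, which uses the proleptic Gregorian calendar but only covers the years 1 to 9999. A common year has 365 = 52 × 7 + 1 days, so its last day falls on the same weekday as its first day; in a leap year the last day falls one weekday later.

```python
import calendar
from datetime import date


def is_leap_year_by_weekday(year: int) -> bool:
    """Leap year check via weekdays; only valid for years 1 to 9999."""
    first = date(year, 1, 1).weekday()   # Monday is 0, Sunday is 6
    last = date(year, 12, 31).weekday()
    # Leap year: the last day's weekday is the one following the first day's.
    return last == (first + 1) % 7


# Compare against the standard library's rule-based check for a few years.
for year in (1582, 1900, 2000, 2024, 2025):
    print(year, is_leap_year_by_weekday(year), calendar.isleap(year))
```

Over the range that `datetime.date` supports, this check can be compared year by year against `calendar.isleap`.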
The code implementations benefit from negative years being represented through an implicit ISO 8601 conversion, and so the usual rules can be directly applied.
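A short sketch of why the conversion helps: once BC years are mapped to ISO 8601 year numbers, the same divisibility tests work for all years. The function names and the `is_bc` flag below are assumptions made for this illustration, not the actual representation used on Wikifunctions.

```python
def to_iso_year(number: int, is_bc: bool = False) -> int:
    """Map a year number with era to an ISO 8601 year: 1 BC -> 0, 2 BC -> -1, ..."""
    if number < 1:
        raise ValueError("year numbers start at 1 in both eras")
    return 1 - number if is_bc else number


def is_leap_year(number: int, is_bc: bool = False) -> bool:
    """Apply the usual Gregorian rules to the ISO 8601 year number."""
    year = to_iso_year(number, is_bc)
    # Python's % returns a non-negative result for a positive divisor,
    # so the divisibility tests also work for zero and negative years.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


print(to_iso_year(1, is_bc=True))      # 0
print(to_iso_year(2025, is_bc=True))   # -2024
print(is_leap_year(1, is_bc=True))     # True
print(is_leap_year(5, is_bc=True))     # True
print(is_leap_year(2025, is_bc=True))  # True
print(is_leap_year(1900))              # False
```

Without the conversion, 1 BC, 5 BC, and 2025 BC would not look divisible by 4, even though they are leap years in the proleptic Gregorian calendar.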
I don’t find it obvious at all that the given implementations would always have the same result. But given the passing tests, I am quite confident that they are indeed interchangeable.
abstract-wikipedia@lists.wikimedia.org