The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-11-09
--
Checking lexical forms
Previously we have discussed morphological paradigms
<https://meta.wikimedia.org/wiki/Special:MyLanguage/Abstract_Wikipedia/Updates/2021-09-10>,
and how lexemes and paradigms
<https://meta.wikimedia.org/wiki/Special:MyLanguage/Abstract_Wikipedia/Updates/2021-09-17>
could be used. To summarize and simplify, paradigms are patterns of inflection
<https://en.wikipedia.org/wiki/Inflection> of a word (or lexeme), and
functions can implement paradigms and specific inflections. To give an
example, the usual way to get the plural of a noun in English is to add the
letter s to its basic form, the so-called 'lemma'.
On Notwikilambda, the community-run preview version of Wikifunctions, we
started implementing a few such functions. Correspondingly, we recreated
some of them in the Wikifunctions Beta: *e.g.* add s to end
<https://wikifunctions.beta.wmflabs.org/wiki/Z10210> and replace y at end
with ies <https://wikifunctions.beta.wmflabs.org/wiki/Z10238>.
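To make the idea concrete, here is a minimal Python sketch of what two such functions might compute. It is an illustration only, not the code behind the Wikifunctions Beta entries: the Python names are invented for this example, and returning the lemma unchanged when it does not end in y is an assumption.

```python
def add_s_to_end(lemma: str) -> str:
    """Regular English plural: append "s" to the lemma."""
    return lemma + "s"


def replace_y_at_end_with_ies(lemma: str) -> str:
    """Plural for nouns such as "city": swap a final "y" for "ies"."""
    if not lemma.endswith("y"):
        # Assumption: a lemma outside this paradigm is returned unchanged.
        return lemma
    return lemma[:-1] + "ies"


print(add_s_to_end("book"))               # books
print(replace_y_at_end_with_ies("city"))  # cities
```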
In order to demonstrate their use, we developed a small, browser-based
tool, form check <https://vrandezo.github.io/formcheck/>. Form check allows
you to select a language and a part of speech (*e.g.* English nouns), and
then state which forms you want to generate (*e.g.* the plural). Then you
choose the function from the Wikifunctions Beta, and the tool checks
whether the form as recorded in Wikidata corresponds to the output of the
function.
If it doesn’t, this may indicate an error, either in the function or in the
data, or an irregular form.
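The comparison itself can be sketched in a few lines of Python. This is not the tool's actual code (form check is written in JavaScript); the sample entries are made up for illustration and the helper names are hypothetical.

```python
from typing import Callable, List, Tuple


def add_s_to_end(lemma: str) -> str:
    """Candidate paradigm function: regular English plural."""
    return lemma + "s"


def find_mismatches(
    entries: List[Tuple[str, str]],
    generate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (lemma, recorded form, generated form) wherever the two differ."""
    mismatches = []
    for lemma, recorded in entries:
        generated = generate(lemma)
        if generated != recorded:
            mismatches.append((lemma, recorded, generated))
    return mismatches


# Hypothetical sample of (lemma, recorded plural) pairs, not real query results.
SAMPLE = [("book", "books"), ("city", "cities"), ("child", "children")]

for lemma, recorded, generated in find_mismatches(SAMPLE, add_s_to_end):
    print(f"{lemma}: recorded {recorded!r}, generated {generated!r}")
```

Each reported mismatch still needs a human decision: here "city" belongs to a different paradigm and "child" is irregular, whereas a typo in the recorded data would look the same in the output.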
Form check has at least one major shortcoming, which is that it currently
does not allow you to filter for further statements on the lexeme. In many
languages this is crucial: for example, in German, nouns are inflected
differently depending on their grammatical gender. It also doesn’t
automatically update the list of available functions (but you can enter an
arbitrary ZID). The code is open source
<https://github.com/vrandezo/formcheck>, and contributions (or, indeed,
someone wanting to take over the code) would be more than welcome.
It is said that it is better to show than to tell. In this spirit, we created
a 13-minute video. It demonstrates how the form check tool is used, how it
helped to find an error in a lexeme on Wikidata, and how it was used
to discover a paradigm and implement the respective function.
The video can be watched here:
https://meta.wikimedia.org/wiki/File:Checking_forms.webm
We invite you to implement more morphological functions in Wikifunctions
Beta, and try them out with the form check tool. Please report errors that
you find along the way, so we can fix them. Please also share your results, and
how well you can cover all the different linguistic variations in your
language with your functions!
------------------------------
There are a number of interesting aspects to this demonstration.
Firstly, it shows the possible use of Wikifunctions as currently
implemented for natural language-related functions. It ties in directly
with the data on Wikidata, and offers both a way to find errors in the
data and a way to explore it, which might help with finding patterns and
so with creating more such functions. Although I don’t speak
Ukrainian, I was able to create a function that captured the morphology of
a specific Ukrainian form. These functions can then, in turn, help us
discover more inconsistencies, or even to enter data faster and in a way
less prone to errors. For example, I would really love it if there were a
way to attach functions to the fields in the Wikidata Lexeme Forms
<https://www.wikidata.org/wiki/Wikidata:Wikidata_Lexeme_Forms>, so that I
would only enter the lemma, and it would automatically fill in the other
fields based on the functions' results, and then, if needed, I could
manually edit the results to be correct before publishing.
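As a rough illustration of that wish, the following Python sketch prefills form fields from a lemma and then lets the editor override a value. Nothing like this exists in Wikidata Lexeme Forms as far as this newsletter describes; the field names, the mapping, and the helper are all hypothetical.

```python
from typing import Callable, Dict


def prefill_forms(
    lemma: str,
    field_functions: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Compute a suggested value for every field from the lemma alone."""
    return {field: fn(lemma) for field, fn in field_functions.items()}


# Hypothetical mapping from form fields to paradigm functions for English nouns.
ENGLISH_NOUN_FIELDS = {
    "singular": lambda lemma: lemma,
    "plural": lambda lemma: lemma + "s",
}

suggested = prefill_forms("child", ENGLISH_NOUN_FIELDS)
print(suggested)  # {'singular': 'child', 'plural': 'childs'}

# The editor corrects the irregular form by hand before publishing.
suggested["plural"] = "children"
print(suggested)  # {'singular': 'child', 'plural': 'children'}
```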
Secondly, it shows how relatively easy it is to write functions, testers,
and implementations. In this case, it took us less than four minutes to
define the functions, write a tester, and provide an implementation. Our UX
is currently being improved to make many of these steps easier and more
intuitive. Not all functions will be as easy to implement. But in this
case, no coding was required at all, since we had a relevant function that
we could use for composition, replace at end
<https://wikifunctions.beta.wmflabs.org/wiki/Z10220>. Our hope is that a
solid library of such versatile functions can take us a long way towards
pretty good coverage of morphological functions. But even if the
implementation should turn out to be more complex, defining the function
and providing test cases is something we expect might be possible for many
potential contributors.
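The composition idea can be sketched in Python as follows. This is an analogy, not the Wikifunctions implementation: the general helper mirrors what a function like replace at end presumably does, the more specific function is defined purely by calling it, and the testers are simple input and expected-output pairs. All names and the unchanged-on-no-match behavior are assumptions.

```python
from typing import Callable, List, Tuple


def replace_at_end(text: str, old: str, new: str) -> str:
    """Replace the suffix `old` with `new`; leave `text` unchanged if it does not match."""
    if text.endswith(old):
        return text[: len(text) - len(old)] + new
    return text


def replace_y_at_end_with_ies(lemma: str) -> str:
    """Defined by composition only: no new string logic is written here."""
    return replace_at_end(lemma, "y", "ies")


def run_testers(
    function: Callable[[str], str],
    testers: List[Tuple[str, str]],
) -> List[bool]:
    """Evaluate each (input, expected output) tester against the function."""
    return [function(given) == expected for given, expected in testers]


TESTERS = [("city", "cities"), ("party", "parties")]
print(run_testers(replace_y_at_end_with_ies, TESTERS))  # [True, True]
```

Writing the testers first and the implementation later is possible with this shape, which matches the hope that defining a function and its test cases is open to contributors who do not code.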
And thirdly, it shows, probably for the first time, an external tool
calling a function from Wikifunctions (albeit Beta). It is just a website,
standing in front of Wikifunctions, asking it to evaluate a function. Form
check calls the SPARQL endpoint of Wikidata, and then uses the data from
there to ask Wikifunctions to evaluate a function. The whole thing is a
static website, needs no libraries at all, merely plain old JavaScript, and
could be hosted anywhere (in fact, you can also download the HTML and load
the page locally; it should work just as well).
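The two steps can be sketched in Python, with the caveat that this is not form check's code (which is JavaScript) and that it only builds the requests without sending them. The SPARQL query is one plausible way to fetch lemmas and forms with a given grammatical feature; the function-call object follows my understanding of how Wikifunctions represents a call (type Z7, with the function in Z7K1), and the single-argument key is an assumption about the function's signature. How the call is transported to Wikifunctions is deliberately left out.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_form_query(language: str, category: str, feature: str, limit: int = 50) -> str:
    """SPARQL for lemmas and the representations of forms carrying one feature."""
    return f"""
SELECT ?lemma ?representation WHERE {{
  ?lexeme dct:language wd:{language} ;
          wikibase:lexicalCategory wd:{category} ;
          wikibase:lemma ?lemma ;
          ontolex:lexicalForm ?form .
  ?form ontolex:representation ?representation ;
        wikibase:grammaticalFeature wd:{feature} .
}}
LIMIT {limit}
""".strip()


def build_query_url(query: str) -> str:
    """URL that would return the query results as JSON."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


def build_function_call(zid: str, argument: str) -> dict:
    """Function-call object; assumes the function takes exactly one string."""
    return {"Z1K1": "Z7", "Z7K1": zid, f"{zid}K1": argument}


# English (Q1860) nouns (Q1084) with a plural (Q146786) form.
query = build_form_query("Q1860", "Q1084", "Q146786")
print(build_query_url(query))
print(build_function_call("Z10210", "book"))
```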
Note that I am rather unsure whether the Form check tool is a good and
useful tool. Do we really need each individual user to check thousands of
forms? We would probably want a shared resource for doing this
evaluation instead. The tool is meant as an early inspiration that will
hopefully lead to other tools, libraries and workflows, which are more
robust, more reusable, and closely aligned with how the community works.
Volunteer’s corner
Thanks to everyone who joined the volunteer’s corner on Monday. It was
lively. The next one will be on Monday, December 5
at 18:30 UTC <https://zonestamp.toolforge.org/1670265038>.
WikiConference North America 2022
This weekend Wikifunctions will be presented at the WikiConference North
America <https://meta.wikimedia.org/wiki/WikiConference_North_America/2022>,
jointly held with OpenStreetMap US. The presentation
<https://wikiconference.org/wiki/Submissions:2022/Wikifunctions_-_a_new_Wikimedia_project>
will be on Saturday, November 12 at 20:15 UTC
<https://zonestamp.toolforge.org/1668284148>, and we will focus on
Wikifunctions and possible use cases in the world of maps.
Staff editing policy
We are, for now, closing the hot phase of the staff editing policy
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Staff_editing>. The
policy belongs to the community, and can always be evolved and adapted by
you. We will, on launch, copy it over to Wikifunctions, and will follow
this policy.
Development updates
Experience & Performance:
- Fixed more front-end (FE) bugs
- Merged patches related to error management
- Made great progress on drafting the Default Component technical specs
Meta-data:
- Completed readable summaries of all error types (T312611
<https://phabricator.wikimedia.org/T312611>) and ability to record which
implementation gets selected (T320457
<https://phabricator.wikimedia.org/T320457>)
Natural Language Generation:
- Finalized template language document
- More analysis on dependencies for isiZulu