The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-10-22
--
Common wisdom has it that skills with numbers and programming go
hand-in-hand. If someone is not good at mathematics, then they’ll be no
good at natural sciences, technology, or engineering. These skills go so
tightly together that people came up with a short acronym for their
conjunction: STEM
<https://en.wikipedia.org/wiki/Science,_technology,_engineering,_and_mathema…>.
Given the frequent use of formulas in science, technology, and engineering,
this seems to make sense: if you have a good instinct for numbers, units,
and relations between quantities, then you will more easily intuit
equations and scientific laws. Galileo said
<https://en.wikiquote.org/wiki/Galileo_Galilei#Il_Saggiatore_(1623)> that
all science is written in the language of mathematics, after all.
So, how would that not be true for programming a computer? They are called
computers, after all, because they compute numbers so well. The foundations
of computers are the two numbers 1 and 0 and the very fast and repeated
processing of operations on long strings of these two numbers.
Last year, a paper in Nature
<https://www.nature.com/articles/s41598-020-60661-8> actually tested this
widespread assumption. And, rather surprisingly, it discovered that there
is no correlation between STEM skills and the ability to learn to program.
Instead, it found a strong correlation between learning to program and
natural language aptitude.
I was very worried about the effort that we would need to undertake in
order to identify and recruit the right people for Wikifunctions: people
who can build a library of natural language generation functions for
hundreds of languages. Where would we find people skilled in both
under-represented languages and programming? Would there be enough of them?
Would they have the time to contribute to Wikifunctions or would they be
busy due to their rare combination of skills?
But as we can infer from the result in the Nature paper, this should turn
out to be easier than I initially feared. All we need to look for is
natural language aptitude, and through that we will cover all necessary
skills.
It shouldn’t have come as a surprise. Lady Ada Lovelace
<https://en.wikipedia.org/wiki/Ada_Lovelace>, widely known as the world’s
first programmer, proclaimed that we would use programming to work with
art, and that numbers were not the only domain that computers could work
with. She likened programming to poetry. As a counter-point, Donald Knuth
<https://en.wikipedia.org/wiki/Donald_Knuth>, author of The Art of Computer
Programming <https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming>,
estimated that only about 2% of the population are what he calls “geeks”,
with the mindset necessary for programming. He based this on his own
observations and his life-long attempts at educating and reaching out about
computer science.
But in many of Knuth’s writings, just as in many other introductions to
programming, you will start with examples in mathematics. The first example
in The Art of Computer Programming is Euclid’s algorithm
<https://en.wikipedia.org/wiki/Euclidean_algorithm> to determine the
greatest common divisor, and even before you get to the first section
heading, entitled “Mathematical preliminaries”, he has already talked about
prime numbers and averages, asked you to give a mathematical proof, and had
you formulate a set-theoretic definition. Many other books introducing
programming are no different, often assuming fluency in at least high
school mathematics and sometimes beyond.
Is it possible that by relying so much on a strong mathematical foundation
the field of computer science has systematically, if unintentionally,
excluded a large number of people who would otherwise be active
contributors to the world of programming? Can we imagine a more inclusive
approach to programming?
This is the community we should be aiming to grow and foster for
Wikifunctions: one where we do not exclude people because of their lack of
certain skills, such as mathematics. We want to give everyone the ability
to effectively use functions, to create functions, to share and talk about
functions. We should allow people with different skill sets to
collaborate and achieve more than any one of us could alone. That is, and
always has been, the special advantage of the Wikimedia projects. Let us
make a concerted effort to be open and welcoming.
And I think we can do so. To give one example: when Jeff Howard performed
user research
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-24> for
Wikifunctions, he identified that many people didn’t really get what we
were aiming for with Wikifunctions. He cited existing Wikimedia
contributors such as Vigneron
<https://meta.wikimedia.org/wiki/User:VIGNERON> who said that, while they
were excited about using Wikifunctions, they didn’t think they would
necessarily contribute to it. They didn’t think of themselves as
“programmers”.
Earlier this year, we were talking about morphological paradigms
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-10> to
create plurals in English. After we published that newsletter, one user saw
it and created a function <https://notwikilambda.toolforge.org/wiki/Z10148>,
tests <https://notwikilambda.toolforge.org/wiki/Z10150>, and an
implementation <https://notwikilambda.toolforge.org/wiki/Z10149> to do the
same thing in French. It was Vigneron!
It will be challenging. It will require new and inclusive ways of product
development to thoughtfully and intentionally ensure Wikifunctions is a
welcoming and inclusive community. But let us all commit to it. Let us be
mindful in the examples we choose, in the tutorials we write, in the
language we use.
I have not been mindful of this concern in many of my talks. My examples
were often drawn from mathematics, and the very first implementation I
presented was a recursive application of addition, using it to calculate a
product. I will aim to do better, and I plan to draw my examples from other
domains, in particular from natural language generation. And whereas I
fully expect us to quickly build up a library of functions in different
areas of STEM, which is of course important, let us be especially mindful
to not emphasize these to the exclusion of other areas, skill sets, and
interests.
The insight from the Nature paper is a gift to our project. Let us be
careful not to squander it.
(The weekly newsletter is always a collaborative effort by the whole team.
This week’s newsletter in particular benefitted from discussions,
contributions, editing, questions, and comments by James Forrester, Cory
Massaro, Aishwarya Vardhana, Adam Baso, and Nick Wilson. -- Denny)
--
The recording about Wikifunctions and Abstract Wikipedia with a Russian
translation <https://www.youtube.com/watch?v=x9NnGIXlvnI&t=20727s> at the
Russian Wiki-Conference
<https://ru.wikimedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8-%D0%BA%D0%BE%D0%BD%D…>
in Moscow, Russia, organized by Wikimedia RU
<https://ru.wikimedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0…>,
is now available on YouTube. Thanks to Gulnara for the translation!
The video recording from the Data Con LA 2021 Panel on Structured Data
<https://www.youtube.com/watch?v=W3KqygL7yqQ> with Wikifunctions’ Denny
Vrandečić, Heather Hedden, and Karen Lopez, hosted by Joe Devon, is now
online.
The Arabic presentation slides about Abstract Wikipedia and Wikifunctions at
WikiArabia
<https://commons.wikimedia.org/wiki/File:WikiArabia_2021_-_Wikifunctions_and…>
by Houcemeddine Turki <https://meta.wikimedia.org/wiki/User:Csisc> are now
online on Meta. The video recording is expected to be online later.
Houcemeddine will also present an English version of that talk at
WikidataCon <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021> next
week.
Speaking of WikidataCon
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021>! Next weekend we
celebrate the ninth anniversary of Wikidata! From the 29th to the 31st of
October we have three days full of program, community, and data. This
year’s WikidataCon is accessible online and will be co-hosted by Wikimedia
Deutschland <https://www.wikimedia.de/> and Wiki Movimento Brasil
<https://meta.wikimedia.org/wiki/Wiki_Movement_Brazil_User_Group>. You can
register for WikidataCon 2021 <https://pretix.eu/WDCon21/WDCon21/> for free!
At WikidataCon, on Friday
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Program/Day_1_-_Mai…>,
Tochi Precious of the Igbo community is joined by Denny Vrandečić in “Igbo
and Abstract Wikipedia - a conversation” hosted by Silvia Gutiérrez.
Also, we are looking forward to the fiftieth newsletter next week. Expect
something long in the making.
It is not my book, but I think it will interest people around here. I think it is an easy read; it is meant to be read by professionals and hobbyists alike. There is no code.
A human-inspired, linguistically sophisticated model of language understanding for intelligent agent systems.
The open access edition of this book was made possible by generous funding from Arcadia – a charitable fund of Lisbet Rausing and Peter Baldwin.
One of the original goals of artificial intelligence research was to endow intelligent agents with human-level natural language capabilities. Recent AI research, however, has focused on applying statistical and machine learning approaches to big data rather than attempting to model what people do and how they do it. In this book, Marjorie McShane and Sergei Nirenburg return to the original goal of recreating human-level intelligence in a machine. They present a human-inspired, linguistically sophisticated model of language understanding for intelligent agent systems that emphasizes meaning—the deep, context-sensitive meaning that a person derives from spoken or written language.
With Linguistics for the Age of AI, McShane and Nirenburg offer a roadmap for creating language-endowed intelligent agents (LEIAs) that can understand, explain, and learn. They describe the language-understanding capabilities of LEIAs from the perspectives of cognitive modeling and system building, emphasizing “actionability”, which involves achieving interpretations that are sufficiently deep, precise, and confident to support reasoning about action. After detailing their microtheories for topics such as semantic analysis, basic coreference, and situational reasoning, McShane and Nirenburg turn to agent applications developed using those microtheories and evaluations of a LEIA's language understanding capabilities.
McShane and Nirenburg argue that the only way to achieve human-level language understanding by machines is to place linguistics front and center, using statistics and big data as contributing resources. They lay out a long-term research program that addresses linguistics and real-world reasoning together, within a comprehensive cognitive architecture.
https://direct.mit.edu/books/book/5042/Linguistics-for-the-Age-of-AI
Hi all,
Sorry, we are a bit swamped with work. The originally planned update did
not work out and had to be postponed, and we didn't have the time to write
another one instead. So we decided to skip it for this week. See you again
next week!
Cheers,
Denny
The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-10-08
------------------------------
This week we are happy to welcome Cai Blanton
<https://meta.wikimedia.org/wiki/User:CBlanton_(WMF)> to the Wikimedia
Foundation and to the Abstract Wikipedia team! I will let Cai introduce
herself with her own words.
“I am thrilled to be joining WMF as the Senior Engineering Manager for
Abstract Wikipedia. From my beginnings as a full-stack UX-focused software
engineer, I have focused my career on building products that make people’s
lives better, spanning from education to employment technology. At the
heart of it all lies my passion for DIBE (Diversity, Inclusion, Belonging,
and Equity) and drive to create an environment where collaboration is
personal and fun.
“Languages and the nuances of cross-cultural communication have fascinated
me since grade school when I took my first Spanish class. This interest has
only grown through my further language studies, stint as a linguistics
major, and time living and working abroad in a multinational environment in
Western Europe and Scandinavia. The Abstract Wikipedia vision is
particularly compelling to me for these reasons.
“I look forward to working together with the community to advance global
knowledge equity!”
We are also happy to welcome Adesoji Temitope to our team. Adesoji joins
Simone
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-03-10> (who
joined us together with Lindsay
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-30>)
from ThisDot. Adesoji <https://twitter.com/temitopedavid_> is on Twitter.
Here is his introduction in his own words.
“I am Adesoji Temitope, a software developer at ThisDot. I am currently
based in Lagos, Nigeria.
“I love to play football and first-person shooter games.
“Started learning to code in my last year in high school and worked on a
lot of personal projects using PHP. I luckily got into a school that had a
lab set up by one of the lecturers and I was able to really start learning
to work with people and had my first live code. I owe a lot of my early
assistance to my brother.
“Really love researching about communities and I am excited about the
opportunity to join the Wikimedia team.”
Please join us in welcoming Cai and Adesoji to the team!
------------------------------
On October 15, Houcemeddine Turki
<https://meta.wikimedia.org/wiki/User:Csisc> will present Wikifunctions
<https://www.wikiarabia2021.com/en_GB/event/propose-2/agenda> at next
week’s WikiArabia conference <https://www.wikiarabia2021.com/en_GB/> organized
by the Wikimedia Algeria User Group
<https://meta.wikimedia.org/wiki/Wikimedia_Algeria>. The presentation will
be in Arabic.
We also presented Wikifunctions and Abstract Wikipedia at the Russian
Wiki-Conference
<https://ru.wikimedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8-%D0%BA%D0%BE%D0%BD%D…>
in Moscow, Russia, organized by Wikimedia RU
<https://ru.wikimedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0…>.
The presentation was translated live into Russian. Thanks to Gulnara for
the translation! We will link to the recordings when they are available.
We also presented Wikifunctions and Abstract Wikipedia at the German WikiCon
<https://de.wikipedia.org/wiki/Wikipedia:WikiCon_2021> in Erfurt, Germany,
organized by the German-speaking Wikimedia community with support from
Wikimedia Deutschland <https://meta.wikimedia.org/wiki/Wikimedia_Deutschland>. The
presentation was given in German. We will link to the recordings when they
are available.
Thanks to the communities and the organizations for organizing these hybrid
events. It is beautiful to see the communities come together again, but
also the effort to continue to allow people to participate online. It is a
lot of work, and thank you all for your efforts.
Dear all,
I thank you for your contributions to the Wikifunctions Project. As an end user of the Wikifunctions Project, I have been invited to speak at WikiArabia about Wikifunctions and Abstract Wikipedia in Arabic. That is why I developed and implemented several linguistic functions for Arabic Languages:
* Root and Pattern-Based Generator of Lexemes for Arabic Languages (Z10157)
* Pattern-Root Compatibility Verifier for Arabic Languages (Z10160)
* IPA Generator for Diacritized Arabic Script Texts in Tunisian Arabic (Z10163)
This involved writing Python code for the three functions, developing test functions, and describing the developed functions. While developing the functions, I found several issues that could be solved in the next few months:
1. When a word assigns two Arabic diacritics to a letter, the system can handle it incorrectly. For example, كَرَّر has two Arabic diacritics (a shaddah and a fatha) on its second letter. The shaddah should be placed below the fatha, as its effect should come first. The Wikifunctions compilers do not handle this properly, which can harm the processing of languages that use the Arabic script. This should be fixed.
2. The indentation of the source code has to be done by hand after pasting the code into the field. There is no automatic indentation for pasted source code. This can harm the user experience.
3. The mobile edition of the website does not work. Lucas Werkmeister has raised a ticket about this (T291325).
4. All these linguistic functions are taken from reference grammar books. It will be interesting to have a function that assigns a Wikidata item as a reference of a Wikifunctions function.
5. The runtime of the website is significantly long. Efforts should be made to make this project faster.
6. It will be interesting to align inputs with their corresponding Wikidata items to have better semantics for the functions.
7. System messages are not entirely user-friendly. This can be fixed.
8. The token for the connection to NotWikiLambda does not allow a long session. It disconnects almost every fifteen minutes.
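Regarding item 1, one possible contributing factor (an assumption on my part, not something confirmed for the Wikifunctions code) is Unicode canonical reordering: the fatha has a lower canonical combining class than the shaddah, so any step that normalizes text stores the fatha first, whatever order was typed. A minimal Python sketch to inspect this with the standard library:

```python
import unicodedata

SHADDAH = "\u0651"  # ARABIC SHADDA
FATHA = "\u064E"    # ARABIC FATHA
REH = "\u0631"      # ARABIC LETTER REH

def describe(text):
    """Return (code point, canonical combining class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]

# Typed order: letter, then shaddah, then fatha.
typed = REH + SHADDAH + FATHA

# NFC normalization sorts adjacent combining marks by combining class.
normalized = unicodedata.normalize("NFC", typed)

print("typed:     ", describe(typed))
print("normalized:", describe(normalized))
print("order changed:", typed != normalized)
```

If the reported problem comes from somewhere else (for example, font rendering or string handling in the function code), this sketch will not explain it; it only shows how to check the stored order of the two marks.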
Yours Sincerely,
Houcemeddine Turki
Wikidata,
Abstract Wikipedia,
Hello. I have recently been thinking about objectivity and subjectivity with respect to natural language generation, in particular in the context of story generation using historical data [1][2].
In the near future, digital humanities scholars – in particular historians – could modify collections of data and finetune generation-related parameters, watching as resultant multimodal historical narratives emerged and varied. In these regards, we can envision both computer-aided and automated historical narrative generation tools and technologies.
Could AI be a long-sought objective narrator for historians? Is all narration, or all language use, inherently subjective? What might the nature of “generation-related parameters” and “finetuning” be for style and subjectivity [3][4][5][6][7][8] when generating natural language and multimodal historical narratives from historical data [1][2]?
Thank you. Hopefully, these topics are interesting.
Best regards,
Adam Sobieski
[1] Metilli, Daniele, Valentina Bartalesi, and Carlo Meghini. "A Wikidata-based tool for building and visualising narratives." International Journal on Digital Libraries 20, no. 4 (2019): 417-432.
[2] Metilli, Daniele, Valentina Bartalesi, Carlo Meghini, and Nicola Aloia. "Populating narratives using Wikidata events: An initial experiment." In Italian Research Conference on Digital Libraries, pp. 159-166. Springer, Cham, 2019.
[3] https://en.wikipedia.org/wiki/Subjectivity
[4] https://en.wikipedia.org/wiki/Objectivity_(philosophy)
[5] https://en.wikipedia.org/wiki/Political_subjectivity
[6] https://en.wikipedia.org/wiki/Framing_(social_sciences)
[7] https://en.wikipedia.org/wiki/Focalisation
[8] https://en.wikipedia.org/wiki/Point_of_view_(philosophy)
The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-30
--
This week is the last week Lindsay Wardell will be working on Wikifunctions
as a contractor from ThisDot <https://www.thisdot.co/>. Her contributions
to the code, particularly to the front end, can be seen everywhere. The
discussions with her and insights we learned from her about Wikifunctions
and how composition should work, about errors, about functional
programming, and about the details of the functional model will have a
lasting impact on the project. She very quickly developed a deep
understanding and intuition of what the project could achieve, and was able
to channel that understanding into creative solutions. It was a pleasure
working with her. The whole team is sad to see Lindsay go.
We are very thankful to Lindsay for her contribution, and we congratulate
her on her new role. Here are her own words.
“When I started working on this project back in March, I was fascinated
with the goal and the ambition behind it. Providing a way for Wikipedia
articles to be provided in any number of languages is exciting in its own
right, but also providing a platform for people to interact with data and
create their own functions spoke to me personally. I have enjoyed working
so much on the Wikifunctions platform, and building the experience for
users to create and utilize their own functions.
I have loved working with this team (and the Wikimedia Foundation in
general). From day one, I was accepted as a member of the group, despite my
official role as a consultant. The feeling of being welcome was so
wonderful to feel. I have so much respect for each and every member of the
Foundation that I got to work with, and I am very grateful that I got to
interact with them on such an exciting project.
It was always a dream to get to work with the Wikimedia Foundation, and my
experience was truly amazing. Once the dust settles around me, I fully
intend on being a part of the community that is forming around Abstract
Wikipedia and Wikifunctions. I look forward to participating as a community
member and contributor to the project.”
You can follow Lindsay on Twitter <https://twitter.com/lindsaykwardell> or
listen to the Views on Vue <https://viewsonvue.com/> podcast she is a host
on. Again, congratulations on your new role; we know how excited you are
about it, and we all wish you the best!
The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-24
--
<https://meta.wikimedia.org/wiki/File:Research_-_Wikifunctions_mental_model.…>
Wikifunctions mental model
To better understand the potential contributors and users of
Wikifunctions, we had Jeff Howard
<https://commons.wikimedia.org/wiki/User:JDH264> conduct user research. The
results of this work are now available in the form of two reports on
Wikimedia Commons:
*Wikifunctions mental models
<https://commons.wikimedia.org/wiki/File:Research_-_Wikifunctions_mental_mod…>.*
18 participants were interviewed in order to understand the current mental
models about Wikifunctions, and to uncover potential problems. Below are a
few of the problems found during this work. Please find more details in the
full report.
- The goals of the project were confusing: what does Wikifunctions aim
to achieve?
- The mock-ups were confusing, and they didn’t explain how the project
would work, or what one would do on the site.
- It was unclear whether non-programmers could contribute to or benefit
from the project.
The last point exposes a particular challenge for the project. I’ll come to it
later, but one main goal of the project is to be accessible for people who
do not currently see themselves as programmers. In fact, we think that
people who are currently non-programmers may benefit from Wikifunctions
most!
<https://meta.wikimedia.org/wiki/File:Publish_-_Wikifunctions_feedback.pdf>
Wikifunctions developer feedback
*Wikifunctions feedback
<https://commons.wikimedia.org/wiki/File:Publish_-_Wikifunctions_feedback.pdf>.*
10 developers were interviewed in order to see what developers think of the
Wikifunctions idea. There are many interesting ideas and discussions in
this report:
- Discussions of GitHub vs MediaWiki versioning.
- How well will the UI support more complex implementations?
- How to curate many different implementations for a function?
The developers correctly identified that Wikifunctions is not about whole
programs, but about individual functions that can then be used like a
toolbox for many purposes. The discussions and questions also laid bare the
expectations developers might have for Wikifunctions, and where we need to
make our communication clearer in order to not disappoint potential
contributors.
Both reports indicate some of the challenges Wikifunctions will face. We
are taking the reports seriously and are using them as input for our UX
design. Even given these results, our goal remains to make writing
functions and implementations in Wikifunctions accessible to novice coders,
and Wikifunctions usable and understandable by people who are not already
coders.
<https://meta.wikimedia.org/wiki/File:Wikilambda_-_Early_Eta_-_Create_a_new_…>
Creating a new function in the Early Eta Wikilambda prototype
We are currently designing the function editor to be more approachable,
intuitive, and mobile-friendly. The video here gives you a first view of
how to define and edit a function
<https://commons.wikimedia.org/wiki/File:Wikilambda_-_Early_Eta_-_Create_a_n…>.
It involves a number of simple steps, while providing guidance throughout
the process. We can see an automated diagram of the function on the right.
Adding testers and implementations can also be done directly from within
the function editor.
The implementation of this interface should land in the prototype soon, so
you will be able to test it 'live'. We hope that this makes function
creation and editing in Wikifunctions considerably easier, more
understandable, and more enjoyable than the initial, placeholder experience.
Enjoy reading the reports and watching the video!
The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-17
--
Last week we discussed how to implement paradigms
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-10> in
Wikifunctions. This week, let’s discuss a few ideas on how this could be
used.
One may ask why this is useful, given that we are collecting all the
different forms in the lexicographic data in Wikidata anyway. We don’t need
to generate the forms if we have a full set of forms in Wikidata, surely?
There are several possible use cases:
First, we will probably never achieve full coverage in Wikidata of all
forms in all languages. In some languages, the number of forms may be
prohibitively high, and we, like every other dictionary, might need to make
a selection of forms to store. Often the forms not stored are highly
regular.
Second, even if we have really good coverage, occasionally you will need to
introduce words that are not in the dictionary: when displaying neologisms,
when generating a new lexeme by conversion from another grammatical
category (for example: verbing nouns in English, or using place names to
make demonyms), or when using loanwords from other languages. Fortunately,
such words are often regular, and having smart paradigms as described last
time can take us pretty far.
Third, the paradigms can be used in Wikidata to connect to the actual
lexemes. For example, on a lexeme such as *"cat
<https://www.wikidata.org/wiki/Lexeme:L7>"* we could link to the paradigm
that we developed last week, either the add s
<https://notwikilambda.toolforge.org/wiki/Z10110> function or the English
regular plural <https://notwikilambda.toolforge.org/wiki/Z10132> function.
Linking the lexeme with the function allows individual forms to be
re-generated, which in turn means they can be checked for correctness, thus
ensuring data quality. The English regular plural function can tell us that
the plural for *"pasty <https://www.wikidata.org/wiki/Lexeme:L24858>"* should
be *"pasties"*, but that Wikidata lexeme previously defined it as *"pastiest
<https://www.wikidata.org/w/index.php?title=Lexeme:L24858&oldid=1033449948#F2>"*.
The plural of *"strawman"* should be *"strawmen"*, not *"strawmans
<https://www.wikidata.org/w/index.php?title=Lexeme:L227827&oldid=1069374578>"*;
the plural for *"Frenchwoman"* should be *"Frenchwomen"*, not *"Frenchwoman
<https://www.wikidata.org/w/index.php?title=Lexeme:L34524&oldid=1392427375>"*.
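The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the three spelling rules are hypothetical and are not the actual implementation of the English regular plural function on NotWikiLambda.

```python
VOWELS = "aeiou"

def english_regular_plural(lemma):
    """Hypothetical, simplified regular plural rules for English nouns."""
    # consonant + y  ->  -ies  (pasty -> pasties)
    if len(lemma) >= 2 and lemma.endswith("y") and lemma[-2] not in VOWELS:
        return lemma[:-1] + "ies"
    # sibilant endings take -es  (box -> boxes)
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    # default: add -s  (cat -> cats)
    return lemma + "s"

def check_stored_plural(lemma, stored_plural):
    """Compare a stored plural form with the regenerated one.

    Returns (matches, expected) so that a mismatch can be reviewed by a person.
    """
    expected = english_regular_plural(lemma)
    return stored_plural == expected, expected

print(check_stored_plural("pasty", "pastiest"))
print(check_stored_plural("cat", "cats"))
```

A mismatch is only a hint for review, not proof of an error: irregular nouns will legitimately differ from what a regular paradigm generates.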
One question is: if we have a paradigm that can create the forms, why even
create and store the forms in Wikidata in the first place? That’s a great
question, and a decision that can indeed be revisited by the community.
Personally, I think we need both forms stored explicitly in Wikidata and
generative paradigms. Without the former, it's not clear how we would
handle irregular forms — would the onus lie on the paradigms? That seems
messy. Likewise, paradigms are crucial when, for example, a Lexeme has
thousands of possible forms. If these forms are always regular, the
community might decide not to materialize them all — especially if many
Lexemes cleave to the same regular morphological pattern.
This seems also to be the case for English nouns: almost all of the English
nouns in Wikidata have two forms, even though one could argue that English
nouns have four forms (including the possessive forms); however, the English
possessive <https://en.wikipedia.org/wiki/English_possessive> forms seem to
be generated so regularly that, so far, Wikidata contributors seem to
consider them unnecessary and usually omit them.
Fourth, the paradigms can also be used to propose a starting point when
entering the data. Imagine the Wikidata Lexeme Forms
<https://www.wikidata.org/wiki/Wikidata:Wikidata_Lexeme_Forms> allowing you
to select a function on Wikifunctions that, given the lemma, generates all
likely forms for an entry. The Lexeme Forms tool has already improved the
creation of Lexemes considerably, making the entries much more consistent
and expansive. If, in addition, we could also automatically generate most
of the forms, this would increase the speed of entering the data by a lot -
and at the same time reduce the likelihood of data entry errors.
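As a rough sketch of this idea, a form-proposing function could take a lemma and return candidate forms for a contributor to confirm or correct. Everything here is an assumption for illustration: the function names, the form labels, and the deliberately simplified plural rule are hypothetical, and this is not how the Wikidata Lexeme Forms tool works today.

```python
def simple_plural(lemma):
    """Deliberately simplified placeholder for a real plural paradigm."""
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    return lemma + "s"

def propose_noun_forms(lemma):
    """Propose the four English noun forms as editable suggestions."""
    plural = simple_plural(lemma)
    # Plurals ending in -s take a bare apostrophe; others take 's.
    plural_possessive = plural + "'" if plural.endswith("s") else plural + "'s"
    return {
        "singular": lemma,
        "plural": plural,
        "singular possessive": lemma + "'s",
        "plural possessive": plural_possessive,
    }

for label, form in propose_noun_forms("cat").items():
    print(f"{label}: {form}")
```

The output is meant as a starting point that a person reviews before saving, which is where the reduction in data entry errors would come from.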
Besides all these immediate improvements, there might be many further
advantages. For example, storing an offline dictionary would require much
less storage space if we use paradigms. Developing paradigms for currently
under-resourced languages might create aids for working with those
languages. Having a knowledge base of paradigms across languages may be
interesting from the perspective of linguistic research.
Once Wikifunctions has launched, we hope that the community will develop a
library of morphological paradigms and their connection with the
lexicographical data in Wikidata. Besides this being a very helpful step on
our path to Abstract Wikipedia, we think that this will considerably expand
the content of the lexicographical data in Wikidata. That, together with
enabling access to the lexicographic data from within the Wiktionaries
<https://phabricator.wikimedia.org/T235901>, will help with significantly
empowering the contributors to Wiktionary, particularly to the smaller
Wiktionaries and to the languages with fewer contributors in all
Wiktionaries.
Thanks to User:YULdigitalpreservation
<https://www.wikidata.org/wiki/User:YULdigitalpreservation>, who created
EntitySchema E327 <https://www.wikidata.org/wiki/EntitySchema:E327> on Wikidata for
English Nouns with Genitives, and to User:VIGNERON
<https://meta.wikimedia.org/wiki/User:VIGNERON> for creating French plural
morphology on NotWikiLambda, and User:Strobilomyces
<https://en.wikipedia.org/wiki/User:Strobilomyces> for collaborating on
that.
The on-wiki version of this newsletter is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-08-27
--
When we started the development effort towards the Wikifunctions site, we
sub-divided the work leading up to the launch of Wikifunctions into eleven
phases <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases>, named
after the first eleven letters of the Greek alphabet.
- With Phase α (alpha) completed, it became possible to create instances
of the system-provided Types in the wiki.
- With Phase β (beta), it became possible to create new Types on-wiki
and to create instances of these Types.
- With Phase γ (gamma), all the main Types of the pre-generic function
model were available.
- With Phase δ (delta), it became possible to evaluate built-in
implementations.
- With Phase ε (epsilon), it became possible to evaluate
contributor-written implementations in any of our supported programming
languages.
- This week, we completed Phase ζ (zeta).
The goal of Phase ζ has been to provide the capability to evaluate
implementations composed of other functions.
What does this mean? Every Function in Wikifunctions can have several
Implementations. There are three different ways to express an
Implementation:
1. As a built-in Function, written in the code of Wikilambda: this means
that the Implementation is handled by the evaluator natively using code
written by the team.
2. As code in a programming language, created by the contributors of
Wikifunctions: the Implementation of a Function can be given in any
programming language that Wikifunctions supports. Eventually we aim to
support a large number of programming languages; for now we support
JavaScript and Python.
3. As a composition of other Functions: this means that contributors can
use existing Functions as building blocks in order to implement new
capabilities.
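As a rough illustration of the three options, the following Python sketch
models a Function that holds one or more Implementations, with the last
one expressed purely as a composition of other Functions. All names here
(Function, to_upper, concatenate, shout) are hypothetical and chosen for
illustration only; they do not reflect the actual Wikifunctions data model
or evaluator.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Function:
    """Hypothetical stand-in for a Function with several Implementations."""
    name: str
    implementations: List[Callable] = field(default_factory=list)

    def __call__(self, *args):
        # A real evaluator would choose among the implementations; this
        # sketch simply uses the first one that was registered.
        return self.implementations[0](*args)


# 1. "Built-in" style: provided natively by the host system.
to_upper = Function("to_upper", [str.upper])


# 2. "Code" style: written by a contributor in a supported language.
def concatenate_code(a, b):
    return a + b


concatenate = Function("concatenate", [concatenate_code])


# 3. "Composition" style: built only from calls to other Functions.
def shout_composition(text):
    return concatenate(to_upper(text), "!")


shout = Function("shout", [shout_composition])

print(shout("hello"))  # HELLO!
```

The composed Function does not need to know how its building blocks are
implemented, which is what allows existing Functions to be reused freely.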
With Phase ζ we close the trilogy of Phases dealing with the different ways
to create Implementations.
Besides making composition work, we also spent some time on other areas.
We worked to reduce the technical debt that we accumulated during the last
two phases, which we rushed in order to be ready for the security and
performance reviews. We improved how the error system works,
re-worked the data model for Testers and Errors, refactored the common
library to be more extensible, moved the content of the wiki to the main
namespace, and changed Python function definitions to align with the style
we use for JavaScript ones.
We started with some work to make the current bare-bones user experience
better. This included displaying Testers' results and meta-data on their
own page as well as related Function and Implementation pages. Functions
and Implementations can be easily called right from their page. We made it
much easier to create and connect Implementations and Testers with their
functions, started on the designs for Function definition and
implementation, and implemented aliases that sit alongside labels, much
like in Wikidata. Plenty done!
We are now moving on to Phase η (eta). The three main goals of Phase η are
to finish the re-work of the Error system, to revisit user-defined types
and integrate them better with validators, and to allow for generic types.
What are generic types?
We have a type for a list of elements. But instead of saying “this is a
list of elements”, we can often be more specific, and for example say “this
is a list of strings”. Why is that useful? Because now, if, for example, we
have a function to get the first element of a list, we know that this
function will return a string when given this kind of list. This allows us
to then offer a better user experience by making more specific suggestions,
because now the system knows that it can suggest functions that work with
strings. We can also check whether an implementation makes sense by
ensuring that the types fit. We won’t be able to do that in all cases, but
generics will greatly increase the number of cases where we can. For more
background, you can refer to the Wikipedia
article on generic programming
<https://en.wikipedia.org/wiki/Generic_programming>.
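To make the "first element of a list" example concrete, here is a small
sketch using Python's own type hints. It only illustrates the general idea
of generics; the function name first_element is an assumption for this
example, and this is not how Wikifunctions itself represents types.

```python
from typing import List, TypeVar

T = TypeVar("T")


def first_element(items: List[T]) -> T:
    """Return the first element; the result type follows the element type."""
    return items[0]


words: List[str] = ["alpha", "beta", "gamma"]
first = first_element(words)  # a type checker infers `str` here

# Because the result is known to be a string, functions that work with
# strings are sensible suggestions for the next step.
print(first.upper())  # ALPHA
```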
In this example case, instead of a special type representing a list of
strings, we will have a function that takes a type and returns a typed
list. If you then call this function with the string type as the argument,
the result of the function will be the concept of a list of strings. And
you can easily use that for any other type, including user-defined types.
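A minimal sketch of this idea in Python follows: a function that takes a
type as its argument and returns a new list type restricted to that element
type. The names typed_list, TypedList, and Point are hypothetical, and the
runtime check is just one possible way to enforce the restriction; the real
Wikifunctions function model may work quite differently.

```python
def typed_list(element_type):
    """Hypothetical: take a type, return a list type for that element type."""

    class TypedList(list):
        def __init__(self, items=()):
            super().__init__()
            for item in items:
                self.append(item)

        def append(self, item):
            # Reject anything that is not an instance of the element type.
            if not isinstance(item, element_type):
                raise TypeError(
                    f"expected {element_type.__name__}, "
                    f"got {type(item).__name__}"
                )
            super().append(item)

    TypedList.__name__ = f"TypedList[{element_type.__name__}]"
    return TypedList


# Calling the function with the string type yields "list of strings".
StringList = typed_list(str)
names = StringList(["Ada", "Grace"])
names.append("Alan")

try:
    names.append(42)
except TypeError as err:
    print(err)  # expected str, got int


# The same function works for a user-defined type.
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


PointList = typed_list(Point)
points = PointList([Point(0, 0), Point(1, 2)])
print(len(points))  # 2
```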
My thanks to the team! My thanks to the volunteers! Some of us are starting
to have fun using the prototype, playing with implementations across
different programming languages interacting with each other in non-trivial
ways, and starting to build a small basic library of functions. Phase η
will also be the phase where we move from the pre-generic data model
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Pre-generic_function_mod…>
to the full function model
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Function_model>. To
give due warning: this probably means that almost everything will need to
be re-written by the end of this phase, in order to take advantage of the
generic system that we are introducing.
Thank you for accompanying us on our journey!