The on-wiki version of this newsletter can be found here:
How would we generate a text in Abstract Wikipedia such as the first
sentence of the English Wikipedia article about Mariya Zerova
*“Mariya Yakovlevna Zerova, alternately Marija Jakovlevna Zerova, (April 7,
1902 - July 21, 1994) was a Ukrainian biologist and taxonomist known for
her work in mycology.”*
There are plenty of interesting questions regarding generating this short
sentence - the name, the biographical dates, the description. Today, let’s
just focus on the name.
Given that Zerova was Ukrainian, and was born and lived in Ukraine, her
name was written using the Cyrillic alphabet: Марія Яківна Зерова. In her
English Wikipedia article, her name in the Cyrillic alphabet is given in the
infobox, but not in the text of the article. There are several
ways to transliterate the name from the Cyrillic alphabet to the Latin
alphabet. In particular, the letter я
<https://en.wikipedia.org/wiki/Ya_(Cyrillic)> can be transliterated as *Ya*
or *Ja* in English, which leads to the variation given in the English
Wikipedia article.
Her Wikidata item <https://www.wikidata.org/wiki/Q12106673> states that her
first name is Marija <https://www.wikidata.org/wiki/Q18603722>, and not
Maria <https://www.wikidata.org/wiki/Q325872>, Mariya
<https://www.wikidata.org/wiki/Q39897333>, or Mariia
<https://www.wikidata.org/wiki/Q56433356> (all these three names are
written as Марія in Ukrainian). Names are a difficult mess, and so it is
not surprising that Wikidata is having trouble representing them. A big
thanks and shoutout to the hard work by the Wikiproject Names on Wikidata
<https://www.wikidata.org/wiki/Wikidata:WikiProject_Names>, which aims to
sort out this kind of issue. You should join them if you are interested in
So, how would we get her name for Abstract Wikipedia for the different
languages? Do we need Lexemes for every first name in every language? Such
as the Lexeme Maria <https://www.wikidata.org/wiki/Lexeme:L414214> in
English? And then how would we link the person in Wikidata to the item for
the given name, and in turn link the Lexemes to that given name?
What about Yakovlevna, her patronym? Or Zerova, her family name? Both names
are rarer than Mariya. Would we expect Lexemes for each of these names in
Wikidata too, for each language individually? That seems like a lot of work.
In such cases I hope that the answer is no, and that we can figure out a
way to avoid that. But what could that look like? As usual, I expect that
as a community we will come up with a better solution than what I could
come up with. Together we are smarter than any one of us. So think of this
as a first, rough draft.
My first thought would be to have functions in Wikifunctions that take a
name such as *“Yakovlevna”* as a string and can generate all necessary
forms based on regular morphological functions.
Names that have irregular forms would still be Lexemes, but if a function
can create the necessary forms, we should be able to use that directly
based on a string. So if we need the genitive form of Yakovlevna’s name (as
in this very sentence), a function would just generate it.
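
As a rough illustration of that idea, here is a minimal sketch of a regular
function that builds the English genitive from a plain string, with a lookup
that stands in for irregular forms stored as Lexemes. The names
`IRREGULAR_GENITIVES` and `english_genitive` are hypothetical, not existing
Wikifunctions functions, and the rule for names ending in *s* is only one of
several conventions in use.

```python
from typing import Dict

# Hypothetical stand-in for irregular forms that would live in Lexemes.
# It is left empty here; real data would come from Wikidata.
IRREGULAR_GENITIVES: Dict[str, str] = {}


def english_genitive(name: str) -> str:
    """Return an English genitive form of a name given as a plain string."""
    # Irregular names take precedence over the regular rule.
    if name in IRREGULAR_GENITIVES:
        return IRREGULAR_GENITIVES[name]
    # Assumed convention: names ending in "s" only take an apostrophe.
    if name.endswith("s"):
        return name + "'"
    return name + "'s"


print(english_genitive("Yakovlevna"))  # Yakovlevna's
```

Languages with richer case systems would need their own functions, probably
with several rules depending on the ending of the name.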
The same mechanism to generate the necessary forms may be helpful for many
place names and other proper names. In addition, we will likely need
functions that can transliterate between different alphabets, which is a
hornets' nest in itself. Transliterations can differ from target language
to target language: the transliteration of Зерова into German would be
Serowa, not Zerova, as it is in English.
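
To make the dependence on the target language concrete, here is a small
sketch of a table-driven transliteration function. The tables are
assumptions made up for this example: they cover only the letters that occur
in the names discussed here and do not implement any official romanization
standard. The names `TABLES` and `transliterate` are hypothetical.

```python
from typing import Dict

# Illustrative per-language tables, keyed by lowercase Cyrillic letter.
# They only cover the letters needed for the examples in the text.
TABLES: Dict[str, Dict[str, str]] = {
    "en": {"з": "z", "е": "e", "р": "r", "о": "o", "в": "v", "а": "a",
           "я": "ya", "к": "k", "\u0456": "i", "н": "n", "м": "m"},
    "de": {"з": "s", "е": "e", "р": "r", "о": "o", "в": "w", "а": "a",
           "я": "ja", "к": "k", "\u0456": "i", "н": "n", "м": "m"},
}


def transliterate(text: str, target_language: str) -> str:
    """Map Cyrillic letters to Latin ones for the given target language."""
    table = TABLES[target_language]
    result = []
    for char in text:
        latin = table.get(char.lower())
        if latin is None:
            # Letters missing from the table are passed through unchanged.
            result.append(char)
        elif char.isupper():
            # Keep the capitalization of the source letter ("я" -> "Ya").
            result.append(latin.capitalize())
        else:
            result.append(latin)
    return "".join(result)


print(transliterate("Зерова", "en"))  # Zerova
print(transliterate("Зерова", "de"))  # Serowa
```

A real function would need context-sensitive rules, since many
transliteration schemes treat a letter differently depending on its position
in the word.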
But that’s not all. The astute reader might have already noticed that
*Yakovlevna* is not a direct transliteration of Яківна. That would be
*Yakivna* (or *Jakivna*). What happened here?
In addition to the name being *transliterated* (i.e. where we map from one
script to another) the name was also *translated*, or backformed, in the
way it would be formed in Russian. The English form *Yakovlevna* is based
on the Russian form Яковлевна, and indeed, if we look in the Russian
Wikipedia, the Russian name for the biologist is Мария Яковлевна Зерова,
a version of the name that is never mentioned on her native Ukrainian
Wikipedia.
By the way, if you are surprised to find that names can be translated,
enjoy seeing the names of Pope John Paul II
<https://www.wikidata.org/wiki/Q989> in different languages on Wikidata by
clicking on “All entered languages”.
How would Abstract Wikipedia ever figure out that it should first translate
Яківна to Russian and then transliterate it? Is this even the right thing
to do? To be honest, I am entirely stumped here. Should Ukrainian names in
general first be translated to Russian variants, and then be
transliterated? Let’s take two other Ukrainians, who both have the same
name: the President of Ukraine, and the brother of the Mayor of Kyiv, are
both named Володимир, but English Wikipedia refers to the President as
Volodymyr <https://en.wikipedia.org/wiki/Volodymyr_Zelenskyy> (a direct
transliteration) and to the other as Wladimir
<https://en.wikipedia.org/wiki/Wladimir_Klitschko>. In Ukrainian, they have
the same name!
I guess in many of those cases the best we can do is to rely on Wikidata,
and use the labels on the items as string input and the structured data
around given and family names. This allows us to enter and fix the data
manually, item by item, where there is evidence that an individual used a
different form. Only if Wikidata does not offer the necessary data would
we need to use fallback functions. And the fallback functions could be
different from language to language, so that in Russian, Zerova’s patronym
can be Яковлевна and in Ukrainian Яківна.
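
The lookup order described here could be sketched roughly as follows. This
is only a draft under stated assumptions: the labels are mock data in a
dictionary rather than a call to Wikidata, and `name_for`,
`placeholder_fallback`, and the shape of the fallback table are all
hypothetical.

```python
from typing import Callable, Dict, Optional

Labels = Dict[str, Dict[str, str]]          # item id -> language -> label
Fallbacks = Dict[str, Callable[[str], str]]  # language -> fallback function

# Mock data standing in for Wikidata labels; not fetched from Wikidata.
MOCK_LABELS: Labels = {
    "Q12106673": {"uk": "Марія Яківна Зерова"},
}


def placeholder_fallback(source_name: str) -> str:
    """Hypothetical fallback; a real one would translate or transliterate."""
    return f"[generated from {source_name}]"


def name_for(item_id: str, language: str, labels: Labels,
             fallbacks: Fallbacks, source_language: str = "uk") -> Optional[str]:
    """Prefer a manually curated label; otherwise try a per-language fallback."""
    item_labels = labels.get(item_id, {})
    if language in item_labels:
        return item_labels[language]
    source_name = item_labels.get(source_language)
    fallback = fallbacks.get(language)
    if source_name is None or fallback is None:
        return None  # nothing to generate from, or no rule for this language
    return fallback(source_name)


fallbacks: Fallbacks = {"ru": placeholder_fallback}
print(name_for("Q12106673", "uk", MOCK_LABELS, fallbacks))
print(name_for("Q12106673", "ru", MOCK_LABELS, fallbacks))
```

Keeping the curated label first means that a manual correction on the item
always wins over whatever a fallback function would generate.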
And maybe, just maybe, having to encode that explicitly will make us more
aware of how names of people and places flow through our knowledge
ecosystem, how they reflect power and inequity.
So many interesting things about just the first few words of this one
sentence, and we haven’t even talked yet about whether her birth date is
stated in the Gregorian, Julian, or another calendar!