Your original posting mentions that *Constructors* would essentially
hold the conditional logic, or "rules"?
But in your followup, I see you mention *Renderers*?
I'm curious where the delineation of rules will occur, and if the answer
is "it depends"?
Have you given much thought to constraints on Constructors or Renderers
themselves (are there high-level design docs available for each of those)?
Or do you think that will be something still being worked through in the
long term with community use cases, and practices that evolve?
On Fri, May 7, 2021 at 10:07 AM Denny Vrandečić <
If the abstract content states (and I am further simplifying):
type: animal type phrase
- type of animal: cat
- sex: male
that might be represented e.g.
or it could be, if we are using QIDs for the values,
so it wouldn't be based on English, it would be abstracted from the
Now there could be a Renderer in Dutch for 'animal type phrases' that
if Z1400K1 = Q146/cat:
    if Z1400K2 = unknown or Z1400K2 = Q43445/female organism:
        return L208775/kat (Dutch, noun)
    if Z1400K2 = Q44148/male organism:
        return L.../kater (Dutch, noun)
This is just for selecting the right Lexeme. Further functions would
now select the right form, depending on what the sentence looks like.
But nowhere do we need to refer to the Senses or to explicitly modeled
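As a rough illustration, the Dutch rule above could be written as a small Python function. This is only a sketch under assumptions: the function name, the dictionary shape used for the abstract content, and the `L-kater` placeholder are all made up here (the Lexeme ID for 'kater' is elided above and is deliberately not filled in).

```python
from typing import Optional

CAT = "Q146"        # cat
FEMALE = "Q43445"   # female organism
MALE = "Q44148"     # male organism

KAT = "L208775"     # kat (Dutch, noun)
KATER = "L-kater"   # placeholder: the real Lexeme ID is not given in the thread


def select_dutch_lexeme(content: dict) -> Optional[str]:
    """Pick a Dutch Lexeme for an 'animal type phrase'; None if no rule applies."""
    animal = content.get("Z1400K1")
    sex = content.get("Z1400K2")  # a missing key stands for "unknown"
    if animal != CAT:
        return None
    if sex is None or sex == FEMALE:
        return KAT
    if sex == MALE:
        return KATER
    return None


print(select_dutch_lexeme({"Z1400K1": CAT, "Z1400K2": MALE}))
```

The function only selects a Lexeme; choosing the right Form would be a separate step.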
On the other hand, we *could* refer to the Senses and items. (And this
is what I meant with not being prescriptive - I am just sketching out one
possibility that does *not* refer to them). Because we could also write a
multilingual Renderer (e.g. as a fallback Renderer?) that does for example
Animal = Z1400K1 // which would be Q146/cat in our example
Senses = FollowBacklink(P5137/item for this sense)
Lexemes = GetLexemesFromSenses(Senses)
DutchLexemes = FilterByLanguage(Lexemes, Q7411/Dutch)
return ChooseOne(DutchLexemes) // that would need to be some
This probably would need some refinement to figure out how the sex
would play into this, but it's just the start of a sketch. You could also
imagine building something on Defined Meanings at this point.
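To make the fallback idea a bit more concrete, here is a minimal Python sketch of the same pipeline. It runs against a tiny in-memory stand-in for Wikidata rather than any real API; the Sense IDs are invented for illustration, the function names simply mirror the pseudocode, and `choose_one` uses a placeholder policy. As noted, the sex is ignored here.

```python
from typing import List, Optional

# Toy stand-in for Wikidata. The Sense IDs are hypothetical.
SENSES = {
    "L7-S1": {"lexeme": "L7", "P5137": "Q146"},
    "L208775-S1": {"lexeme": "L208775", "P5137": "Q146"},
}
LEXEMES = {
    "L7": {"lemma": "cat", "language": "Q1860"},       # English
    "L208775": {"lemma": "kat", "language": "Q7411"},  # Dutch
}


def follow_backlink(item: str, prop: str = "P5137") -> List[str]:
    """Return the IDs of all Senses whose `prop` statement points at `item`."""
    return [sid for sid, sense in SENSES.items() if sense.get(prop) == item]


def get_lexemes_from_senses(sense_ids: List[str]) -> List[str]:
    """Return the distinct Lexemes that own the given Senses, in sorted order."""
    return sorted({SENSES[sid]["lexeme"] for sid in sense_ids})


def filter_by_language(lexeme_ids: List[str], language: str) -> List[str]:
    return [lid for lid in lexeme_ids if LEXEMES[lid]["language"] == language]


def choose_one(lexeme_ids: List[str]) -> Optional[str]:
    # Placeholder policy: take the first candidate, if there is one.
    return lexeme_ids[0] if lexeme_ids else None


def fallback_render(content: dict, language: str) -> Optional[str]:
    animal = content["Z1400K1"]
    senses = follow_backlink(animal)
    lexemes = get_lexemes_from_senses(senses)
    return choose_one(filter_by_language(lexemes, language))


print(fallback_render({"Z1400K1": "Q146"}, "Q7411"))
```

A real `choose_one` would need some ranking among the candidates, which is exactly the part left open above.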
I hope that makes sense - happy to answer more. And again, it is all
Also, Happy Birthday, Gerard!
On Thu, May 6, 2021 at 10:23 PM Gerard Meijssen <
> I fail to understand. You have the data in the prescribed manner for
> an article. The original is based on English. How can you generate from the
> data a text in Dutch or any other language, when you do have the Senses but
> not the meanings of the words?
> On Thu, 6 May 2021 at 23:38, Denny Vrandečić <dvrandecic(a)wikimedia.org>
>> The on-wiki version of this newsletter can be found here:
>> In 2018, Wikidata launched a project to collect lexicographical
>> Several hundred thousand Lexemes have been created since then, and this
>> year the tools will be further developed by Wikimedia Deutschland to make
>> the creation and maintenance of the lexicographic knowledge in Wikidata
>> The lexicographic extension to Wikidata was developed with the goal
>> that became Abstract Wikipedia in mind, but a recent discussion within the
>> community showed me that I have not made the possible connection between
>> these two parts clear yet. Today, I would like to sketch out a few ideas on
>> how Abstract Wikipedia and the lexicographic data in Wikidata could work
>> There are two principal ways to organize a dictionary: either you
>> organize the entries by ‘lexemes’ or ‘words’ and describe their senses
>> (this is called the semasiological
>> <https://en.wikipedia.org/wiki/Semasiology> approach), or you
>> organize the entries by their ‘senses’ or ‘meanings’ (this is called the
>> onomasiological <https://en.wikipedia.org/wiki/Onomasiology>
>> approach). Wikidata has intentionally chosen the semasiological approach:
>> the entries in Wikidata are called Lexemes, and contributors can add Senses
>> and Forms to the Lexemes. Senses stand for the different meanings that a
>> Lexeme may regularly invoke, and the Forms are the different ways the
>> Lexeme may be expressed in a natural language text, e.g. in order to be in
>> agreement with the right grammatical number, case, tense, etc. The Lexeme
>> “mouse” (L1119 <https://www.wikidata.org/wiki/Lexeme:L1119>) thus
>> has two senses, one for the small rodent, one for the computer input
>> device, and two forms, “mouse” and “mice”. For an example of a
>> multilingual onomasiological collaborative dictionary, one can take a look
>> at the OmegaWiki <http://www.omegawiki.org/> project, which is
>> primarily organized around (currently 51,000+) Defined Meanings
>> <http://www.omegawiki.org/Help:DefinedMeaning> and how these are
>> expressed in different languages.
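For readers who prefer code, the Lexeme / Sense / Form structure described above could be modeled roughly as follows. This is a simplified sketch, not Wikidata's actual data model: the class names, the `features` tuples, and the `form_for` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Sense:
    gloss: str


@dataclass
class Form:
    representation: str
    features: Tuple[str, ...]  # grammatical features, e.g. ("plural",)


@dataclass
class Lexeme:
    id: str
    lemma: str
    senses: List[Sense] = field(default_factory=list)
    forms: List[Form] = field(default_factory=list)

    def form_for(self, *features: str) -> Optional[str]:
        """Return the first Form carrying all requested features, if any."""
        wanted = set(features)
        for form in self.forms:
            if wanted <= set(form.features):
                return form.representation
        return None


mouse = Lexeme(
    id="L1119",
    lemma="mouse",
    senses=[Sense("small rodent"), Sense("computer input device")],
    forms=[Form("mouse", ("singular",)), Form("mice", ("plural",))],
)

print(mouse.form_for("plural"))
```

The entry is organized by the word, with its meanings hanging off it, which is the semasiological layout.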
>> The reason why Wikidata chose the semasiological approach is based on
>> the observation that it is much simpler for a crowd-sourced collaborative
>> project, and has much less potential to be contentious. It is much easier
>> to gather a list of words used in a corpus than to gather a list of all the
>> meanings referred to in the same corpus. And whereas it is 'simpler', it
>> is still not trivial. We still want to collect a list of Senses for each
>> Lexeme, and we want to describe the connections between these Senses:
>> whether two Lexemes in a language have the same Sense, how the Senses
>> relate to the large catalog of items in Wikidata, and how Senses of
>> different languages relate to each other. These are all very difficult
>> questions that the Wikidata community is still grappling with (see also the
>> essay on Making Sense).
>> Let’s look at an example.
>> “Stubbs was probably one of the youngest mayors in the history of the
>> world. He became mayor of Talkeetna, Alaska, at the age of three months and
>> six days, and retained that position until his death almost four years ago.
>> Also, Stubbs <https://en.wikipedia.org/wiki/Stubbs_(cat)> was a cat."
>> If we want to express that last sentence - “Stubbs was a cat” - we
>> will have to be able to express the meaning “cat” (here, we will
>> focus entirely on the lexical level, and will not discuss grammatical and
>> idiomatic issues; we will leave those for another day). How do we refer to
>> the idea for cat in the abstract content? How do we end up, in English,
>> eventually with the word form “cat” (L7-F4
>> <https://www.wikidata.org/wiki/Lexeme:L7#F4>)? In French with the
>> word form “chat” (L511-F4
>> <https://www.wikidata.org/wiki/Lexeme:L511#F4>)? And in German with
>> the form “Kater” (L303326-F1)?
>> Note that these three words commonly do not have the same meaning.
>> The English word cat refers to both male and female cats equally; and
>> whereas the French word could refer to a cat generically, for example if we
>> didn’t know Stubbs’ gender, the word is grammatically masculine, and a female
>> cat would usually be referred to using the word “chatte”. The German word, on
>> the other hand, may only refer to a male cat. If we didn’t know whether
>> Stubbs is male or female, we would need to use the word “Katze” in
>> German instead, whereas in French, as said, we still could use “chat”.
>> And English also has words for male cats, e.g. “tom” or “tomcat”,
>> but these are much less frequently used. Searching the Web for “Stubbs
>> is a cat” returns more than 10,000 hits, but not a single one for “Stubbs
>> is a tom” nor “Stubbs is a tomcat”.
>> In comparison, for Félicette
>> <https://en.wikipedia.org/wiki/F%C3%A9licette>, the first and so far
>> only cat in space, the articles indeed use the words “chatte” in
>> French and “Katze” in German.
>> Here we are talking about three rather closely related languages, and we
>> are talking about a rather simple noun. This should have been a very simple
>> case, and yet it is not. When we talk about verbs, adjectives, or nouns
>> about more complex concepts (for example different kinds of human
>> settlements or the different ways human body parts are conceptualized in
>> different languages, e.g. arms and hands
>> <https://wals.info/chapter/129>, terms for colors), it gets much
>> more complicated very quickly. If we were to require that all words we want
>> to use in Abstract Wikipedia first must align their meanings, then that
>> would put a very difficult task in our critical path. So whereas it would
>> indeed have been helpful to Abstract Wikipedia to have followed an
>> onomasiological approach (how wonderful would it be to have a comprehensive
>> catalog of meanings!), that approach was deemed too difficult and a
>> semasiological approach was chosen instead.
>> Fortunately, a catalog of meanings is not necessary. We can avoid it
>> because Abstract Wikipedia only needs to generate text, not to parse or
>> understand it. This allows us to get by using a
>> Constructor that, for each language, uses a Renderer to select the correct
>> word (or other lexical representation). For example, we could have a
>> Constructor that may take several optional further pieces of information:
>> the kind of animal, the breed, the color, whether it is an adult, whether
>> it is neutered, the gender, the number of them, etc. For each of these
>> pieces of information, we could mark whether that information must be
>> expressed in the Rendering, or whether this information is optional and can
>> be ignored, and thus what is available for those Renderers to choose the
>> most appropriate word. Note, this is not telling the community how to do
>> it, merely sketching out one possible approach that would avoid relying on
>> a catalog of meanings.
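One way to picture such a Constructor is as a record whose slots each carry a value plus a flag saying whether a Renderer must express it. The Python sketch below is only an illustration: the class names, the slot names, and the `must_express` flag are assumptions, not an agreed design.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Slot:
    value: Optional[str] = None   # e.g. a QID; None means "not provided"
    must_express: bool = False    # must the Rendering convey this?


@dataclass(frozen=True)
class AnimalPhrase:
    kind: Slot
    breed: Slot = Slot()
    color: Slot = Slot()
    sex: Slot = Slot()

    def available(self) -> Dict[str, str]:
        """Slots that actually carry a value, for Renderers to draw on."""
        return {
            f.name: getattr(self, f.name).value
            for f in fields(self)
            if getattr(self, f.name).value is not None
        }

    def required(self) -> List[str]:
        """Names of the slots that every Rendering has to express."""
        return [f.name for f in fields(self) if getattr(self, f.name).must_express]


stubbs = AnimalPhrase(kind=Slot("Q146", must_express=True), sex=Slot("Q44148"))
print(stubbs.available(), stubbs.required())
```

Here the sex is available but optional, so a German Renderer may use it while an English one may ignore it.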
>> Each language Renderer could then use the information it needs to
>> select the right word. If a language has a preference to express the gender
>> (such as German) it can do so, whereas a language that prefers not to (such
>> as English) can leave it out. If for a language the age of the cat matters for
>> the selection of the word, it can look it up. If the color of the animal
>> matters (as it does for horses in German),
>> the respective Renderer can use the information. If a required piece of information
>> is missing, we could add this to a maintenance queue so that contributors
>> can fill it out. If a language should happen not to have a word, a
>> different noun phrase can be chosen, e.g. a less specific word such as
>> “animal” or “pet”, or a phrase such as “male kitten”, or “black
>> horse” for the German word “Rappen”.
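The fallback behavior and the maintenance queue described above might look something like this in Python. Everything here is a toy: the word tables, the language codes, the `needs` parameter, and the queue are assumptions for illustration, and plain strings stand in for Lexemes. The German word “Rappen” is taken as cited in the newsletter.

```python
from typing import List, Optional, Tuple

MAINTENANCE_QUEUE: List[Tuple[str, str, List[str]]] = []

# Per-language word tables keyed by (kind, color); None means "any color".
WORDS = {
    "de": {("horse", "black"): "Rappen", ("horse", None): "Pferd"},
    "en": {("horse", None): "horse"},
}
# Adjectives a language can use to build a phrase when it has no single word.
ADJECTIVES = {"en": {"black": "black"}}


def render_animal(lang: str, kind: str, color: Optional[str] = None,
                  needs: Tuple[str, ...] = ()) -> str:
    info = {"kind": kind, "color": color}
    missing = [name for name in needs if info[name] is None]
    if missing:
        # Flag the gap so that contributors can fill it in later.
        MAINTENANCE_QUEUE.append((lang, kind, missing))
    table = WORDS[lang]
    if (kind, color) in table:
        return table[(kind, color)]      # the language has a specific word
    generic = table[(kind, None)]        # otherwise use a less specific word
    adjective = ADJECTIVES.get(lang, {}).get(color)
    return f"{adjective} {generic}" if adjective else generic


print(render_animal("en", "horse", "black"))
```

Each language decides for itself which pieces of information it needs, and no alignment of meanings between the two word tables is required.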
>> But the important design feature here is that we do not need to
>> ensure and agree on the alignment of meanings of words across different
>> languages. We do not need a catalog of meanings to achieve what we want.
>> Now, there are plenty of other use cases for having such a catalog of
>> meanings. It would be a tremendously valuable resource. And even without
>> such a catalog, the statements connecting Senses and Items in Wikidata can
>> be very helpful for the creation and maintenance of Renderers, but these do
>> not need to be used when the natural text for Wikipedia is created.
>> This suggestion is not meant to be prescriptive, as said. It will be
>> up to the community to decide on how to implement the Renderers and what
>> information to use. In this, I am sketching out an architecture that allows
>> us to avoid blocking on the availability of a (valuable but very difficult
>> to create) resource, a comprehensive catalog of meanings aligning words
>> across many different languages.
>> Abstract-Wikipedia mailing list --
>> List information: