Re: [Wikidata-tech] Fwd: Two questions about Lexeme Modeling

21 Nov 2016

Dear Denny, Daniel,
Thanks for your questions. I will try to answer them.
1) The constructions "ask somebody about" and "ask somebody to" are two different syntactic and semantic frames.
Please have a look at the final specification of the lemon model:
https://www.w3.org/community/ontolex/wiki/Final_Model_Specification#Syntacti...
In particular, see the example synsem/example7.
There you can see two different syntactic frames for the verb "give". In this case, both represent the same sense, corresponding to an exchange of goods, but with different syntactic constructions.
For "ask" there would also be two syntactic frames, but two senses instead of one.
If you want, I can send you a modelled example.
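A minimal sketch of the "ask" case, written as plain Python rather than lemon/RDF: one lexical entry, two syntactic frames, and one sense per frame. The names `LexicalEntry`, `SyntacticFrame`, `Sense` and `marker` are hypothetical stand-ins that only loosely mirror lemon's synsem module; they are not its actual vocabulary. Storing a dictionary-style pattern on the frame is one possible way to get the concise rendering ("ask sbdy to do sthg") asked about in the quoted message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SyntacticFrame:
    # Dictionary-style rendering of the construction, e.g. "ask sbdy to do sthg"
    pattern: str
    # Hypothetical: the preposition/particle that distinguishes the frame
    marker: Optional[str] = None


@dataclass
class Sense:
    gloss: str
    # The frame in which this sense is realised
    frame: SyntacticFrame


@dataclass
class LexicalEntry:
    lemma: str
    senses: List[Sense] = field(default_factory=list)

    def senses_with_marker(self, marker: str) -> List[Sense]:
        """Return the senses whose syntactic frame uses the given marker."""
        return [s for s in self.senses if s.frame.marker == marker]


# One entry for "ask", two frames, one sense per frame
ask_about = SyntacticFrame("ask sbdy about sthg", marker="about")
ask_to = SyntacticFrame("ask sbdy to do sthg", marker="to")

ask = LexicalEntry("ask", [
    Sense("request information", ask_about),
    Sense("request action", ask_to),
])

for sense in ask.senses:
    print(f"{sense.frame.pattern}: {sense.gloss}")
```

For "give" as in synsem/example7, the same structure would instead hold two frames that both point to a single sense.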
2) Such spelling variants are modelled in lemon as two different written representations of the same lexical entry.
See ontolex/example3 in the above-mentioned specification. After all, it is the same word, with the same meanings and the same pronunciation, just spelled differently in each dialect of English.
In our understanding, these are not two different forms, as you suggest, but two spellings of the same form.
A form represents a particular grammatical variant, not a spelling variant. Here, both spellings represent the same grammatical form, namely the singular form of the noun.
You do not need to specify one main written representation for each form, as both are valid depending on the context.
Which variant to show, e.g. the American or the British one, should be decided by the application that uses the lexicon.
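To make the distinction concrete, here is a minimal Python sketch (not lemon's RDF vocabulary) of one form of "colour" carrying two written representations, keyed by the language tags "en-GB" and "en-US". `Form`, `pick_spelling` and the tag-keyed dictionary are illustrative assumptions. The point is that the lexicon stores no main spelling; the consuming application passes in its own preference order.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Form:
    # Grammatical features identify the form, e.g. {"noun", "singular"}
    features: FrozenSet[str]
    # Spelling variants of this one form, keyed by language tag (assumed layout)
    written_reps: Dict[str, str] = field(default_factory=dict)


def pick_spelling(form: Form, preferred_tags: List[str]) -> str:
    """Choose a written representation using the application's preferences.

    The form itself has no "main" spelling; the caller's ordered list of
    language tags decides which variant is shown.
    """
    for tag in preferred_tags:
        if tag in form.written_reps:
            return form.written_reps[tag]
    # None of the preferred tags is present: use the first stored variant
    return next(iter(form.written_reps.values()))


colour_singular = Form(
    features=frozenset({"noun", "singular"}),
    written_reps={"en-GB": "colour", "en-US": "color"},
)

print(pick_spelling(colour_singular, ["en-US"]))  # an American application
print(pick_spelling(colour_singular, ["en-GB"]))  # a British application
```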
Does this help?
Philipp
On 11.11.16 at 20:07, Denny Vrandečić wrote:
...
The Wikidata Lexeme model is largely based on Lemon, so I wanted to ask whether you have answers to the following questions in terms of Lemon.
Feel free to reply directly on the list:
https://lists.wikimedia.org/pipermail/wikidata-tech/2016-November/001057.htm...
Cheers,
Denny
---------- Forwarded message ---------
From: Daniel Kinzler <daniel.kinzler@wikimedia.de>
Date: Fri, Nov 11, 2016 at 9:03 AM
Subject: [Wikidata-tech] Two questions about Lexeme Modeling
To: wikidata-tech <wikidata-tech@lists.wikimedia.org>
Hi all!
There are two questions about modelling lexemes that are bothering me. One is an old question, and one I only came across recently.

1) The question that came up for me recently is how we model the grammatical context for senses. For instance, "to ask" can mean requesting information or requesting action, depending on whether we use "ask somebody about" or "ask somebody to". Similarly, "to shit" has an entirely different meaning when used reflexively ("I shit myself").
There is no good place for this in our current model. The information could be placed in a statement on the Sense, but that would be rather non-obvious, and would not (at least not easily) allow for a concise rendering of the kind we see in most dictionaries ("to ask sbdy to do sthg"). The alternative would be to treat each usage with a different grammatical context as a separate Lexeme (a verb phrase Lexeme), so "to shit oneself" would be a separate lemma. That could fragment the content in a way that is quite unexpected to people used to traditional dictionaries.
We could also add this information as a special field in the Sense entity, but I don't even know what exactly that field should contain.
Got a better idea?

2) The older question is how we handle different renderings (spellings, scripts) of the same lexeme. In English we have "color" vs "colour"; in German we have "stop" vs "stopp" and "Maße" vs "Masse". In Serbian, we have a Roman and a Cyrillic rendering for every word. We could treat these as separate Lexemes, but that would mean duplicating all information about them. We could have a single Lemma and represent the others as alternative Forms, or use statements on the Lexeme. But that raises the question of which spelling or script should be the "main" one, used in the lemma.
I would prefer to have multi-variant lemmas. They would work like the multilingual labels we now have on items, but restricted to the variants of a single language. For display, we would apply a language fallback mechanism similar to the one we now apply when showing labels.
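The fallback idea can be sketched in a few lines of Python, assuming a multi-variant lemma is simply a map from variant codes to strings. The variant codes and the `FALLBACK_CHAINS` table are hypothetical examples, not Wikibase's actual configuration, and `display_lemma` is an illustrative helper rather than an existing API.

```python
from typing import Dict, List, Optional

# Hypothetical fallback chains: each requested variant lists the variants to
# try next. Real chains would come from MediaWiki's language configuration.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "sr": ["sr-ec", "sr-el"],
    "sr-ec": ["sr-el"],
    "sr-el": ["sr-ec"],
    "en-ca": ["en-gb", "en-us"],
}


def display_lemma(lemma: Dict[str, str], requested: str) -> Optional[str]:
    """Return the lemma in the requested variant, else follow the fallback chain."""
    for code in [requested] + FALLBACK_CHAINS.get(requested, []):
        if code in lemma:
            return lemma[code]
    return None  # no variant reachable from the requested one


# A multi-variant lemma: one entry per variant of a single language
zemlja = {"sr-ec": "земља", "sr-el": "zemlja"}
colour = {"en-gb": "colour", "en-us": "color"}

print(display_lemma(zemlja, "sr"))     # no plain "sr" entry, falls back to sr-ec
print(display_lemma(zemlja, "sr-el"))
print(display_lemma(colour, "en-ca"))
```

Because every variant lives in the same lemma, nothing is duplicated, and no variant has to be declared the "main" one; the chain only decides what to show when the requested variant is missing.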
2b) If we treat lemmas as multi-variant, should Forms also be multi-variant, or should they be per-variant? Should the gloss of a Sense be multi-variant? I currently tend towards "yes" for all of the above.
What do you think?
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
-- 
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Exzellenzcluster für Cognitive Interaction Technology (CITEC)
Universität Bielefeld

Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de

Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany