David, I am not familiar with Wiktionary and its data model, but from your summary SKOS [1] looks like a good fit, also for your proposal to extend the Wikidata data model. In short, SKOS distinguishes between concepts (they carry the semantics, roughly a Q item) and labels (they are, well, just labels). Concepts and labels are connected via a handful of properties, e.g. skos:prefLabel or skos:altLabel. In ordinary SKOS, labels are simple strings, but in SKOS-XL (also part of the spec) they are objects, and thus can have properties and relations to other labels (or to anything else).
Furthermore, SKOS is extensible: it is based on RDF, so one can define subclasses of skos:Concept and skosxl:Label, and subproperties of skos:prefLabel and skos:altLabel with particular semantics, which might be relevant for Wikidata. With this, some label-like Wikidata properties could be defined as subproperties of, say, skos:altLabel so that they show up in pick lists etc.
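To make the concept/label split concrete, here is a minimal sketch in plain Python rather than RDF. The class names `Concept` and `Label`, the `properties` dict and the sample alias are illustrative assumptions, not part of SKOS or of the Wikidata data model; the only rule taken from SKOS is that a concept has at most one preferred label per language.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """A SKOS-XL style label: an object that can carry its own properties."""
    literal_form: str
    language: str
    properties: dict = field(default_factory=dict)  # hypothetical extra data


@dataclass
class Concept:
    """A SKOS style concept, roughly comparable to a Q item."""
    identifier: str
    pref_labels: dict = field(default_factory=dict)  # language -> Label
    alt_labels: list = field(default_factory=list)   # any number of Labels

    def set_pref_label(self, label: Label) -> None:
        # At most one preferred label per language, so a new one replaces the old.
        self.pref_labels[label.language] = label

    def add_alt_label(self, label: Label) -> None:
        self.alt_labels.append(label)

    def labels_in(self, language: str) -> list:
        """Return the preferred label first, then the alternative labels."""
        found = []
        if language in self.pref_labels:
            found.append(self.pref_labels[language].literal_form)
        found += [alt.literal_form for alt in self.alt_labels
                  if alt.language == language]
        return found


berlin = Concept("Q64")
berlin.set_pref_label(Label("Berlin", "en"))
# The alias and its "kind" property are made up for illustration.
berlin.add_alt_label(Label("Berlin, Germany", "en", {"kind": "disambiguated name"}))
print(berlin.labels_in("en"))
```

Because each `Label` is an object, further properties (register, script, source and so on) could be attached to it without touching the concept itself, which is the point of the SKOS-XL variant.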
just my 2 cents, michael
[1] The spec: http://www.w3.org/TR/skos-reference/ The primer: http://www.w3.org/TR/2009/NOTE-skos-primer-20090818
On 06.06.2014 14:00, wikidata-l-request@lists.wikimedia.org wrote:
Message: 3 Date: Thu, 5 Jun 2014 16:28:30 +0200 From: David Cuenca dacuetu@gmail.com To: "Discussion list for the Wikidata project." wikidata-l@lists.wikimedia.org Subject: [Wikidata-l] What is the point of labels? Message-ID: CAJBSGSoO60AsQbUFkmefqvpE_miwFYxO2vs8jSeq0p0D82JChg@mail.gmail.com Content-Type: text/plain; charset="utf-8"
When I drafted the functional structure that is appearing on items [1], Gerard pointed out that it is drifting into the lexical area. That made me think that, while it is useful to have lexical data as an independent item, as we discussed last year for Wiktionary, the current structure "q item <label> string" doesn't seem to be compatible with that wish, or at least it would be more difficult, since the same label would have to be maintained twice. And it is not just one label per item: there are many, and each one might have different lexical properties.
For more efficiency, it seems that we would need statements like "q item <label> lexical item" to reflect that separation, but that adds further complexity, because according to the latest Wikidata:Wiktionary proposal [2], the "lexical item" (W) also contains senses/meanings (S). This is recurrent, as we already have Q items as the basis for meaning... or at least a concept that is more or less shared among languages. The only difference between "Q items" and the proposed "S items" is that S items represent only one of the lexeme meanings for one particular language, but other than that they have the same nature as Q items (it should be possible to add "subclass of" and other statements to them).
Labels, aliases, and name properties are just normal statements where one of them is preferred, and I have been wondering why we don't treat them as such... That way we could have some coherence, and have both "Q items" and "S items" as the units of meaning/sense, and later on move the labels (lexemes), which are now strings, to the lexical items ("W items" in the example on the page Wikidata:Wiktionary).
Summing up, labels in their current form make complete sense now, but when considered together with lexical information, it seems that it would be convenient to treat all of them as statements that could later link to "W items". And as Joe pointed out, there are many more properties that are equivalent to a label, just more specific, and that currently show up neither in the suggester nor at the top of the page, where they should.
I know that Wiktionary is still in the future and that there are many other priorities on the way; however, since the representation of the items is being reconsidered, I think it is a good moment to think about how to move little by little in the right direction. I would also like to point out that keeping lexical information in Wikidata will inevitably increase its complexity. If new users are already struggling to understand it now, I cannot imagine how they will cope with added elements...
Micru
[1] http://lists.wikimedia.org/pipermail/wikidata-l/2014-June/003941.html [2] https://www.wikidata.org/wiki/Wikidata:Wiktionary
Am 06.06.2014 14:39, schrieb Michael Erdmann:
David, I am not familiar with Wiktionary and its datamodel. But your summary looks like SKOS [1] would be a good fit.
SKOS is a good fit for Wikidata data items. For modeling Wiktionary, LEMON fits a lot better http://lemon-model.net/.
If you look at our RDF mapping (e.g. https://www.wikidata.org/entity/Q64.ttl), we do already use skos:prefLabel and skos:altLabel.
(Note: The RDF mapping does not yet include claims at all, it's currently only for labels, descriptions, aliases, and site links).
-- daniel
Hoi, For me referring to SKOS or whatever feels like an elaborate hoax laying the blame elsewhere. Labels are what items are known by. They exist to identify. They exist in many languages, and as we identify items by labels in many languages there has to be a way to deal with the differences between those labels.
The decision not to have qualifiers to labels is based on what was called "easy identification". Identification is not easy nor obvious. Thanks, GerardM
On 6 June 2014 15:08, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Am 06.06.2014 15:16, schrieb Gerard Meijssen:
Hoi, For me referring to SKOS or whatever feels like an elaborate hoax laying the blame elsewhere. Labels are what items are known by. They exist to identify.
SKOS is just a standard vocabulary to express the relationship our "simple" labels have with the concepts they identify. It allows that relationship to be re-used automatically in a context different from Wikidata itself.
That's it.
-- daniel
Hoi, Does SKOS deal with multiple languages and the need to ensure, and trust, that all these labels mean the same thing? Thanks, GerardM
Am 06.06.2014 15:24, schrieb Gerard Meijssen:
Hoi, Does SKOS deal with multiple languages and the need to ensure, and trust, that all these labels mean the same thing?
Yes. http://www.w3.org/TR/skos-primer/
Well, it *requires* that the labels in all the languages mean the same. How you ensure that is up to you.
Note that *all* string values in RDF are multilingual, always. SKOS does not need to provide a new mechanism for that.
-- daniel
Hoi, That is so obviously plain wrong when you want to apply SKOS to Wikidata. When SKOS has this requirement it is useless in the Wikidata context. Thanks, GerardM
Am 06.06.2014 15:33, schrieb Gerard Meijssen:
Hoi, That is so obviously plain wrong when you want to apply SKOS to Wikidata. When SKOS has this requirement it is useless in the Wikidata context.
Oh? And why would that be?
We have stuff like
de:Fuh -->(means)--> Q12345
en:Foo -->(means)--> Q12345
"Using SKOS" just says that -->(means)--> can be written as "skos:prefLabel" (resp "skos:altLabel").
The requirement is that the labels in different languages refer to the same concept. It does not mean they have the same connotations, or no other possibly diverging meanings.
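As a rough illustration of what writing "means" as skos:prefLabel or skos:altLabel could look like, the sketch below renders such statements as Turtle triples with language-tagged literals. The function name `means_to_turtle` and the `wd:` prefix binding are assumptions made for this example; it is not the code behind the actual RDF mapping.

```python
def means_to_turtle(statements):
    """Render (language, label, item, preferred) tuples as Turtle triples."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        # Assumed prefix for item URIs; the real mapping may bind it differently.
        "@prefix wd: <http://www.wikidata.org/entity/> .",
        "",
    ]
    for language, label, item, preferred in statements:
        predicate = "skos:prefLabel" if preferred else "skos:altLabel"
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'wd:{item} {predicate} "{escaped}"@{language} .')
    return "\n".join(lines)


print(means_to_turtle([
    ("de", "Fuh", "Q12345", True),
    ("en", "Foo", "Q12345", True),
]))
```

Both triples share the subject `wd:Q12345`, which is all the requirement amounts to here: the German and the English label point at the same concept, and nothing is claimed about their connotations.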
-- daniel
Hoi, That is exactly the point. Once you assume that they are the same, you ignore the extent to which they are not. Many, many items have articles linked to them whose labels do not describe exactly the same subject. Thanks, GerardM
Am 06.06.2014 15:44, schrieb Gerard Meijssen:
Hoi, That is exactly the point. Once you assume that they are the same, you ignore the extent to which they are not. Many, many items have articles linked to them whose labels do not describe exactly the same subject.
And these are mistakes that should be fixed. So?
Hoi, In a different conversation it was put like this: "Wikipedia is what it is and Wikidata is what it is". This was in the context of assumptions. Thanks, GerardM
There's another road to ontology of labels which is connected with the kind of roles that labels play in systems.
One need is that a system wants to mention something or draw something and otherwise refer to something and it needs to know what to call it. Another need is that you have a phrase and you want to find things with a matching label. Then there's the more general problem that the user has something in his head and you want to specify it.
In terms of acceptance of labels, you want the system to accept a wide range of possible names people would use for something (I think that is in Wikidata's scope), but to make the most of that you need a good estimator of the probability that a particular surface form used in a particular context refers to this or that, and that is probably out of scope.
You want to accept labels you wouldn't want to generate. A tendency to generate ethnic, racial and other kinds of slurs is a showstopper for any public commercial application. AIs are like people: some of them are more prone to potty mouth than others, and you can't count on good behavior unless you train your animals. Thus, offensive labels should be tagged.
Similar choices appear in different contexts. I live in New York, and if you look at legal documents they always say "New York State" or "New York City", but if you drive onto the Thruway from Pennsylvania you will see "Welcome to New York" and then a distance sign that says New York is 490 miles away. Sometimes you want the Latin name of an organism and sometimes you want the common name. You might want to speak of pharmaceuticals always using the generic name (Omeprazole) rather than a brand (Prilosec). Sometimes you want to use abbreviations (RDF) and other times you want to spell things out (Resource Description Framework). If you want to make something visually tight, you need to control label length.
http://carpictures.cc/cars/photo/
A superhuman system would certainly contain statistical models, but a lot of the knowledge needed to do the above could be encoded as properties of the labels.
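A small sketch of that last idea, with the knowledge encoded as properties of the labels: every label is accepted for matching, but only labels that pass the tags and the length limit are ever generated. The `TaggedLabel` class, the `kind` values and the helper names are hypothetical and only serve this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedLabel:
    text: str
    kind: str = "common"     # e.g. "common", "generic", "brand", "abbreviation"
    offensive: bool = False  # tagged so that it is matched but never printed


def accepts(labels, phrase):
    """Matching: accept any known label, including ones we would never print."""
    wanted = phrase.casefold()
    return any(label.text.casefold() == wanted for label in labels)


def generate(labels, preferred_kinds, max_length=None):
    """Generation: first non-offensive label of a preferred kind that fits."""
    for kind in preferred_kinds:
        for label in labels:
            if label.offensive or label.kind != kind:
                continue
            if max_length is not None and len(label.text) > max_length:
                continue
            return label.text
    return None  # nothing suitable; the caller has to fall back to something else


rdf = [
    TaggedLabel("Resource Description Framework", kind="common"),
    TaggedLabel("RDF", kind="abbreviation"),
]
drug = [
    TaggedLabel("Omeprazole", kind="generic"),
    TaggedLabel("Prilosec", kind="brand"),
]

print(generate(rdf, ["common", "abbreviation"]))                 # spelled out
print(generate(rdf, ["common", "abbreviation"], max_length=10))  # visually tight
print(generate(drug, ["generic"]))                               # never the brand
print(accepts(drug, "prilosec"))                                 # still recognised
```

The statistical part (which surface form means what in which context) is left out on purpose; the sketch only covers what plain label properties can express.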
Hoi, Exactly. Thanks, GerardM
On 6 June 2014 20:57, Paul Houle ontology2@gmail.com wrote:
-- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254 paul.houle on Skype ontology2@gmail.com
On Fri, Jun 6, 2014 at 3:08 PM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
SKOS is a good fit for Wikidata data items. For modeling Wiktionary, LEMON fits a lot better http://lemon-model.net/.
Could you please elaborate on how to share the label between q items and the future lexical items? It is not very clear to me what you have in mind.
OTOH, is there any possibility to have some property values indexed as aliases? (like the ones Joe mentioned [1]: "birth name", "pseudonym")
Thanks Micru
[1] http://lists.wikimedia.org/pipermail/wikidata-l/2014-June/003944.html
Am 06.06.2014 15:42, schrieb David Cuenca:
On Fri, Jun 6, 2014 at 3:08 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
SKOS is a good fit for Wikidata data items. For modeling Wiktionary, LEMON fits a lot better <http://lemon-model.net/>.
Could you please elaborate on how to share the label between q items and the future lexical items? It is not very clear to me what you have in mind.
That's unrelated to SKOS vs LEMON - I'm just saying that SKOS is good for the thesaurus-like info we have in the form of item labels and aliases, while we will need a more complex model like LEMON for actually modeling lexical entities in detail.
As for cross-linking: I have some ideas, but there is nothing definite yet. Basically, if sense S2 of lexical entity W5 "refers to" item Q7, the primary form of W5 (the lemma) could be "somehow" treated as an alias for Q7. How exactly, I'm not sure yet. One way would be to just use a bot (see below).
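Since nothing is definite yet, the following is only one possible reading of that idea, sketched over plain dictionaries. The field names (`lemma`, `senses`, `refers_to`) and the sample lexemes are invented for illustration and do not correspond to any existing data model.

```python
def aliases_from_lexemes(lexemes, item_id):
    """Collect (language, lemma) pairs of lexemes with a sense referring to item_id."""
    aliases = []
    for lexeme in lexemes:
        # One matching sense is enough to treat the lemma as an alias candidate.
        if any(sense.get("refers_to") == item_id for sense in lexeme["senses"]):
            aliases.append((lexeme["language"], lexeme["lemma"]))
    return aliases


# Hypothetical data: sense S2 of lexical entity W5 refers to item Q7.
lexemes = [
    {"id": "W5", "language": "en", "lemma": "example",
     "senses": [{"id": "S1", "refers_to": None},
                {"id": "S2", "refers_to": "Q7"}]},
    {"id": "W6", "language": "en", "lemma": "other",
     "senses": [{"id": "S1", "refers_to": "Q8"}]},
]

print(aliases_from_lexemes(lexemes, "Q7"))
```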
OTOH, is there any possibility to have some property values indexed as aliases? (like the ones Joe mentioned [1]: "birth name", "pseudonym")
That would be nice, but we decided against it for now, in the name of simplicity. Generally, the wiki way is "no magic, do it by hand" - where "by hand" often means "by bot".
The software shouldn't know about "special" properties. A bot could know about them, and automatically add aliases.
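A bot along those lines might work roughly as sketched below. This is an assumption-laden outline: the item is a plain dictionary rather than anything fetched through a real API, the set `LABEL_LIKE_PROPERTIES` is a hypothetical list the bot operator would maintain, and languages are ignored for brevity.

```python
# Properties the bot (not the software) treats as label-like; an assumed list.
LABEL_LIKE_PROPERTIES = {"birth name", "pseudonym"}


def missing_aliases(item):
    """Return label-like statement values not yet present as label or alias."""
    known = {item["label"].casefold()} | {a.casefold() for a in item["aliases"]}
    missing = []
    for prop, values in item["statements"].items():
        if prop not in LABEL_LIKE_PROPERTIES:
            continue  # ordinary property, never turned into an alias
        for value in values:
            if value.casefold() not in known:
                missing.append(value)
                known.add(value.casefold())  # avoid proposing duplicates
    return missing


# Illustrative item data, written by hand for this example.
item = {
    "label": "Mark Twain",
    "aliases": [],
    "statements": {
        "birth name": ["Samuel Langhorne Clemens"],
        "pseudonym": ["Mark Twain"],
        "occupation": ["writer"],
    },
}

print(missing_aliases(item))
```

The bot would then add the returned values as aliases, so the knowledge about "special" properties stays outside the core software.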
-- daniel
On Fri, Jun 6, 2014 at 5:05 PM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
The software shouldn't know about "special" properties. A bot could know about them, and automatically add aliases.
Labels and aliases are acting as "special" properties; we just cannot decide which property to use instead of the default. Now we have to enter the same value twice, for instance as the property "title" and again as the label, and the same goes for names, for official names of towns, for everything... and even more so once the monolingual datatype is available. Yes, a bot could do that, but perhaps we should ask ourselves whether having all information twice makes sense?
Micru
Remember that we are also recording the Wikipedia article names in multiple languages, which may or may not be the same as the label in those languages.
On Fri, Jun 6, 2014 at 3:42 PM, David Cuenca dacuetu@gmail.com wrote:
Could you please elaborate on how to share the label between q items and the future lexical items? It is not very clear to me what you have in mind.
It's really too early to make all these decisions. We're still at least a year away from any significant progress towards Wiktionary unless anything changes. Until then we will have learned a lot more and a lot of things will have changed. Let's focus on the next big things and not lose focus: queries and Commons support :)
Cheers Lydia
On Fri, Jun 6, 2014 at 7:08 PM, Lydia Pintscher <lydia.pintscher@wikimedia.de> wrote:
It's really too early to make all these decisions. We're still at least a year away from any significant progress towards Wiktionary unless anything changes. Until then we will have learned a lot more and a lot of things will have changed. Let's focus on the next big things and not lose focus: queries and Commons support :)
Ah, don't worry, Daniel's answer regarding the future W item connection was enough concerning Wiktionary. I also want to see queries and Commons support soon :-)
My main motivation to start this discussion was linked with the planned UI revamp. That is what the user sees and it conveys meaning, and not only that, users learn it and get attached to it. If there is any change to be made to labels/aliases it would be better to do it before, and to plan the mockups with those changes in mind.
I remember that you mentioned that external identifier statements should appear in another area of the screen? I don't know which mechanism would be used to accomplish that, but maybe it would help if the same method is applied to properties that represent a label? Just giving ideas, I have no preference, actually everything would be better than having to maintain (repeated) aliases by bot :)
Micru