Hey,
I have some questions/remarks on the "Complete Datamodel in WON" section Markus wrote up yesterday: https://meta.wikimedia.org/wiki/Wikidata/Data_model#Complete_Datamodel_in_WO...
There are several things we have not modelled yet, which I'm not going to comment on for now. For those we have already implemented or thought about implementing, there are a few things that do not match what's written in this section.
SiteLanguageCode
All occurrences of this should be replaced by GlobalSiteIdentifier, as it's NOT a language code.
ItemDescription := 'ItemDescription(' Item {TitleRecord} [MultilingualTextValue] [MultilingualTextValue] {Statement} ')'
This is missing the aliases stuff, which would be { UserLanguageCode String }.
GeoCoordinatesValue := 'GeoCoordinatesValue(' decimal decimal decimal ')'
Altitude is probably something we will not have in many cases, so I think this ought to be optional. Another optional argument would be the globe to which the coordinates belong. Different globes have different ways of measuring coordinates, so a specific set of coordinates that is valid on one might mean something else on another and simply be invalid on a third.
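As a rough illustration of the shape proposed above, here is a minimal Python sketch with altitude and globe as optional fields. The field names, the reading of the first two decimals as latitude and longitude, and the plain-string globe identifier are assumptions made for the example; the thread has not settled on any of them.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class GeoCoordinatesValue:
    """Sketch of the value type with the two proposed optional arguments."""
    latitude: Decimal                    # assumed meaning of the first decimal
    longitude: Decimal                   # assumed meaning of the second decimal
    altitude: Optional[Decimal] = None   # optional: often not known
    globe: Optional[str] = None          # optional: hypothetical identifier format


# Only the two mandatory components are given.
on_default_globe = GeoCoordinatesValue(Decimal("52.52"), Decimal("13.405"))

# Same numbers on another globe (illustrative values, hypothetical identifier).
on_other_globe = GeoCoordinatesValue(Decimal("52.52"), Decimal("13.405"), globe="mars")

# The globe is part of the value, so the two are different values.
print(on_default_globe == on_other_globe)
```

Making the globe part of the value means that identical numbers on different globes never compare equal, which matches the point that coordinates are only meaningful relative to a globe.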
PropertyDescription := 'PropertyDescription(' Property {TitleRecord} [MultilingualTextValue] [MultilingualTextValue] ')'
Right now the Property interface is very similar to the Item one, except that Item has what we're calling sitelinks (TitleRecords in the WON discussed here) and Property does not. That's the first difference. Copy-paste error? Or am I misinterpreting the notation?
As with Item, it's missing the aliases.
Although we have not implemented this yet, the Entity interface implies that it contains a list of statements. I added this after some discussion with Denny. As a result, both Items and Properties have a list of statements. That's the third difference with the WON stuff. Here the question is if non-Item Entities should have statements or not. There is no consensus on this within the team yet, so I will start a new thread about it so we can discuss it in more detail.
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
On Sun, Sep 16, 2012 at 4:33 PM, Jeroen De Dauw jeroendedauw@gmail.com wrote:
Hey,
I have some questions/remarks on the "Complete Datamodel in WON" section Markus wrote up yesterday: https://meta.wikimedia.org/wiki/Wikidata/Data_model#Complete_Datamodel_in_WO...
[snip]
GeoCoordinatesValue := 'GeoCoordinatesValue(' decimal decimal decimal ')'
Altitude is probably something we will not have in many cases, so I think this ought to be optional. Another optional argument would be the globe to which the coordinates belong.
Agreed, and there should be a default coordinate system such as EPSG:4326 (http://spatialreference.org/ref/epsg/4326/).
Different globes have different ways of measuring coordinates, so a specific set of coordinates that is valid on one might mean something else on another and simply be invalid on a third.
Agree with that.
And I think it should be GeoCoordinateValue, not GeoCoordinatesValue.
Cheers, Katie
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hi,
On 16/09/12 15:33, Jeroen De Dauw wrote:
Hey,
I have some questions/remarks on the "Complete Datamodel in WON" section Markus wrote up yesterday: https://meta.wikimedia.org/wiki/Wikidata/Data_model#Complete_Datamodel_in_WO...
There are several things we have not modelled yet, which I'm currently not going to comment on. For those we did already implement or thought about implementing, there are a few things that do not match what's written in this section.
SiteLanguageCode
All occurrences of this should be replaced by GlobalSiteIdentifier, as it's NOT a language code.
Yes, I wondered about this too. I will make that change soon.
ItemDescription := 'ItemDescription(' Item {TitleRecord} [MultilingualTextValue] [MultilingualTextValue] {Statement} ')'
This is missing the aliases stuff, which would be { UserLanguageCode String }.
As explained in the text, the aliases are not distinguished from other property values in the data model right now. This was the status of the discussion when we last talked about this, but we can also re-introduce aliases as a special field (I see why this would be useful). Daniel had an argument against this, saying that many other property values could also work as aliases in certain domains (e.g. binomial names of biological species). So the special status of the alias in the data model was questioned. But the current aliases are still special in various ways (no references, no statement ranks, special UI handling, maybe special constraints [if two items have the same description, can one of them use an alias that is the title of the other?]). Discuss. ;-)
GeoCoordinatesValue := 'GeoCoordinatesValue(' decimal decimal decimal ')'
Altitude is probably something we will not have in many cases, so I think this ought to be optional. Another optional argument would be the globe to which the coordinates belong. Different globes have different ways of measuring coordinates, so a specific set of coordinates that is valid on one might mean something else on another and simply be invalid on a third.
Yes, this DataValue is certainly in early draft stage. I will change the name as suggested by Katie. For the content, should we have optional altitude and optional planet identifier (what is the format of this? is there an IRI for Mars?)?
PropertyDescription := 'PropertyDescription(' Property {TitleRecord} [MultilingualTextValue] [MultilingualTextValue] ')'
Right now the Property interface is very similar to the Item one, except that Item has what we're calling sitelinks (in the here discussed WON it's called TitleRecords), and property does not. That's the first difference. Copy paste error? Or am I misinterpreting the notation?
This is also based on a preliminary decision made a while back: the idea was that properties, while not having Wikipedia articles, will still need unique string identifiers that can be used in wikitext (e.g. queries) where one does not want to address properties by ID or by "label+description" pairs.
Denny mentioned that there was some discussion to use the label for this (requiring it to be globally unique for a property), so that the TitleRecord is not needed. Will this work well? After all, the label is in languages, while the TitleRecord refers to sites (which is the place where you want to refer to properties by title). Using label instead will require a more complicated site-to-language(s) mapping somehow (example: if I write a query about an Item on en.wikipedia.org, it is clear that I refer to the title of that Item in en.wikipedia.org; if I write about a Property and properties only have labels, then it is not clear if I refer to the label in en-uk, en-us, en-ca, etc.).
As with Item, it's missing the aliases.
Same answer as above. I see now that detailed explanations do not help, since nobody reads them :-p
Although we have not implemented this yet, the Entity interface implies that it contains a list of statements. I added this after some discussion with Denny. As a result, both Items and Properties have a list of statements. That's the third difference with the WON stuff. Here the question is if non-Item Entities should have statements or not. There is no consensus on this within the team yet, so I will start a new thread about it so we can discuss it in more detail.
Yes, this is easy to change. When the data model was initially discussed, there was no use case for having statements (= ranked claims with references) in properties. It seems that a property could at best have a list of PropertyValueSnaks (no auxiliary Snaks, no references, no statement rank).
Markus
Hey,
As explained in the text, the aliases are not distinguished from other property values in the data model right now. This was the status of the discussion when we last talked about this, but we can also re-introduce aliases as a special field (I see why this would be useful). Daniel had an argument against this, saying that many other property values could also work as aliases in certain domains (e.g. binomial names of biological species). So the special status of the alias in the data model was questioned.
Right, that makes sense to implement at some point if there really is demand for this. This is rather harder to implement than what we're currently doing and is blocked by phase 2 stuff and probably phase 3 stuff, while we want to have it in phase 1 already.
A while back we also had a related discussion where Daniel took the position that we should also not have special labels and descriptions. The conclusion of that was that we will have them but that we will make them accessible via the same interface as regular properties (at least for read ops).
if two items have the same description, can one of them use an alias that is the title of the other?
Good question. Right now this is not enforced. Then again, right now aliases are not used anywhere for lookups except in the fulltext search thing, where this restriction is not really relevant. Denny, Daniel, any thoughts on this?
This is also based on a preliminary decision made a while back: the idea was that properties, while not having Wikipedia articles, will still need unique string identifiers that can be used in wikitext (e.g. queries) where one does not want to address properties by ID or by "label+description" pairs.
This seems odd to me - you sure the term TitleRecord is being used consistently through the data model and this thread? I'm using it as "GlobalSiteId PageName".
I do agree you would probably not want to put label and description in wikitext, and that just the label might or might not be sufficient, even if they are unique per language. If you need an id that really is always unique you can just use the p12345 thing. Since most of the editing of these will happen via GUIs (right?) this seems to be quite acceptable. Or does anybody see a better approach? In any case, why would you resort to "GlobalSiteId PageName" rather than "label description"? What makes it so odd is that the "GlobalSiteId PageName" is meant to indicate equivalence of items across sites, which is rather different than using it to identify properties in wikitext.
It seems that a property could at best have a list of PropertyValueSnaks (no auxiliary Snaks, no references, no statement rank).
Why not have a list of claims?
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
On 17/09/12 12:54, Jeroen De Dauw wrote:
Hey,
As explained in the text, the aliases are not distinguished from other property values in the data model right now. This was the status of the discussion when we last talked about this, but we can also re-introduce aliases as a special field (I see why this would be useful). Daniel had an argument against this, saying that many other property values could also work as aliases in certain domains (e.g. binomial names of biological species). So the special status of the alias in the data model was questioned.
Right, that makes sense to implement at some point if there really is demand for this. This is rather harder to implement than what we're currently doing and is blocked by phase 2 stuff and probably phase 3 stuff, while we want to have it in phase 1 already.
A while back we also had a related discussion where Daniel took the position that we should also not have special labels and descriptions. The conclusion of that was that we will have them but that we will make them accessible via the same interface as regular properties (at least for read ops).
Ok, I agree with that. I will change the model to have explicit aliases somewhere.
if two items have the same description, can one of them use an alias that is the title of the other?
Good question. Right now this is not enforced. Then again, right now aliases are not used anywhere for lookups except in the fulltext search thing, where this restriction is not really relevant. Denny, Daniel, any thoughts on this?
This is also based on a preliminary decision made a while back: the idea was that properties, while not having Wikipedia articles, will still need unique string identifiers that can be used in wikitext (e.g. queries) where one does not want to address properties by ID or by "label+description" pairs.
This seems odd to me - you sure the term TitleRecord is being used consistently through the data model and this thread? I'm using it as "GlobalSiteId PageName".
Yes, this is what I mean. But PageName is just a string, and does not need to refer to an actual page (or be displayed as a link). It can still be used as a "string key" to refer to the property on a certain site.
I do agree you would probably not want to put label and description in wikitext, and that just the label might or might not be sufficient, even if they are unique per language. If you need an id that really is always unique you can just use the p12345 thing. Since most of the editing of these will happen via GUIs (right?) this seems to be quite acceptable. Or does anybody see a better approach?
Well, the above. It allows you to assign a human-readable key to each property that you can use instead of p12345 and that is still unique for each site. Moreover, this can be done with code that is similar to what we already have for site links in Items (but without linking and thus also without auto completion).
In any case, why would you resort to "GlobalSiteId PageName" rather than "label description"?
Because it is easier. First of all, "label description" is not enough: you need to say which language you are talking about to make it a key (this can be guessed from the site, but this is still not a unique selection). Second, you do not need to mention the GlobalSiteId if you are on a site and want to use its own ID. So one addressing method requires one string key (PageName), the other requires three string keys (language, label, description). The former seems easier.
What makes it so odd is that the "GlobalSiteId PageName" is meant to indicate equivalence of items across sites, which is rather different than using it to identify properties in wikitext.
What you are saying ("equivalence across sites") is only another way of saying that "GlobalSiteId PageName" is a key for entities on Wikidata. Such keys can always be used to define equivalence classes (of keys that refer to the same thing); how is that a problem?
It seems that a property could at best have a list of PropertyValueSnaks (no auxiliary Snaks, no references, no statement rank).
Why not have a list of claims?
Do you think you need auxiliary Snaks there? The step from "list of snaks" to "list of snaks with auxiliary snaks (i.e., claims)" is not hard to make (even later), but I would not make it without a cause. In general, what is the motivation of allowing arbitrary Snaks for properties? Really general annotations (users define properties, e.g., to organise other properties) or merely technical information (some properties need extra information about things like units of measurement)?
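To show how small the step from "list of snaks" to "list of claims" is, here is a minimal Python sketch of the three layers as they are described in this thread: a snak, a claim (a snak plus auxiliary snaks), and a statement (a ranked claim with references). All class and field names, the string-typed values and the "normal" rank label are assumptions for illustration, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PropertyValueSnak:
    property_id: str   # e.g. "p12345"
    value: str         # simplified: real values would be typed DataValues


@dataclass
class Claim:
    main_snak: PropertyValueSnak
    auxiliary_snaks: List[PropertyValueSnak] = field(default_factory=list)


@dataclass
class Statement:
    claim: Claim
    references: List[str] = field(default_factory=list)  # simplified references
    rank: str = "normal"                                 # hypothetical rank label


def snaks_to_claims(snaks: List[PropertyValueSnak]) -> List[Claim]:
    """Upgrade a plain list of snaks to claims without auxiliary snaks."""
    return [Claim(main_snak=snak) for snak in snaks]


# A property that so far only carries plain snaks (illustrative ids and values).
property_snaks = [PropertyValueSnak("p1", "metre"), PropertyValueSnak("p2", "length")]
property_claims = snaks_to_claims(property_snaks)
print(property_claims[0].main_snak.value)
```

Under these assumptions, starting with plain snaks for properties does not close the door on claims later, since every existing snak maps to a claim with an empty list of auxiliary snaks.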
Markus
Re: keys for properties
For now the following solution seems to be the simplest:
* make labels for properties be unique for a given language
In that case they can be used as keys. Every wiki has one (and exactly one) site language. If the label is unique, a property can be addressed by languagecode + label, and the languagecode could be used by default inside a wiki. No extra keys per site would be needed, which would otherwise have been provided by the sitelink data (which would be more appropriately named sitekeys instead of sitelinks in that case). The description is not used by the machine to identify properties.
This means that two different properties cannot have the same name in one language. We will need to figure out if this is a problem during usage, and if it is, change it later.
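The label-as-key idea can be sketched roughly as follows. This is only an illustration under the assumptions stated in this mail: PropertyLabelIndex, resolve_on_wiki, and the property ids and labels are hypothetical names, not existing Wikidata code.

```python
from typing import Dict, Optional, Tuple


class PropertyLabelIndex:
    """Maps (language code, label) to a property id, unique per language."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], str] = {}

    def set_label(self, property_id: str, language: str, label: str) -> None:
        key = (language, label)
        owner = self._by_key.get(key)
        if owner is not None and owner != property_id:
            # Two different properties may not share a label in one language.
            raise ValueError(f"label {label!r} already used by {owner} in {language}")
        # A property has one label per language: drop its previous one, if any.
        for old_key, pid in list(self._by_key.items()):
            if pid == property_id and old_key[0] == language:
                del self._by_key[old_key]
        self._by_key[key] = property_id

    def resolve(self, language: str, label: str) -> Optional[str]:
        return self._by_key.get((language, label))


def resolve_on_wiki(index: PropertyLabelIndex, label: str, site_language: str,
                    language: Optional[str] = None) -> Optional[str]:
    """Inside a wiki, the site language is the default language code."""
    return index.resolve(language or site_language, label)


index = PropertyLabelIndex()
index.set_label("p1", "en", "population")
index.set_label("p1", "de", "Einwohnerzahl")
print(resolve_on_wiki(index, "population", site_language="en"))
```

Note that the uniqueness check is per language only, so the same string may still label different properties in different languages.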
I hope this makes sense, Denny
On 17/09/12 14:37, Denny Vrandečić wrote:
Re: keys for properties
For now the following solution seems to be the simplest:
- make labels for properties be unique for a given language
In that case they can be used as keys. Every wiki has one (and exactly one) site language. If the label is unique, a property can be addressed by languagecode + label, and the languagecode could be used by default inside a wiki. No extra keys per site would be needed, which would otherwise have been provided by the sitelink data (which would be more appropriately named sitekeys instead of sitelinks in that case). The description is not used by the machine to identify properties.
This means that two different properties cannot have the same name in one language. We will need to figure out if this is a problem during usage, and if it is, change it later.
I hope this makes sense,
Yes, this should be feasible. It means that properties and items use different addressing schemes, but it simplifies the data stored for properties overall. So in terms of cognitive complexity it probably evens out. I will update the data model to reflect this in due course.
Markus
This means that two different properties cannot have the same name in one language. We will need to figure out if this is a problem during usage, and if it is, change it later.
Which means that property labels will need disambiguation additions to differentiate. I think this is correct and intuitive in a Wikimedia context.
User interface memo: There will be a need for both specific and generic properties. When accessing a property by a broad (i.e. not disambiguated) name, the user interface should automatically show its subproperties (property/subproperty hierarchy), essentially mimicking a disambiguation page.
I think this would nicely and intuitively fall into place with Wikipedia practices.
Gregor
2012/9/17 Jeroen De Dauw jeroendedauw@gmail.com:
if two items have the same description, can one of them use an alias that is the title of the other?
Good question. Right now this is not enforced. Then again, right now aliases are not used anywhere for lookups except in the fulltext search thing, where this restriction is not really relevant. Denny, Daniel, any thoughts on this?
Technically, it can happen that of two items with the same description but different labels, one has the other's label as an alias.
If the item selection widget did not display the label, this could lead to confusion.
I always assumed that if an alias is used for lookup, the canonical label would still be displayed (probably in addition to the alias in that case). Thus the confusion cannot happen.
So no constraints on the aliases will be introduced for now. They are not needed.
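The reasoning above can be illustrated with a small Python sketch of a lookup that always shows the canonical label, even when the match came from an alias. The Item class, the lookup function, the display format, and the single-language simplification are all assumptions for the example, not the actual selection widget.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    item_id: str
    label: str
    description: str
    aliases: List[str] = field(default_factory=list)


def lookup(items: List[Item], term: str) -> List[str]:
    """Return display strings for all items matching by label or alias."""
    results = []
    for item in items:
        if term == item.label:
            results.append(f"{item.label} ({item.description})")
        elif term in item.aliases:
            # The canonical label is shown first, the matched alias in addition.
            results.append(f"{item.label} ({item.description}), matched alias: {term}")
    return results


# Same description, different labels, and one item's alias is the other's label.
items = [
    Item("q1", "Jupiter", "thing"),
    Item("q2", "Jove", "thing", aliases=["Jupiter"]),
]
for line in lookup(items, "Jupiter"):
    print(line)
```

Because each result starts with the item's own label, the two entries stay distinguishable even though they share a description and both match the search term.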
Cheers, Denny
On Mon, Sep 17, 2012 at 12:29 PM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Hi,
On 16/09/12 15:33, Jeroen De Dauw wrote:
GeoCoordinatesValue := 'GeoCoordinatesValue(' decimal decimal decimal ')'
Altitude is probably something we will not have in many cases, so I think this ought to be optional. Another optional argument would be the globe to which the coordinates belong. Different globes have different ways of measuring coordinates, so a specific set of coordinates that is valid on one might mean something else on another and simply be invalid on a third.
Yes, this DataValue is certainly in early draft stage. I will change the name as suggested by Katie. For the content, should we have optional altitude and optional planet identifier (what is the format of this? is there an IRI for Mars?)?
Wikipedia uses a globe parameter and supports only a few other celestial bodies:
http://en.wikipedia.org/wiki/Template:Coord#globe:G
For Earth, WGS84 (EPSG:4326) is used by default.
With a globe parameter, one of these is used:
http://planetarynames.wr.usgs.gov/TargetCoordinates
There also are celestial coordinates, which we may want to support:
http://en.wikipedia.org/wiki/Template:Sky
Cheers, Katie