Markus,

OK, now I understand that "same as" wouldn't be a good name because of the confusion it would cause. However, the property "subject of" as it is now wouldn't be a good candidate either: its meaning is that a certain statement is represented by another item (which is why it may only be used as a qualifier).

Perhaps a better name would be "corresponds with item" and the inverse "corresponds with property". Just by having these connections, a lot of information can be inferred from the connected item.

Consider the following example with the property "occupation (P106)" and the item "occupation (Q13516667)":

- I cannot find any clear "subproperty of" for P106, but there is a clear "subclass of: human behaviour" for the item

- "human behaviour" is "part of" human

- "human" can have a statement "intrinsic property" (property proposal still under discussion) with values "birthday (Q47223)" and an "(eventual) date of death". It can be expanded in the future to include newly created properties like "height", "weight", "eye color", etc

- birthday (Q47223) <corresponds with property> date of birth (P569)

Out of this I reach the following conclusions:

- the taxonomy of properties is going to be weak, since there is not always a clear subpropertyOf unless created artificially (more work)

- the standard taxonomy of items (subclass of/part of) is sufficient to automatically reach meaningful constraints and inference (less work)

- by manually adding the constraints to the property itself we duplicate information, which will require volunteer effort to maintain (more work)

My recommendation is to rely mainly on the main taxonomy instead of creating a parallel property taxonomy, and then think of ways to extract information from the main taxonomy to convert it automatically into constraints.
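
As a rough illustration of the idea, here is a minimal sketch that derives a candidate "type" constraint for P106 from the item taxonomy in the example above. Everything in it is an assumption for illustration: the dictionaries are a toy in-memory stand-in for Wikidata, "corresponds with item" is only the name proposed in this message, labels are used where no item id appears in the example, and "take the topmost ancestor" is a deliberately naive rule rather than a worked-out inference method.

```python
# Toy stand-in for the item taxonomy from the example (not real Wikidata access).
SUBCLASS_OF = {"occupation (Q13516667)": "human behaviour"}
PART_OF = {"human behaviour": "human"}

# Hypothetical link using the proposed "corresponds with item" property.
CORRESPONDS_WITH_ITEM = {"occupation (P106)": "occupation (Q13516667)"}


def ancestors(item):
    """Follow 'subclass of' / 'part of' links upward, nearest ancestor first."""
    chain = []
    current = item
    while True:
        parent = SUBCLASS_OF.get(current) or PART_OF.get(current)
        # Stop at the top of the taxonomy, or if a cycle is detected.
        if parent is None or parent in chain:
            return chain
        chain.append(parent)
        current = parent


def inferred_domain(prop):
    """Naive rule: the topmost ancestor of the corresponding item is the domain."""
    item = CORRESPONDS_WITH_ITEM.get(prop)
    if item is None:
        return None
    chain = ancestors(item)
    return chain[-1] if chain else None


print(inferred_domain("occupation (P106)"))  # prints: human
```

With only the one connection between the property and its item, the constraint "P106 applies to humans" falls out of the existing taxonomy and does not have to be maintained by hand on the property page.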

All maintenance takes effort, so the more it can be automated, the more efficient volunteers will be. And if we can simplify the maintenance of properties, we will be able to simplify the creation of properties too, especially when we face the next surge, which will come with the datatype "number with units".

Cheers,

Micru

On Wed, May 28, 2014 at 2:48 PM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:

David,

Regarding the question of how to classify properties and how to relate them to items:

* "same as" (in the sense of owl:sameAs) is not the right concept here. In fact, it has often been discouraged to use this on the Web, since it has very strong implications: it means that in all uses of the one identifier, one could just as well use the other identifier, and that it is indistinguishable if something has been said about the one or the other. That seems too strong here, at least for most cases.

* In the world of OWL DL, sameAs specifically refers to individuals, not to classes or properties. Saying "P sameAs Q" does not imply that P and Q have the same extension as properties. For the latter, OWL has the relationship owl:equivalentProperty. This distinction between instance level and schema level is similar to the distinction we have between "instance of" and "subclass of".

* Therefore, I would suggest using a property called "subproperty of" as one way of relating properties (analogously to "subclass of"). It would have to be checked whether this actually occurs in Wikidata (do we have any properties that would be in this relation, or do we make it a modelling principle to have only the most specific properties in Wikidata?).

* The relationship from properties to items could be modelled with the existing property "subject of" (P805).

* It might be useful to also have a taxonomic classification of properties. For example, we already group properties into properties for "people", "organisations", etc. Such information could also be added with a specific property (this would be a bit more like a "category" system on property pages). On the other hand, some of this might coincide with constraint information that could be expressed as claims. For instance, person properties might be those with "Type" (i.e., "rdfs:domain") constraint human. By the way, our constraint system could use some systematisation -- there are many overlaps in what you can do with one constraint or another.
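
To make the sameAs point above a bit more concrete, here is a toy sketch (not an OWL reasoner, and all names and data in it are made up). It assumes each name has two independent views, one as an individual and one as a property with an extension, loosely in the spirit of how OWL 2 DL treats a name used in both roles. Under that assumption, "same individual" and "same extension" are separate questions.

```python
# Individual view: which individual does each name denote? (made-up data)
individual_of = {"P": "thing-1", "Q": "thing-1", "R": "thing-2"}

# Property view: the extension, i.e. the set of (subject, value) pairs.
extension_of = {
    "P": {("a", "b")},
    "Q": {("c", "d")},
    "R": {("a", "b")},
}


def same_as(x, y):
    """Instance level: both names denote the same individual."""
    return individual_of[x] == individual_of[y]


def equivalent_property(x, y):
    """Schema level: both names have the same extension as properties."""
    return extension_of[x] == extension_of[y]


# P and Q are the same individual but differ as properties.
print(same_as("P", "Q"), equivalent_property("P", "Q"))  # prints: True False
# P and R are different individuals with the same extension.
print(same_as("P", "R"), equivalent_property("P", "R"))  # prints: False True
```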

Cheers,

Markus

On 28/05/14 12:14, David Cuenca wrote:

Markus,
The explanation about the implications of renaming/deleting makes the
most sense, and that alone already justifies the separation in two.
It is equally true that when we create a property, we might have
"cleaned" the original concept so much that it might differ (even
slightly) from the understood concept that the item represents. However,
even after that process, the "new" concept is still an item...

The process of imbuing a concept with permanent characteristics (adding
a datatype) and the practical approach also seem to recommend keeping
items and properties separate.
Thanks for showing me that reasoning :)

I am still wondering how we are going to classify properties.
Maybe it will require a broader discussion, but if they are the same (or
mostly the same) as items, then we can just link them as "same as", and
build the classification structure just for the items. OTOH, if they are
different, then we will need to mirror that classification for
properties, which seems quite redundant, and we would also need to add a
new datatype, "property".

All in all, my conclusion about this is that properties are just
concepts with special qualities that justify the separation in the
software (even if in real life there is no separation).

Many thanks for your detailed answer, and sorry if I'm bringing up
already discussed topics. It is just that when you stare long into
Wikidata, Wikidata stares back into you ;)

Cheers,
Micru

On Wed, May 28, 2014 at 11:39 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:

Hi David,

Interesting remark. Let's explore this idea a bit. I will give you
two main reasons why we have properties separate, one practical and
one conceptual.

First the practical point. Certainly, everything that is used as a
property needs to have a datatype, since otherwise the wiki would
not know what kind of input UI to show. So you cannot use just any
item as a property straight away -- it needs to have a datatype
first. So, yes, you could abolish the namespace Property but you
still would have a clear, crisp distinction between property items
(those with datatype) and normal items (those without a datatype).
Because of this, most of the other functions would work the same as
before (for example, property autocompletion would still only show
properties, not arbitrary items).

A complication with this approach is that property datatypes cannot
change in Wikibase. This design was picked since there is no way to
convert existing data from one datatype to another in general. So
changing the datatype would create problems by making a lot of data
"invalid", and require special handling and special UI to handle
this situation. With properties living in a separate namespace, this
is not a real restriction: you can just create a new property and
give it the same label (after naming the old one differently, e.g.,
putting "DEPRECATED" in its name). Then you can migrate the data in
some custom fashion. But if properties were items, we would have
a problem here: the item is already linked to many Wikipedias and
other projects, and it might be used in Lua scripts, queries, or
even external applications like Denny's JavaScript translation
library. You cannot change item ids easily. Also, many items would
not have a datatype, so the first one that is (accidentally?) entered
would be fixed. So we would definitely need to rethink the whole idea
of unchangeable datatypes.
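
The workaround described above (create a new property, mark the old one
as deprecated, migrate the data) could be sketched roughly as follows.
This is only an illustration under stated assumptions: the Property
class, the replace_property helper, and the ids "P-old" and "P-new" are
hypothetical and are not part of Wikibase.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    pid: str
    label: str
    datatype: str  # treated as fixed once the property exists
    values: dict = field(default_factory=dict)  # item id -> value


def replace_property(old, new_pid, new_datatype, convert):
    """Create a successor property and copy over converted values.

    The old property keeps its id and datatype; only its label changes.
    """
    new = Property(new_pid, old.label, new_datatype)
    old.label = f"DEPRECATED {old.label}"
    for item_id, value in old.values.items():
        # 'convert' is the custom, property-specific migration step.
        new.values[item_id] = convert(value)
    return new


old = Property("P-old", "emissivity", "string", {"Q45190": "0.54"})
new = replace_property(old, "P-new", "quantity", float)
print(old.label)   # prints: DEPRECATED emissivity
print(new.values)  # prints: {'Q45190': 0.54}
```

The sketch works because a property id is only referenced from
statements. An item id is also referenced from sitelinks, scripts and
external tools, which is why the same trick does not carry over.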

My other important reason is conceptual. Properties are not
considered part of the (encyclopaedic) data but rather part of the
schema that the community has picked to organise that data. As in
your example, "emissivity" (Q899670) is a notion in physics as
described in a Wikipedia article. There are many things to say about
this notion (for example, it has a history: somebody must have
defined this first -- although Wikipedia does not say it in this
case). As in all cases, some statements might be disputed while
others are widely acknowledged to be "true".

For the property "emissivity" (P1295), the situation is quite
different. It was introduced as an element used to enter data,
similar to a row in a database table or an infobox template in some
Wikipedia. It probably relates closely to the actual physical
notion Q899670, but it is still a different thing. For example, it
was first introduced by User:Jakec, who is probably not the person
who introduced the physical concept ;-) Anything that we will say
about P1295 in the future refers to the property -- a concept of our
own making, that is not described in any external source (there are
no publications discussing P1295).

This is also the reason why properties are supposed to support
*claims* not *statements*. That is, they will have property-value
pairs and qualifiers, but no references or ranks. Indeed, anything
we say about properties has the status of a definition. If we say
it, it's true. There is no other authority on Wikidata properties.
You could of course still have items and properties "share" a page
and somehow define which statements/claims refer to which concept,
but this does not seem to make things easier for users.
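
As a small sketch of the claim/statement distinction just described: a
claim carries a property, a value and qualifiers, and a statement adds
references and a rank on top of that. The class names and the example
values below are assumptions for illustration and do not reflect the
actual Wikibase implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """What a property page would hold: property-value pair plus qualifiers."""
    property_id: str
    value: object
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Statement(Claim):
    """What an item page holds: a claim with references and a rank."""
    references: list = field(default_factory=list)
    rank: str = "normal"  # Wikidata ranks: preferred, normal, deprecated


# On an item: a sourced statement that could be disputed.
on_item = Statement("P1295", 0.54, references=["(hypothetical source)"])

# On a property: a definition made by the community, nothing to cite.
on_property = Claim("(hypothetical property)", "(hypothetical value)")

print(on_item.rank)                        # prints: normal
print(hasattr(on_property, "references"))  # prints: False
```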

These are, for me, the two main reasons why it makes sense to keep
properties apart from items on a technical level. Besides this, it
is also convenient to separate the 1000-something properties from
the 15-million-something items for reasons of maintenance.

Best regards,

Markus

On 28/05/14 09:25, David Cuenca wrote:

Since the very beginning I have kept myself busy with properties,
thinking about which ones fit, which ones are missing to better
describe reality, and how to integrate them with the ones that we
have. The thing is that the more I work with them, the less
difference I see with normal items... and if statements will soon be
allowed on property pages, the difference will blur even more.
I can understand that from the software development point of view it
might make sense to have a clear difference. Or for the community to
get a deeper understanding of the underlying concepts represented by
words.

But semantically I see no difference between:
cement (Q45190) <emissivity (P1295)> 0.54
and
cement (Q45190) <emissivity (Q899670)> 0.54

Am I missing something here? Are properties really needed or are we
adding unnecessary artificial constraints?

Cheers,
Micru

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l

--
Etiamsi omnes, ego non
