Re: [Wikidata-l] What is the point of properties?

29 May 2014

David,
On 28/05/14 16:35, David Cuenca wrote:
...
Markus,
Ok, now I understand that "same as" wouldn't be a good name because of the
confusion it would cause. However the property "subject of" as it is now
wouldn't be a good candidate either. Its meaning is that a certain
statement is represented by another item (that is why it is only allowed
to be used as qualifier).
Ok.
...
Perhaps a better name would be "corresponds with item" and the inverse
"corresponds with property". Just by having these connections, a lot of
information can be inferred from the connected item.
Consider the following example with "occupation (P106)", and "occupation
(Q13516667)":

I cannot find any clear "subproperty of" for P106, but there is a
clear "subclass of: human behaviour" for the item

"human behaviour" is "part of" human

I don't understand this use of "part of". Maybe I would say "having an 
occupation is part of being human" but not that "occupation is part of 
human". I would not use either of these and restrict "part of" to clear, 
undisputed statements like "the steering wheel is part of the car". 
Otherwise, anything could be part of human ("head"?, "sadness"?, 
"singing"?, "birth"? -- entering this in Wikidata would not lead anywhere).
"Part of" is quite problematic in general. You can see it from the 
discussion on its property page, and also from the uses it sees in the 
wiki, that this property is severely misunderstood and/or misused. At 
the very least, one should distinguish "physical part of" from "meronym" 
(both are aliases of the property now!). And then one should realise 
that meronyms are in the domain of Wiktionary, which we cannot capture 
in Wikidata properly since we do not have items for words but for 
concepts. One alias for an item might be a meronym of something else, 
while another alias for the same item is not. Using statements for 
linguistic properties in Wikidata will not be successful. I am not 
saying that Wikibase is not able to capture some ideas of a thesaurus 
(we have actually discussed this), but this is not how it is used in 
Wikidata.
...

"human" can have a statement "intrinsic property" (property proposal
still under discussion) with values "birthday (Q47223)" and an
"(eventual) date of death". It can be expanded in the future to include
newly created properties like "height", "weight", "eye color", etc.
Yes, this again makes sense to me. It is basically a variant of the 
constraint "Item" which allows you to say that items that are instance 
of human should also have a birthday. But again, this is schematic 
information (like constraints) and it should not be mixed up with actual 
data. It is the same conceptual difference that I have explained for 
properties vs. items earlier. Moreover, I think this information (even 
if correct in some sense) has very little utility as a piece of 
information about an item; it is much more useful for constraints about 
properties (which are not items).
...

birthday (Q47223) <corresponds with property> date of birth (P569)

It should be the other way around: the correspondence says something 
about P569, not about Q47223. There cannot be any reference for this. It 
should therefore be a claim on the page of P569 rather than a statement 
on the page of Q47223.
...
Out of this I reach the following conclusions:

the taxonomy of properties is going to be weak, since there is not
always a clear subpropertyOf unless created artificially (more work)
I agree.
...

the standard taxonomy of items (subclass of/part of) is sufficient
to automatically reach meaningful constraints and inference (less work)
I agree that the taxonomy will be helpful in constraints. This is what 
constraints already do when using instance of/subclass of. However, I do 
not agree that the constraints can or should be stated as part of this 
taxonomy. Constraints are too complex, and they are conceptually 
different (they say how a property should be used, not how something in 
the Real World relates to something else). Constraints interact nicely 
with the taxonomy and help to get useful conclusions, but they are not 
"part of" taxonomy ;-). We must keep content organisation separate from 
content.
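
A minimal Python sketch of this separation, under stated assumptions: the taxonomy ("instance of" / "subclass of") is plain item data, while the constraint is recorded per property and only reads that taxonomy. The dictionaries and the helper names (`superclasses`, `satisfies_type_constraint`) are hypothetical and not part of Wikibase; all ids except Q5 (human) and P569 (date of birth) are placeholders.

```python
# Toy item data. Ids other than Q5 (human) and P569 (date of birth)
# are placeholders invented for this illustration.
SUBCLASS_OF = {"Q_example_subclass": {"Q5"}}
INSTANCE_OF = {"Q_alice": {"Q_example_subclass"}, "Q_rock": {"Q_mineral"}}

# Constraint information is keyed by property, not stored on any item.
CONSTRAINTS = {"P569": {"type": "Q5"}}


def superclasses(cls):
    """Return cls plus everything reachable from it via "subclass of"."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def satisfies_type_constraint(item, prop):
    """Check whether item may carry prop according to the "Type" constraint."""
    required = CONSTRAINTS.get(prop, {}).get("type")
    if required is None:
        return True  # no constraint recorded for this property
    return any(required in superclasses(c) for c in INSTANCE_OF.get(item, ()))
```

Here `satisfies_type_constraint("Q_alice", "P569")` holds because the class hierarchy leads to Q5, yet nothing about the constraint had to be written on any item page.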
...

by manually adding the constraints to the property itself we are
duplicating information which will require volunteer effort to maintain
(more work)
I disagree. Constraints refer to the property, not to the Wikidata item, 
and it would be conceptually wrong to mix these things up. We already 
have agreed that properties and items need to remain distinct for 
technical reasons. Once this is clear, there is no reason to move 
information that refers to properties (constraints) to item pages. This 
will not be a duplication of information: it is enough to have the 
constraints on the property pages only. If you look at the constraints 
we have, you can see many examples that are specific to Wikidata and 
certainly not a general thing about the concept (take the "allowed 
values" for "sex or gender"). We really want to keep editorial helpers 
(constraints) distinct from sourced information (statements about items).
...
My recommendation is to rely mainly on the main taxonomy instead of
creating a parallel property taxonomy, and then think of ways to extract
information from the main taxonomy to convert it automatically into
constraints.
All the maintenance takes effort, so the more it can be automated, the
more efficient volunteers will be. And if we can simplify the
maintenance of properties, we will be able to simplify the creation of
properties too, especially when we face the next surge which will come
with the datatype "number with units".
I agree with the general goals, but I don't think that things become any 
easier if we confuse information about properties with information about 
items. We can still re-use information we have about items (like the 
class hierarchy that we already use in constraints) to avoid 
duplication, but some things are clearly not part of the item taxonomy.
Cheers,
Markus
...
On Wed, May 28, 2014 at 2:48 PM, Markus Krötzsch
<markus@semantic-mediawiki.org> wrote:
David,

Regarding the question of how to classify properties and how to
relate them to items:

* "same as" (in the sense of owl:sameAs) is not the right concept
here. In fact, it has often been discouraged to use this on the Web,
since it has very strong implications: it means that in all uses of
the one identifier, one could just as well use the other identifier,
and that it is indistinguishable if something has been said about
the one or the other. That seems too strong here, at least for most
cases.

* In the world of OWL DL, sameAs specifically refers to individuals,
not to classes or properties. Saying "P sameAs Q" does not imply
that P and Q have the same extension as properties. For the latter,
OWL has the relationship owl:equivalentProperty. This distinction
of instance level and schema level is similar to the distinction we
have between "instance of" and "subclass of".

* Therefore, I would suggest using a property called "subproperty
of" as one way of relating properties (analogously to "subclass
of"). It has to be checked if this actually occurs in Wikidata (do
we have any properties that would be in this relation, or do we make
it a modelling principle to have only the most specific properties
in Wikidata?).

* The relationship from properties to items could be modelled with
the existing property "subject of" (P805).

* It might be useful to also have a taxonomic classification of
properties. For example, we already group properties into properties
for "people", "organisations", etc. Such information could also be
added with a specific property (this would be a bit more like a
"category" system on property pages). On the other hand, some of
this might coincide with constraint information that could be
expressed as claims. For instance, person properties might be those
with "Type" (i.e., "rdfs:domain") constraint human. By the way, our
constraint system could use some systematisation -- there are many
overlaps in what you can do with one constraint or another.
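
What "subproperty of" would buy, analogously to "subclass of", can be sketched in a few lines of Python. This is only an illustration under assumptions: the property labels ("mother", "parent", "relative") are hypothetical and not existing Wikidata property ids, each property is assumed to have at most one direct superproperty, and `superproperties` and `entailed` are made-up helper names.

```python
# Hypothetical property labels; these are not existing Wikidata property ids.
SUBPROPERTY_OF = {"mother": "parent", "father": "parent", "parent": "relative"}


def superproperties(prop):
    """Yield prop and each property above it in the "subproperty of" chain."""
    seen = set()
    while prop is not None and prop not in seen:  # the seen set guards against cycles
        seen.add(prop)
        yield prop
        prop = SUBPROPERTY_OF.get(prop)


def entailed(triples):
    """Expand (subject, property, value) triples along "subproperty of"."""
    return {(s, p2, o) for (s, p, o) in triples for p2 in superproperties(p)}
```

Unlike a "same as" link, this is directional: a "mother" statement entails a "parent" statement, but not the other way around.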

Cheers,

Markus

On 28/05/14 12:14, David Cuenca wrote:

    Markus,
    The explanation about the implications of renaming/deleting makes most
    sense and just that justifies already the separation in two.
    It is equally true that when we create a property, we might have
    "cleaned" the original concept so much that it might differ (even
    slightly) with the understood concept that the item represents. However,
    even after that process, the "new" concept is still an item...

    The process of imbuing a concept with permanent characteristics (adding
    a datatype) and the practical approach, also seems to recommend keeping
    items and properties separate.
    Thanks for showing me that reasoning :)

    I am still wondering about how we are going to classify properties.
    Maybe it will require a broader discussion, but if they are the same (or
    mostly the same) as items, then we can just link them as "same as", and
    build the classing structure just for the items. OTOH, if they are
    different, then we will need to mirror that classification for
    properties, which seems quite redundant. Plus adding a new datatype,
    "property".

    All in all, my conclusion about this is that properties are just
    concepts with special qualities that justify the separation in the
    software (even if in real life there is no separation).

    many thanks for your detailed answer, and sorry if I'm bringing up
    already discussed topics. It is just that when you stare long into
    wikidata, wikidata stares back into you ;)

    Cheers,
    Micru

    On Wed, May 28, 2014 at 11:39 AM, Markus Krötzsch
    <markus@semantic-mediawiki.org> wrote:

         Hi David,

         Interesting remark. Let's explore this idea a bit. I will give you
         two main reasons why we have properties separate, one practical and
         one conceptual.

         First the practical point. Certainly, everything that is used as a
         property needs to have a datatype, since otherwise the wiki would
         not know what kind of input UI to show. So you cannot use just any
         item as a property straight away -- it needs to have a datatype
         first. So, yes, you could abolish the namespace Property but you
         still would have a clear, crisp distinction between property items
         (those with datatype) and normal items (those without a datatype).
         Because of this, most of the other functions would work the same as
         before (for example, property autocompletion would still only show
         properties, not arbitrary items).
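
The thought experiment above (no Property namespace, but a datatype marking "property items") can be sketched as follows. This is a hypothetical model, not Wikibase code: the `Entity` class and `autocomplete_properties` are invented names, and the datatype string "quantity" for P1295 is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    entity_id: str
    label: str
    datatype: Optional[str] = None  # None means "normal item"

    @property
    def is_property(self):
        """An entity counts as a property exactly when it has a datatype."""
        return self.datatype is not None


def autocomplete_properties(entities, prefix):
    """Suggest only entities that have a datatype and match the prefix."""
    prefix = prefix.lower()
    return [e for e in entities
            if e.is_property and e.label.lower().startswith(prefix)]
```

With both `Entity("Q899670", "emissivity")` and `Entity("P1295", "emissivity", "quantity")` in the list, typing "emis" would suggest only the second one, so the crisp distinction survives even without a separate namespace.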

         A complication with this approach is that property datatypes cannot
         change in Wikibase. This design was picked since there is no way to
         convert existing data from one datatype to another in general. So
         changing the datatype would create problems by making a lot of data
         "invalid", and require special handling and special UI to handle
         this situation. With properties living in a separate namespace, this
         is not a real restriction: you can just create a new property and
         give it the same label (after naming the old one differently, e.g.,
         putting "DEPRECATED" in its name). Then you can migrate the data in
         some custom fashion. But if properties were items, we would have
         a problem here: the item is already linked to many Wikipedias and
         other projects, and it might be used in Lua scripts, queries, or
         even external applications like Denny's JavaScript translation
         library. You cannot change item ids easily. Also, many items would
         not have a datatype, so the first datatype that is (accidentally?)
         entered would be fixed. So we would definitely need to rethink the
         whole idea of unchangeable datatypes.
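
The "create a new property, rename the old one, migrate in some custom fashion" workaround might look roughly like this. It is a sketch over plain Python data, not a Wikibase API: `migrate_property`, the dictionary layout, and the `convert` callback are all assumptions made for illustration.

```python
def migrate_property(properties, statements, old_id, new_id, new_datatype, convert):
    """Replace old_id by a new property with a different datatype.

    properties: dict mapping id -> {"label": str, "datatype": str}
    statements: list of (item_id, property_id, value) tuples
    convert:    custom function mapping an old value to a new one,
                or returning None when a value cannot be converted
    """
    old = properties[old_id]
    # The old property's datatype is never modified; a new property is created.
    properties[new_id] = {"label": old["label"], "datatype": new_datatype}
    old["label"] = "DEPRECATED " + old["label"]

    migrated, failed = [], []
    for item_id, prop_id, value in statements:
        if prop_id != old_id:
            migrated.append((item_id, prop_id, value))
            continue
        new_value = convert(value)
        if new_value is None:
            # Keep the old statement and record it for manual review.
            migrated.append((item_id, prop_id, value))
            failed.append((item_id, prop_id, value))
        else:
            migrated.append((item_id, new_id, new_value))
    return migrated, failed
```

The point of the sketch is that nothing outside the property namespace has to change: item ids stay stable, and values that cannot be converted simply remain on the deprecated property.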

         My other important reason is conceptual. Properties are not
         considered part of the (encyclopaedic) data but rather part of the
         schema that the community has picked to organise that data. As in
         your example, "emissivity" (Q899670) is a notion in physics as
         described in a Wikipedia article. There are many things to say about
         this notion (for example, it has a history: somebody must have
         defined this first -- although Wikipedia does not say it in this
         case). As in all cases, some statements might be disputed while
         others are widely acknowledged to be "true".

         For the property "emissivity" (P1295), the situation is quite
         different. It was introduced as an element used to enter data,
         similar to a row in a database table or an infobox template in some
         Wikipedia. It does probably closely relate to the actual physical
         notion Q899670, but it still is a different thing. For example, it
         was first introduced by User:Jakec, who is probably not the person
         who introduced the physical concept ;-) Anything that we will say
         about P1295 in the future refers to the property -- a concept of our
         own making, that is not described in any external source (there are
         no publications discussing P1295).

         This is also the reason why properties are supposed to support
         *claims* not *statements*. That is, they will have property-value
         pairs and qualifiers, but no references or ranks. Indeed, anything
         we say about properties has the status of a definition. If we say
         it, it's true. There is no other authority on Wikidata properties.
         You could of course still have items and properties "share" a page
         and somehow define which statements/claims refer to which concept,
         but this does not seem to make things easier for users.
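
The claim/statement distinction described above maps naturally onto two small data types. The following is a sketch only: the class and field names are chosen for this example and are not the Wikibase data model's own identifiers, and the default rank "normal" is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A property-value pair with optional qualifiers (enough for property pages)."""
    prop: str
    value: object
    qualifiers: dict = field(default_factory=dict)


@dataclass
class Statement(Claim):
    """A claim plus the provenance parts that only make sense for item data."""
    references: list = field(default_factory=list)
    rank: str = "normal"
```

An item page would then hold something like `Statement("P1295", 0.54, references=[...])`, which can be sourced and disputed, while a property page would hold bare `Claim` objects that have the status of definitions.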

         These are, for me, the two main reasons why it makes sense to keep
         properties apart from items on a technical level. Besides this, it
         is also convenient to separate the 1000-something properties from
         the 15-million something items for reasons of maintenance.

         Best regards,

         Markus

         On 28/05/14 09:25, David Cuenca wrote:

             Since the very beginning I have kept myself busy with properties,
             thinking about which ones fit, which ones are missing to better
             describe reality, how to integrate them into the ones that we have.
             The thing is that the more I work with them, the less difference I
             see with normal items.... and if soon there will be statements
             allowed in property pages, the difference will blur even more.
             I can understand that from the software development point of view it
             might make sense to have a clear difference. Or for the community to
             get a deeper understanding of the underlying concepts represented by
             words.

             But semantically I see no difference between:
             cement (Q45190) <emissivity (P1295)> 0.54
             and
             cement (Q45190) <emissivity (Q899670)> 0.54

             Am I missing something here? Are properties really needed or are we
             adding unnecessary artificial constraints?

             Cheers,
             Micru

             ___________________________________________________
             Wikidata-l mailing list
             Wikidata-l@lists.wikimedia.org
             https://lists.wikimedia.org/mailman/listinfo/wikidata-l


    --
    Etiamsi omnes, ego non

