David,
One of the uses is: what is the relationship between a human and his behavior?
This is an easy question once you are clear about what "human behaviour" is. According to enwiki, it is a "range of behaviours *exhibited by* humans". The bigger question for me is whether it is useful to record this relationship ("exhibited by") in Wikidata. What would anybody do with this data? In what application could it be of interest?
Moreover, as a great Icelandic ontologist once said: "There is definitely, definitely, definitely no logic, to human behaviour" ;-)
In that regard, I hate the word "constraint", because it means that we are placing a "straitjacket" on reality, when it is the other way round: recurring patterns in the real world make us "expect" that a value will fall within the bounds of our expectations.
I think "constraints" are already understood in this way. The name comes from databases, where a "constraint violation" is indeed a rather hard error. On the other hand, ironically, constraints (as a technical term) are often considered to be a softer form of modelling than (onto)logical axioms: a constraint can be violated while a logical axiom (as the name suggests) is always true -- if it is not backed by the given data, new data will be inferred. So as a technical term, "constraint" is quite appropriate for the mechanism we have, although it may not be the best term to clarify the intention.
However, I would like to bring the conversation to a deeper level.
...
With all this I want to make the point that there are two sources of expectations:
- from our experience seeing repetitions and patterns in the values
(male/female/etc "between 10 and 50"), which belong to the property
- from the agreed definition of the concept itself, which belong to the data
Yes. I agree with this as a basic dichotomy of things we may want to record in Wikidata. Some things are true by definition, while others are just "very likely" by observation. The exact population of Paris we will never know, but we are completely sure that a piano is an instrument. (Maybe somebody with a better philosophical background than me could give a better perspective of these notions -- "analytical" vs. "empirical" come to mind, but I am sure there is more.)
Some important ideas like classification (instance of/subclass of) belong completely to the analytical realm. We don't observe classes, we define them. A planet is what we call a "planet", and this can change even if the actual lumps in space are pretty much the same.
However, there is yet a deeper level here (you asked for it ;-). Wikidata is not about facts but about statements with references. We do not record "Pluto was a planet until 2006" but "Pluto was a planet until 2006 *according to the IAU*". Likewise, we don't say "Berlin has 3 million inhabitants" but "Berlin has 3 million inhabitants *according to the Amt fuer Statistik Berlin-Brandenburg*". If you compare these two statements, you can see that they are both "empirical", based on our observation of a particular reference. We do not have analytical knowledge of what the IAU or the Amt fuer Statistik might say. So in this sense constraints can only ever be rough guidelines. It does not make logical sense to say "if source A says X then source B must say Y" -- even if we know that X implies Y (maybe by definition), we don't know what sources A and B say. All we can do with constraints is to uncover possible contradictions between sources, which might then be looked into.
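To make the "constraints only uncover possible contradictions" point concrete, here is a minimal sketch in Python. It is not how Wikidata's constraint checks are implemented; the tuple layout, the source labels and the second Berlin figure are invented purely for illustration. The check never rejects anything, it only collects cases where sources disagree so that a human can look into them.

```python
from collections import defaultdict

# Hypothetical statements: (item, property, value, source).
# Values and source labels are made up for this example.
statements = [
    ("Berlin", "population", 3_000_000, "source A"),
    ("Berlin", "population", 3_500_000, "source B"),
    ("Paris", "population", 2_200_000, "source A"),
]

def possible_contradictions(stmts):
    """Group statements by (item, property) and keep those with differing values.

    Returns {(item, property): {value: [sources claiming that value]}}.
    Nothing is declared wrong here; the result is a to-do list for review.
    """
    by_key = defaultdict(dict)
    for item, prop, value, source in stmts:
        by_key[(item, prop)].setdefault(value, []).append(source)
    return {key: values for key, values in by_key.items() if len(values) > 1}

flagged = possible_contradictions(statements)
```

Note that a flagged entry may well turn out to be fine (for example two figures from different years), which is exactly why this kind of check can only be a rough guideline.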
Now inferences are slightly different. If we know that X implies Y, then if "A says X" we can infer that (implicitly) "A says Y". That is a logical relationship (or rule) on the level of what is claimed, rather than on the level of statements. Note that we still need to have a way to find out that "X implies Y", which is a content-level claim that should have its own reference somewhere. We mainly use inference in this sense with "subclass of" in Reasonator or when checking constraints. In this case, the implications are encoded as subclass-of statements ("If X is a piano, then X is an instrument"). This allows us to have references on the implications.
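As a rough illustration of inference over "subclass of", the following Python sketch derives what a source implicitly says and keeps track of the references that back each implication. The class names in between, the source labels and the function names are assumptions of mine, not anything Reasonator or Wikidata actually exposes.

```python
from collections import deque

# Hypothetical subclass-of statements, each with its own reference.
# The intermediate class and the source labels are invented.
SUBCLASS_OF = {
    ("piano", "keyboard instrument"): "source P",
    ("keyboard instrument", "instrument"): "source Q",
}

def inferred_classes(cls):
    """Return {superclass: [references used]} for everything reachable from cls."""
    found = {}
    queue = deque([(cls, [])])
    while queue:
        current, refs = queue.popleft()
        for (sub, sup), ref in SUBCLASS_OF.items():
            if sub == current and sup not in found:
                found[sup] = refs + [ref]
                queue.append((sup, refs + [ref]))
    return found

def implied_claims(source, item, cls):
    """If `source` says `item` is a `cls`, list what it implicitly says too."""
    return [(source, item, sup, refs)
            for sup, refs in inferred_classes(cls).items()]

claims = implied_claims("A", "X", "piano")
```

Each derived claim still reads "A says ...", and the list of references shows which subclass-of statements the inference depended on.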
In general, an interesting question here is what the status of "subclass of" really is. Do we gather this information from external sources (surely there must be a book that tells us that pianos are instruments) or do we as a community define this for Wikidata (surely, the overall hierarchy we get is hardly the "universal class hierarchy of the world" but a very specific classification that is different from other classifications that may exist elsewhere)? Best not to think about it too much and to gather sources whenever we have them ;-)
Besides these two notions ("constraints" to uncover inconsistent references, and "logical axioms" to derive new statements from given ones), there is also a third type of constraint that is purely analytical. If we *define* that our property "birthdate" can only be used on humans (just for the example), then we know that, by our own definition/requirement, any item that has a birthdate must be a human. This is independent of whether some reference says "IBM was born on June 16, 1911" -- we would simply not translate this as "birthdate" in our encoding in Wikidata. So it is possible to have purely analytical knowledge on this level, and we will have complete control over defining it (since it's our choice what we mean by property "birthdate"). In this case, we will get hard constraints that should really never be violated (you mentioned "subject of" should only be used as qualifier -- this is another example of a hard constraint that comes from our own community definitions).
At the moment, hard constraints (from definitions) and soft constraints (expectations) are simply mixed, and maybe this is fine since we handle them in a similar fashion (humans need to look at how to fix the situation). Most constraints, even those that refer to definitions, are rather soft anyway since we apply them to statements, not to hard facts. Hard constraints can only occur in cases where the *encoding* of a statement in Wikidata is wrong (not the intended statement as such, but how it was translated to data).
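If one wanted to keep the two kinds apart, a sketch could look like the Python below. Everything in it is hypothetical: the property "example measure", its expected range (borrowed from the "between 10 and 50" example), the item "Q-example" and the simplified "instance of" lookup, which ignores subclasses. It only shows that both kinds can go through the same review loop while being labelled differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    item: str
    prop: str
    kind: str   # "hard" (our own definition) or "soft" (an expectation)
    note: str

# Hard: by our own (example) definition, "birthdate" is only used on humans.
HARD_DOMAIN = {"birthdate": "human"}
# Soft: values of this made-up property are merely expected to fall in a range.
SOFT_RANGE = {"example measure": (10, 50)}

def check(items):
    """Return all violations, each labelled as hard or soft."""
    violations = []
    for name, claims in items.items():
        for prop, value in claims.items():
            required = HARD_DOMAIN.get(prop)
            if required is not None and claims.get("instance of") != required:
                violations.append(Violation(
                    name, prop, "hard", "encoding error: wrong property used"))
            bounds = SOFT_RANGE.get(prop)
            if bounds is not None and not (bounds[0] <= value <= bounds[1]):
                violations.append(Violation(
                    name, prop, "soft", "unexpected value: check the source"))
    return violations

items = {
    "IBM": {"instance of": "company", "birthdate": "1911-06-16"},
    "Q-example": {"instance of": "human", "birthdate": "1952-03-11",
                  "example measure": 72},
}
report = check(items)
```

In this toy data the IBM entry is a hard violation (the statement was encoded with the wrong property), while the out-of-range value is only a hint that the reference deserves a second look.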
Markus
Cheers, Micru
PS: this is a re-post because my previous message was bounced back "for being too long" :)
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l