[Splitting the general (Wikidata reasoning; this thread) from the specific (Wikidata family relationships for horses; original thread).]
Many issues have been brought up, and we cannot solve them all with one big hammer. I have now started a WikiProject (see below) to address one of the key points raised by Peter:
''' Nobody has ever defined which inferences can/should be drawn from the content of Wikidata. '''
We do in fact use several properties that seem to ask for inferencing. Probably the clearest is "subclass of" (P279). It has been related to rdfs:subClassOf in many community discussions, so it seems clear that a similar meaning is intended. This would lead to the following rule:
''' If an item A has a "subclass of" statement with value B, and if item B has a "subclass of" statement with value C, then it should follow that item A has a "subclass of" statement with value C. '''
I think there is wide agreement on this idea. Constraints rely on it (constraint checking traverses the P279 hierarchy), and it is a main motivation for the "tree" feature of Wikidata Query. There are similarly clear intentions for the properties "instance of" (P31) and "subproperty of" (P1647). I am not spelling them out here.
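To make the rule concrete, here is a minimal sketch in Python of what applying it exhaustively would mean. It is only an illustration under simplifying assumptions: statements are plain (subclass, superclass) pairs, qualifiers, ranks and references are ignored, and the function name and the placeholder items A to D are made up for this example.

```python
from collections import defaultdict


def subclass_closure(statements):
    """Return all (A, C) pairs implied by the transitivity rule.

    `statements` is an iterable of (subclass, superclass) pairs, one per
    explicit "subclass of" (P279) statement. Qualifiers are ignored.
    """
    direct = defaultdict(set)
    for sub, sup in statements:
        direct[sub].add(sup)

    closure = set()
    for start in list(direct):
        # Depth-first walk over everything reachable from `start`.
        stack = list(direct[start])
        seen = set()
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue
            seen.add(cls)
            closure.add((start, cls))
            stack.extend(direct.get(cls, ()))
    return closure


# Toy data with placeholder names, not real Wikidata content.
explicit = [("A", "B"), ("B", "C"), ("C", "D")]
inferred = subclass_closure(explicit) - set(explicit)
print(sorted(inferred))  # [('A', 'C'), ('A', 'D'), ('B', 'D')]
```

Note that with this reading a cycle in the data (A subclass of B, B subclass of A) makes every class in the cycle a subclass of itself; whether that is intended is exactly the kind of question a precise specification would have to answer.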
Nevertheless, Peter is right that even in these cases, the intention is not fully clear, for two reasons:
(1) There is no machine-readable specification of the intended behaviour. It's part of user discussions, not of the data or templates. Even the user discussions are distributed over several pages, so a lot of wiki archaeology is needed to get a full picture of what we, the community, might have intended.
(2) The informal discussions on the intended semantics are not precise about all relevant cases. Many questions remain open, such as what to do if qualifiers are used on a statement (rarely the case for "subclass of", but not so uncommon for "instance of").
To address these issues, I propose to come up with a format that allows us to clearly specify inference rules such as the one for "subclass of" above. Each rule should have one page where it is specified (for humans and machines), explained (to humans), and discussed. It is not possible to encode such rules as property values on data pages (for a start, it would not be clear which page this should be on, because rules typically refer to several properties and items). Therefore, the best we can do for now seems to be to use standard wiki pages for this. They could, however, be linked from the talk pages of all relevant properties and items.
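To give an idea of what "specified for machines" could look like, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the `Rule` structure, the "?x" variable notation and the helper functions are hypothetical and not a proposed or existing Wikidata format, and statements are reduced to (subject, property, value) triples without qualifiers.

```python
from dataclasses import dataclass

# A statement is a (subject, property, value) triple; qualifiers are ignored.
# Terms starting with "?" are variables. This notation is hypothetical.


@dataclass(frozen=True)
class Rule:
    name: str
    body: tuple  # patterns that must all match
    head: tuple  # pattern that is then inferred


SUBCLASS_TRANSITIVITY = Rule(
    name="subclass of is transitive",
    body=(("?a", "P279", "?b"), ("?b", "P279", "?c")),
    head=("?a", "P279", "?c"),
)


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def apply_rule(rule, facts):
    """Return the facts derivable by one application round of `rule`."""
    bindings = [{}]
    for pattern in rule.body:
        bindings = [
            extended
            for b in bindings
            for fact in facts
            if (extended := match(pattern, fact, b)) is not None
        ]
    return {tuple(b.get(t, t) for t in rule.head) for b in bindings}


def saturate(rules, facts):
    """Apply all rules repeatedly until no new fact appears."""
    facts = set(facts)
    while True:
        new = set().union(*(apply_rule(r, facts) for r in rules)) - facts
        if not new:
            return facts
        facts |= new


# Placeholder items A..D, not real Wikidata content.
explicit = {("A", "P279", "B"), ("B", "P279", "C"), ("C", "P279", "D")}
print(sorted(saturate([SUBCLASS_TRANSITIVITY], explicit) - explicit))
```

The point of separating the rule (data) from the engine (code) is that the rule itself could live on a wiki page and be read by humans and tools alike, even if no such engine is ever run on the full data.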
Even if we do not have any reasoner to compute all the results, writing down the intended rules would be useful documentation for other users to clarify what we expect (see the original family relationship discussion).
I propose to start by gathering use cases, that is, examples of rules that we might want to express. From these, we can then extract a suitable template structure. I have created a WikiProject for getting us started:
https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning
Feel free to contribute.
Best regards,
Markus
On 27.08.2015 06:26, Peter F. Patel-Schneider wrote:
On 08/26/2015 06:01 PM, Svavar Kjarrval wrote:
On mið 26.ágú 2015 23:05, James Heald wrote:
There are a *lot* of problems with P279 (subclass), right across Wikidata.
These will only be corrected once people start doing searches in a systematic way and addressing the anomalies they find.
In this case, politician (Q82955) should *not* be a subclass of human (Q5), instead it should be a subclass of something like occupation (Q13516667), or alternatively perhaps profession (Q28640).
My understanding is that currently there are a vast number of incorrect subclass relationships in the project, messing up tree searches, and so far it is something that has simply not yet been systematically addressed.
-- James.
For now, what's the best way to find (and perhaps correct) incorrect declarations like these?
If I were to just change commonly used items like politician (Q82955), it might be construed as vandalism, or someone who doesn't care about or understand the Stubbs-declared-as-a-human problem might just add that declaration back later.
When it comes to the gender property (P21), the human readable description indicates that it's to define genders in general, yet it's declared as an instance of an item (Q18608871) which only applies to humans, which of course has consequences further up in the hierarchy since the maintainers of item Q18608871 faithfully assume it only applies to humans.
Well, the situation with respect to Wikidata property for items about people (Q18608871) is very difficult. There is absolutely no machine-interpretable information associated with this class that can be used to determine that instances of it are only supposed to be used for people. So, at the bare minimum, such machine-interpretable information needs to be added.
Then there is the issue that there is no theory of how the machine-interpretable information that is associated with entities in Wikidata is to be processed. All the processing is currently done using uninterpretable procedures. For example, on https://www.wikidata.org/wiki/Property_talk:P22 there is information that is used to control some piece of code that checks to see that the subject of https://www.wikidata.org/wiki/Property:P21 belongs to person (Q215627) or fictional character (Q95074). However, there is no theory showing how this interacts with other parts of Wikidata, even such inherent parts of Wikidata as https://www.wikidata.org/wiki/Property:P31
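As an illustration of the kind of check described here, the following Python sketch shows one possible reading of such a type constraint. It is not the actual constraint-checking code; the assumption that "belongs to" means "instance of a class that is, via subclass of, under one of the allowed classes" is precisely the sort of interaction that currently has no written-down theory. The function names and the toy items are hypothetical.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from `cls` via "subclass of", including `cls`."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def violates_type_constraint(item, instance_of, subclass_of, allowed):
    """True if none of the item's classes falls under an allowed class."""
    classes = set()
    for cls in instance_of.get(item, ()):
        classes |= superclasses(cls, subclass_of)
    return classes.isdisjoint(allowed)


# person, fictional character (the two IDs mentioned in the text)
ALLOWED = {"Q215627", "Q95074"}

# Hypothetical toy items and classes, not real Wikidata content.
instance_of = {"item_1": ["class_x"], "item_2": ["class_y"]}
subclass_of = {"class_x": ["Q215627"]}

print(violates_type_constraint("item_1", instance_of, subclass_of, ALLOWED))  # False
print(violates_type_constraint("item_2", instance_of, subclass_of, ALLOWED))  # True
```

Under this reading an item without any "instance of" statement also counts as a violation, which may or may not be what is wanted.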
In fact, there is even difficulty in determining simple truth in Wikidata. Two sources can conflict, and Wikidata is not in the position of being an arbiter for such conflicts, certainly not in general. To make the situation even more complex, Wikidata has a temporal aspect as well and has a need to admit exceptions to general statements.
So what can be done? Any solution is going to be tricky. That is not to say that some solutions cannot be found by looking at systems and standards that are already being used for storing large amounts of complex information. However, any solution is going to have to be carefully tailored to meet the requirements of Wikidata and Wikidatans. (Is there an official term for the people who are putting Wikidata and Wikidata information together?)
There is also a big chicken-and-egg problem here - a good solution to reliable machine-interpretation of Wikidata information requires, for example, consistent use of instance of, subclass, and subproperty; but what counts as a consistent use of these fundamental properties depends on a formal theory of what they mean.
I, for one, would find even just the attempt to solve this problem vastly interesting, and I have been doing some exploration as to what might be needed. My company is interested in using Wikidata as a source of background information, but finds that the lack of a good theory of Wikidata information is problematic, so I have some cover for spending time on this problem.
Anyway, if there is interest in machine interpretation of Wikidata information, if only to detect potential anomalies, I, and probably others, would be motivated to spend more time on trying to come up with potential solutions, hopefully in a collaborative effort that includes not just theoreticians but also Wikidatans.
In the case of the hierarchy Stubbs is associated with, the maintainers have either assumed that all mayors are, without exception, humans, or they somehow thought that if there were exceptions to this, the machines could somehow detect and apply them in each case. Both of those approaches are, I think we agree, wrong, and we should find out why it's happening.
Is there a tool where one can put in a Wikidata item and it extracts declarations based on "higher" properties like subclass or instance of? Like if I were to input the item for Stubbs, it would travel the hierarchy and tell me what would be assumed about Stubbs based on the declarations further up in the tree.
Yes, it is called a reasoner. The design of a reasoner would very likely be one result of the sort of work described above, but without such work it is very hard to figure out just what is supposed to be done in any except the simple cases.
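For the simple cases, a sketch of such a tool might look as follows in Python. This is only an illustration under strong assumptions: it considers nothing but "instance of" and "subclass of", ignores qualifiers, ranks and exceptions, and uses made-up toy data with labels instead of real statements. The function name is hypothetical. Besides listing the classes that would be assumed for an item, it records the chain of statements behind each one, which helps to locate the declaration that causes an unwanted conclusion.

```python
from collections import deque


def explain_classes(item, instance_of, subclass_of):
    """Map each class inferred for `item` to the chain that justifies it."""
    chains = {}
    queue = deque()
    for cls in instance_of.get(item, ()):
        chains[cls] = [item, cls]
        queue.append(cls)
    while queue:
        cls = queue.popleft()
        for parent in subclass_of.get(cls, ()):
            if parent not in chains:  # keep the first (shortest) chain
                chains[parent] = chains[cls] + [parent]
                queue.append(parent)
    return chains


# Toy data in the spirit of the Stubbs example; labels only,
# not a dump of real Wikidata statements.
instance_of = {"Stubbs": ["cat", "mayor"]}
subclass_of = {
    "mayor": ["politician"],
    "politician": ["human"],
    "cat": ["animal"],
}

for cls, chain in explain_classes("Stubbs", instance_of, subclass_of).items():
    print(cls, "<-", " -> ".join(chain))
```

With this toy data, the chain for "human" runs through "mayor" and "politician", which points at the subclass statement that would need to be reviewed.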
- Svavar Kjarrval
Peter F. Patel-Schneider Nuance Communications
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata