[Splitting the general (Wikidata reasoning; this thread) from the specific (Wikidata family relationships for horses; original thread).]
Many issues have been brought up, and we cannot solve them all with one big hammer. I have now started a WikiProject (see below) to address one of the key points raised by Peter:
''' Nobody has ever defined which inferences can/should be drawn from the content of Wikidata. '''
We do in fact use several properties that seem to ask for inferencing. Probably the clearest is "subclass of" (P279). It has been related to rdfs:subClassOf in many community discussions, so it seems clear that a similar meaning is intended. This would lead to the following rule:
''' If an item A has a "subclass of" statement with value B, and if item B has a "subclass of" statement with value C, then it should follow that item A has a "subclass of" statement with value C. '''
I think there is wide agreement on this idea. Constraints rely on it (constraint checking travels the P279 hierarchy), and it's a main motivation for why Wikidata Query has its "tree" feature. There are similarly clear intentions for the properties "instance of" (P31) and "subproperty of" (P1647). I am not spelling them out here.
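To make the intended reading concrete, here is a minimal sketch (not a real Wikidata API; the in-memory dict and the item names are invented for illustration) of computing what the transitivity rule above entails:

```python
# Follow "subclass of" (P279) statements upwards and collect everything
# reachable. The dict stands in for actual statement data.

def subclass_closure(direct_superclasses, item):
    """Return all classes reachable from `item` via chains of P279."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in direct_superclasses.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Toy data matching the rule: A subclass-of B, B subclass-of C.
p279 = {"A": ["B"], "B": ["C"]}
print(sorted(subclass_closure(p279, "A")))  # ['B', 'C'] -- A is also a subclass of C
```

This is exactly the traversal that constraint checking and the Wikidata Query "tree" feature rely on.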
Nevertheless, Peter is right that even in these cases, the intention is not fully clear, because of two reasons:
(1) There is no machine-readable specification of the intended behaviour. It's part of user discussions, not of the data or templates. Even the user discussions are distributed over several pages, so a lot of wiki archaeology is needed to get a full picture of what we, the community, might have intended. (2) The informal discussions on the intended semantics are not precise about all relevant cases. Many questions remain open, such as what to do if qualifiers are used on a statement (rarely the case for "subclass of", but not so uncommon for "instance of").
To address these issues, I propose to come up with a format that allows us to clearly specify inference rules such as the one for "subclass of" above. Each rule should have one page where it is specified (for humans and machines), explained (to humans), and discussed. It is not possible to encode such rules as property values on data pages (for a start, it would not be clear which page this should be on, because rules typically refer to several properties and items). Therefore, the best we can do now seems to be to have standard wiki pages for this. They could be linked from all relevant properties/items (talk pages), though.
Even if we do not have any reasoner to compute all the results, writing down the intended rules would be useful documentation for other users to clarify what we expect (see the original family relationship discussion).
I propose to start by gathering use cases, that is, examples of rules that we might want to express. From this, we can then extract a suitable template structure. I have created a WikiProject for getting us started:
https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning
Feel free to contribute.
Best regards,
Markus
On 27.08.2015 06:26, Peter F. Patel-Schneider wrote:
On 08/26/2015 06:01 PM, Svavar Kjarrval wrote:
On mið 26.ágú 2015 23:05, James Heald wrote:
There are a *lot* of problems with P279 (subclass), right across Wikidata.
These will only be corrected once people start doing searches in a systematic way and addressing the anomalies they find.
In this case, politician (Q82955) should *not* be a subclass of human (Q5), instead it should be a subclass of something like occupation (Q13516667), or alternatively perhaps profession (Q28640).
My understanding is that currently there are a vast number of incorrect subclass relationships in the project, messing up tree searches, and so far it is something that has simply not yet been systematically addressed.
-- James.
For now, what's the best way to find (and perhaps correct) incorrect declarations like these?
If I were to just change items for commonly used items like politician (Q82955) it might be construed as vandalism or someone who doesn't care about or understand the Stubbs-declared-as-a-human problem might just add that declaration back later.
When it comes to the gender property (P21), the human-readable description indicates that it's meant to define genders in general, yet it's declared as an instance of an item (Q18608871) which only applies to humans. This of course has consequences further up in the hierarchy, since the maintainers of item Q18608871 faithfully assume it only applies to humans.
Well, the situation with respect to Wikidata property for items about people (Q18608871) is very difficult. There is absolutely no machine-interpretable information associated with this class that can be used to determine that instances of it are only supposed to be used for people. So, at the bare minimum, such machine-interpretable information needs to be added.
Then there is the issue that there is no theory of how the machine-interpretable information that is associated with entities in Wikidata is to be processed. All the processing is currently done using uninterpretable procedures. For example, on https://www.wikidata.org/wiki/Property_talk:P22 there is information that is used to control some piece of code that checks to see that the subject of https://www.wikidata.org/wiki/Property:P21 belongs to person (Q215627) or fictional character (Q95074). However, there is no theory showing how this interacts with other parts of Wikidata, even such inherent parts of Wikidata as https://www.wikidata.org/wiki/Property:P31
In fact, there is even difficulty in determining simple truth in Wikidata. Two sources can conflict, and Wikidata is not in the position of being an arbiter for such conflicts, certainly not in general. To make the situation even more complex, Wikidata has a temporal aspect as well and has a need to admit exceptions to general statements.
So what can be done? Any solution is going to be tricky. That is not to say that some solutions cannot be found by looking at systems and standards that are already being used for storing large amounts of complex information. However, any solution is going to have to be carefully tailored to meet the requirements of Wikidata and Wikidatans. (Is there an official term for the people who are putting Wikidata and Wikidata information together?)
There is also a big chicken-and-egg problem here: a good solution to reliable machine interpretation of Wikidata information requires, for example, consistent use of instance of, subclass, and subproperty; but what counts as a consistent use of these fundamental properties depends on a formal theory of what they mean.
I, for one, would find even just the attempt to solve this problem vastly interesting, and I have been doing some exploration as to what might be needed. My company is interested in using Wikidata as a source of background information, but finds that the lack of a good theory of Wikidata information is problematic, so I have some cover for spending time on this problem.
Anyway, if there is interest in machine interpretation of Wikidata information, if only to detect potential anomalies, I, and probably others, would be motivated to spend more time on trying to come up with potential solutions, hopefully in a collaborative effort that includes not just theoreticians but also Wikidatans.
In the case of the hierarchy Stubbs is associated with, the maintainers have either assumed that all mayors are, without exception, humans, or they somehow thought that if there were exceptions to this, the machines could somehow detect and apply them in each case. Both of those approaches are, I think we agree, wrong, and we should find out why this is happening.
Is there a tool where one can put in a Wikidata item and it extracts declarations based on "higher" properties like subclass or instance of? Like if I were to input the item for Stubbs, it would travel the hierarchy and tell me what would be assumed about Stubbs based on the declarations further up in the tree.
Yes, it is called a reasoner. The design of a reasoner would very likely be one result of the sort of work described above, but without such work it is very hard to figure out just what is supposed to be done in any except the simple cases.
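As an illustration of what such a traversal might look like, here is a toy sketch (all data and labels are invented; a real reasoner would read actual statements): start from an item, follow one "instance of" (P31) step, then the "subclass of" (P279) chain upwards.

```python
# Collect everything an item would be "assumed to be" by walking P31
# once and then P279 transitively. Toy in-memory data, not a real API.

def inferred_classes(p31, p279, item):
    """Return the classes of `item` plus all their superclasses."""
    classes = set(p31.get(item, ()))
    frontier = list(classes)
    while frontier:
        cls = frontier.pop()
        for parent in p279.get(cls, ()):
            if parent not in classes:
                classes.add(parent)
                frontier.append(parent)
    return classes

p31 = {"Stubbs": ["mayor"]}
p279 = {"mayor": ["public official"], "public official": ["person"]}
print(sorted(inferred_classes(p31, p279, "Stubbs")))
# ['mayor', 'person', 'public official']
```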
- Svavar Kjarrval
Peter F. Patel-Schneider
Nuance Communications
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
So far from the other thread, the current need seems to be for two types of definitions:
1. How to interpret declarations depending on associated properties.
2. Constraints (or suggestions) when interpreting multiple items.
The first definition is used so the machine can know *if* the declaration is up in the hierarchy or sideways. When interpreting the item, the machine needs to know if the property implies that all declarations of that item are inherited. Take, as an example, some currently living human who has a Wikidata item and is connected to an occupation via a property. The machine should know whether it should process the declarations of the occupation and apply them to the human, in whole or in part. Then there are properties which don't inherit: if the human has a declared family member, the human doesn't inherit the other family member's name or birth date.
The other definition has the purpose of resolving contradictions like in my example of Stubbs. If we are realistic, it's not likely that a tree structure with that much data is totally free of contradictions. So we need some way of telling the machine that there are, or could be, contradictions. One example of this is to define that a certain property can't be more than one of something (at any given time). As a simplification (not referring to the current data structure), say that a human is part of a certain species. If we were to define, in this case, that no item can be part of more than one species, then the machine would detect a contradiction. In the specific example of Stubbs, the machine would determine that cats and humans are two separate species and there can be only one[1]. If we had a definition that the declaration closer to the item has precedence, then the machine would resolve it by determining that mayors are generally humans but Stubbs, being a cat, is an exception to that rule.
[1] Didn't see the Highlander reference until I had written it.
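The at-most-one-species rule with nearest-declaration precedence can be sketched like this (the function, the data format, and the labels are all hypothetical simplifications, not the current data structure):

```python
# Detect a species contradiction and resolve it by giving precedence to
# the declaration closest to the item. Toy data, invented for illustration.

def resolve_species(declarations):
    """`declarations` is a list of (source, species) pairs, ordered from
    the item itself outwards along the hierarchy (nearest first)."""
    species = {s for _, s in declarations}
    if len(species) > 1:
        nearest = declarations[0]  # nearest declaration has precedence
        return {"contradiction": True, "resolved": nearest[1]}
    return {"contradiction": False, "resolved": declarations[0][1]}

# Stubbs is directly declared a cat, but "mayor" implies human further up.
print(resolve_species([("Stubbs", "cat"), ("mayor", "human")]))
# {'contradiction': True, 'resolved': 'cat'}
```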
- Svavar Kjarrval
A human is not a part of a species, it is an instance of a species :)
Contradiction management is a very interesting topic, and contradictions are inherent to the Wikidata model. We can't expect everything to be consistent, considering that Wikidata only reflects sources, and that two sources can disagree in an essentially inconsistent way.
We could expect, however, that several statements extracted from the same source should be consistent among themselves, but it might be rare that we will have enough sourced statements to draw useful inferences. This leads to subproblems like computing the maximum consistent set of sources on a part of the graph, or finding the sources that lead to a contradiction when taken together.
However, we already have a qualifier that marks a source as being in contradiction with another: "statement disputed by". We could assume that the sources involved are probably inconsistent with each other.
Or we could simply drop the consistency checks from the inference process :) and leave them to the constraint system: if an inference draws a path that leads to a constraint violation, then the community will be notified. To avoid explosion, the scope of inferences could be limited (not trying to compute the transitive closure of the inference rules' application). We could use some sort of "partial consistency" notion, such as those used in constraint programming.
Thinking about it, I can imagine constraint problems such as: "considering an inference I deduced some way, is it fully consistent with the set of sources we have, or is there a set of sources that implies the inference is not true?" That is: is the inference a tautology, or is it only satisfiable, in a problem where each statement maps to a variable, the different sources are values in the domains of the variables, and the sources must be consistent with respect to what we know they say on Wikidata?
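A brute-force sketch of this constraint-problem framing (every name and all data are invented; a real system would enumerate far more cleverly): each statement is a variable, the sources that support it are its domain, and an assignment is consistent if no chosen pair of sources is known to dispute each other.

```python
from itertools import product

def satisfiable(domains, disputed):
    """domains: {statement: [sources]}; disputed: pairs of mutually
    inconsistent sources. Returns True if some assignment of one source
    per statement avoids every disputed pair."""
    disputed_sets = [frozenset(p) for p in disputed]
    statements = list(domains)
    for choice in product(*(domains[s] for s in statements)):
        chosen = set(choice)
        if not any(pair <= chosen for pair in disputed_sets):
            return True
    return False

# Two statements; srcA and srcC dispute each other, but srcB supports both.
domains = {"s1": ["srcA", "srcB"], "s2": ["srcB", "srcC"]}
disputed = [("srcA", "srcC")]
print(satisfiable(domains, disputed))  # True: pick srcB for both statements
```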
On 27.08.2015 14:43, Svavar Kjarrval wrote:
So far from the other thread, the current need seems to be for two types of definitions:
- How to interpret declarations depending on associated properties.
If I understand your explanations correctly, the first point is a very specific case of inference, which is already thinking in terms of "hierarchies" (of some property). I am asking: how do we even know that some properties are supposed to be read as forming a "hierarchy"? This is one special case of a rule of inference that one might formulate. Have a look at
https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning/Use_cases
for some more examples of what could be relevant inferences. As you can see, only a few of these cases have anything to do with hierarchies ("subclass of" in particular), but one could easily come up with similar rules to express that something should be propagated along a hierarchy (in some cases).
- Constraints (or suggestions) when interpreting multiple items.
For me, a constraint is a rule that infers a warning. It can follow a similar pattern to the examples I gave, but instead of deriving a new statement, it will derive that a human should take a closer look at a particular piece of our data to check if it is meaningful.
There is no huge theoretical challenge involved here, but a big practical one. I expect that we will refine our rules once we encounter cases where they do not yield the right result. If you look at the examples I gave, they are all mostly based on how we choose to define the meaning of our properties. This is different from our current constraints, which specify how things *usually* are in the world. We can have both (constraints that warn us of unusual situations and rules that derive statements) based on similar technology, but different considerations are relevant when defining these two types of things.
As for Stubbs, there is a strong and a weak rule involved:
- Strong: all mayors are persons (I assume now that this class encompasses named animals, as suggested in earlier messages; if not, then replace "person" by a suitable generalisation that does).
- Weak: most mayors are humans.
The strong version could probably be applied to derive new information, without danger of "exceptions" -- it would be part of our characterisation of what makes something a "person" in our view (or whatever other class we pick there). The weak version should only be used to find potential problems that humans might want to check.
Similar rules exist in many domains:
- Strong: All birds are animals (it's part of how we define "bird").
- Weak: All birds can fly (it's something we observe for actual birds, but not part of the definition of what it means to be a bird).
I suggest we start by focussing on strong rules, since they make a big contribution to documenting what we mean (by "person", by "bird", etc.), even before we have any tool support for acting on this information.
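The strong/weak distinction could be sketched as follows (the rule encoding and all labels are invented; strong rules derive statements, weak rules only derive warnings for human review):

```python
# Apply one rule of the form "if class X then class Y" to an item's
# known classes. Strong rules produce new statements; weak rules only
# flag the item for a human to check. Toy encoding, not a proposal.

def apply_rule(rule, item_classes):
    if rule["if"] in item_classes:
        if rule["strength"] == "strong":
            return ("statement", rule["then"])
        return ("warning", f"check whether this item is really a {rule['then']}")
    return None  # rule does not fire

strong = {"if": "mayor", "then": "person", "strength": "strong"}
weak = {"if": "mayor", "then": "human", "strength": "weak"}

print(apply_rule(strong, {"mayor", "cat"}))  # ('statement', 'person')
print(apply_rule(weak, {"mayor", "cat"})[0])  # 'warning' -- Stubbs gets flagged, not reclassified
```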
Cheers,
Markus
On fim 27.ágú 2015 15:52, Markus Krötzsch wrote:
I'm a big advocate of strong versions. My suggestion for "exceptions" was practical, since we can't reasonably expect all data to be consistent with strong definitions. Personally, I wouldn't support weak versions when a feasible strong alternative is available. The constraints I had in mind are only suggestive and would only serve as warnings, so I think we agree there. The constraints wouldn't be enforced but rather used to detect potential mistakes in the data. They wouldn't prevent someone from adding the information that Stubbs is a mayor, even when it would lead to the contradiction of him being both a human and a cat.
Regarding your question about my first definition, the point is to serve as a classification of what can reasonably be inferred from the relationship of two items, depending on the property used to connect them. Like in the case of Stubbs: Stubbs is a mayor, and from that connection we can (or should be able to) assume Stubbs is also a public official, a head of government and a politician. However, we shouldn't reasonably be able to assume Stubbs's Freebase identifier is the same as the town's. The purpose is to enable machines to retrieve an item and extract all the relevant facts which can reasonably be inferred from the relationship of that item with other items, recursively, until all the branches are exhausted.
- Svavar Kjarrval