On 08/26/2015 06:01 PM, Svavar Kjarrval wrote:
On Wed 26 Aug 2015 23:05, James Heald wrote:
There are a *lot* of problems with P279 (subclass), right across
Wikidata.
These will only be corrected once people start doing searches in a
systematic way and addressing the anomalies they find.
In this case, politician (Q82955) should *not* be a subclass of human
(Q5), instead it should be a subclass of something like occupation
(Q13516667), or alternatively perhaps profession (Q28640).
My understanding is that currently there are a vast number of
incorrect subclass relationships in the project, messing up tree
searches, and so far it is something that has simply not yet been
systematically addressed.
-- James.
For now, what's the best way to find (and perhaps correct) incorrect
declarations like these?
If I were to just change commonly used items like politician
(Q82955), it might be construed as vandalism, or someone who doesn't care
about or understand the Stubbs-declared-as-a-human problem might just
add that declaration back later.
When it comes to the gender property (P21), the human readable
description indicates that it's to define genders in general, yet it's
declared as an instance of an item (Q18608871) which only applies to
humans, which of course has consequences further up in the hierarchy
since the maintainers of item Q18608871 faithfully assume it only
applies to humans.
Well, the situation with respect to Wikidata property for items about people
(Q18608871) is very difficult. There is absolutely no machine-interpretable
information associated with this class that can be used to determine that
instances of it are only supposed to be used for people. So, at the bare
minimum, such machine-interpretable information needs to be added.
Then there is the issue that there is no theory of how the
machine-interpretable information that is associated with entities in Wikidata
is to be processed. All the processing is currently done using
uninterpretable procedures. For example, on
https://www.wikidata.org/wiki/Property_talk:P22 there is information that is
used to control some piece of code that checks to see that the subject of
https://www.wikidata.org/wiki/Property:P21 belongs to person (Q215627) or
fictional character (Q95074). However, there is no theory showing how this
interacts with other parts of Wikidata, even such inherent parts of Wikidata
as https://www.wikidata.org/wiki/Property:P31.
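To make concrete the kind of check I mean, here is a minimal Python sketch of a subject-type constraint. It is not the code that actually runs against Wikidata; the function name, the statement triples, and the item names are all made up for illustration, and only the two allowed classes (Q215627 and Q95074) come from the description above.

```python
# Allowed subject classes for the constrained property, as described above:
# person (Q215627) or fictional character (Q95074).
ALLOWED = frozenset({"Q215627", "Q95074"})


def subjects_violating(prop, statements, classes_of, allowed=ALLOWED):
    """Return subjects that use `prop` but belong to none of the allowed classes.

    `statements` is a list of (subject, property, value) triples and
    `classes_of` maps a subject to the set of classes it belongs to.
    Both structures are assumptions made for this sketch.
    """
    bad = []
    for subject, used_prop, _value in statements:
        if used_prop == prop and not (classes_of.get(subject, set()) & allowed):
            bad.append(subject)
    return bad


# Hypothetical data: item-A is a person, item-B is something else entirely.
statements = [
    ("item-A", "P21", "male"),
    ("item-B", "P21", "male"),
    ("item-B", "P31", "hypothetical-cat-class"),
]
classes_of = {
    "item-A": {"Q215627"},
    "item-B": {"hypothetical-cat-class"},
}

violations = subjects_violating("P21", statements, classes_of)
print(violations)
```

Note that the sketch takes `classes_of` as given. Deciding how that mapping should be derived from instance of and subclass statements is exactly the part that has no agreed theory behind it.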
In fact, there is even difficulty in determining simple truth in Wikidata.
Two sources can conflict, and Wikidata is not in the position of being an
arbiter for such conflicts, certainly not in general. To make the situation
even more complex, Wikidata has a temporal aspect as well and has a need to
admit exceptions to general statements.
So what can be done? Any solution is going to be tricky. That is not to say
that some solutions cannot be found by looking at systems and standards that
are already being used for storing large amounts of complex information.
However, any solution is going to have to be carefully tailored to meet the
requirements of Wikidata and Wikidatans. (Is there an official term for the
people who are putting Wikidata and Wikidata information together?)
There is also a big chicken-and-egg problem here - a good solution to reliable
machine-interpretation of Wikidata information requires, for example,
consistent use of instance of, subclass, and subproperty; but what counts as a
consistent use of these fundamental properties depends on a formal theory of
what they mean.
I, for one, would find even just the attempt to solve this problem vastly
interesting, and I have been doing some exploration as to what might be
needed. My company is interested in using Wikidata as a source of background
information, but finds that the lack of a good theory of Wikidata information
is problematic, so I have some cover for spending time on this problem.
Anyway, if there is interest in machine interpretation of Wikidata
information, if only to detect potential anomalies, I, and probably others,
would be motivated to spend more time on trying to come up with potential
solutions, hopefully in a collaborative effort that includes not just
theoreticians but also Wikidatans.
In the case of the hierarchy Stubbs is associated with, the maintainers
have assumed that all mayors are, without exception, humans, or they somehow
thought that if there were exceptions to this, the machines could
somehow detect and apply them in each case. Both of those assumptions are, I
think we agree, wrong, and we should find out why it's happening.
Is there a tool where one can put in a Wikidata item and it extracts
declarations based on "higher" properties like subclass or instance of?
Like if I were to input the item for Stubbs, it would travel the
hierarchy and tell me what would be assumed about Stubbs based on the
declarations further up in the tree.
Yes, it is called a reasoner. The design of a reasoner would very likely be
one result of the sort of work described above, but without such work it is
very hard to figure out just what is supposed to be done in any except the
simple cases.
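For the simple cases, the traversal you describe can be sketched in a few lines of Python. This is only an illustration under assumptions: the data is a toy in-memory graph rather than real Wikidata statements, the function name is invented, and the sketch treats subclass as plainly transitive with no exceptions, temporal qualifiers, or conflicting sources, which is precisely the simplification that would need a proper theory to justify.

```python
from collections import deque

# Toy statements. Only "politician is a subclass of human" comes from this
# thread; the other edges are assumed for the sake of the example.
INSTANCE_OF = {"Stubbs": {"mayor"}}      # stands in for P31
SUBCLASS_OF = {                          # stands in for P279
    "mayor": {"politician"},
    "politician": {"human"},
}


def inferred_classes(item, instance_of, subclass_of):
    """Map every class inferred for `item` to the chain that leads to it.

    Breadth-first walk: start from the item's direct classes, then follow
    subclass edges upward. Classes already seen are skipped, so cycles in
    the subclass graph do not cause an infinite loop.
    """
    paths = {}
    queue = deque()
    for cls in sorted(instance_of.get(item, ())):
        if cls not in paths:
            paths[cls] = [item, cls]
            queue.append(cls)
    while queue:
        cls = queue.popleft()
        for parent in sorted(subclass_of.get(cls, ())):
            if parent not in paths:
                paths[parent] = paths[cls] + [parent]
                queue.append(parent)
    return paths


result = inferred_classes("Stubbs", INSTANCE_OF, SUBCLASS_OF)
for cls, chain in result.items():
    print(cls, "via", " -> ".join(chain))
```

Keeping the chain for each inferred class is deliberate: when a conclusion looks wrong (Stubbs being a human, say), the chain shows which declaration higher up is responsible.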
- Svavar Kjarrval
Peter F. Patel-Schneider
Nuance Communications