On 08/26/2015 06:01 PM, Svavar Kjarrval wrote:
On Wed 26 Aug 2015 23:05, James Heald wrote:
There are a *lot* of problems with P279 (subclass), right across Wikidata.
These will only be corrected once people start doing searches in a systematic way and addressing the anomalies they find.
In this case, politician (Q82955) should *not* be a subclass of human (Q5), instead it should be a subclass of something like occupation (Q13516667), or alternatively perhaps profession (Q28640).
My understanding is that currently there are a vast number of incorrect subclass relationships in the project, messing up tree searches, and so far it is something that has simply not yet been systematically addressed.
-- James.
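A systematic search of the kind James describes can be sketched as follows. The sketch works under loudly stated assumptions. The data is a hypothetical toy excerpt keyed by English labels rather than real Wikidata entities. The "disjoint" pair (human vs. cat) is also made up for the example, since nothing in this thread says Wikidata records disjointness. The sketch computes every class an item inherits through instance of (P31) and subclass of (P279), then reports any item that ends up in two classes declared disjoint.

```python
from collections import deque

# Hypothetical toy excerpt: labels stand in for Wikidata items, and the
# edges are illustrative, not copied from Wikidata.
SUBCLASS_OF = {  # subclass of (P279): class -> direct superclasses
    "mayor": {"politician"},
    "politician": {"human"},  # the kind of edge James questions
    "cat": {"animal"},
}
INSTANCE_OF = {  # instance of (P31): item -> direct classes
    "Stubbs": {"cat", "mayor"},
}
# Assumed disjointness declarations: no item may belong to both classes of a pair.
DISJOINT_PAIRS = [("human", "cat")]


def superclasses(cls, subclass_of):
    """Return cls plus every class reachable from it via subclass-of edges."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def inferred_types(item, instance_of, subclass_of):
    """Union of the superclass closures of the item's direct classes."""
    types = set()
    for cls in instance_of.get(item, ()):
        types |= superclasses(cls, subclass_of)
    return types


def disjointness_violations(instance_of, subclass_of, disjoint_pairs):
    """List (item, class_a, class_b) for items inferred to be in both classes of a pair."""
    report = []
    for item in sorted(instance_of):
        types = inferred_types(item, instance_of, subclass_of)
        for a, b in disjoint_pairs:
            if a in types and b in types:
                report.append((item, a, b))
    return report


for item, a, b in disjointness_violations(INSTANCE_OF, SUBCLASS_OF, DISJOINT_PAIRS):
    print(f"{item} is inferred to be both {a} and {b}")
```

In this toy it prints one line, "Stubbs is inferred to be both human and cat". If the politician edge is pointed at "occupation" instead, as James suggests, the report comes back empty. Whether Stubbs should be an *instance* of mayor at all is a separate modelling question the sketch does not address.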
For now, what's the best way to find (and perhaps correct) incorrect declarations like these?
If I were to just change commonly used items like politician (Q82955), it might be construed as vandalism, or someone who doesn't care about or understand the Stubbs-declared-as-a-human problem might just add that declaration back later.
When it comes to the gender property (P21), the human-readable description indicates that it's meant to define genders in general, yet it's declared as an instance of an item (Q18608871) which only applies to humans. That of course has consequences further up in the hierarchy, since the maintainers of item Q18608871 faithfully assume it only applies to humans.
Well, the situation with respect to Wikidata property for items about people (Q18608871) is very difficult. There is absolutely no machine-interpretable information associated with this class that can be used to determine that instances of it are only supposed to be used for people. So, at the bare minimum, such machine-interpretable information needs to be added.
Then there is the issue that there is no theory of how the machine-interpretable information that is associated with entities in Wikidata is to be processed. All the processing is currently done using uninterpretable procedures. For example, on https://www.wikidata.org/wiki/Property_talk:P22 there is information that is used to control some piece of code that checks to see that the subject of https://www.wikidata.org/wiki/Property:P21 belongs to person (Q215627) or fictional character (Q95074). However, there is no theory showing how this interacts with other parts of Wikidata, even such inherent parts of Wikidata as https://www.wikidata.org/wiki/Property:P31 .
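To make this point concrete, here is a minimal sketch of a subject-type check like the one described for P21. It uses hypothetical toy data, with labels instead of real items, and the "human is a subclass of person" edge is an assumption for illustration. The thing to notice is the `follow_subclass` flag. Whether the check looks only at direct instance of (P31) values or also climbs subclass of (P279) is exactly the kind of interaction that has no documented theory. The two readings give different answers.

```python
from collections import deque

# Hypothetical toy data; labels stand in for Wikidata items.
SUBCLASS_OF = {
    "mayor": {"politician"},
    "politician": {"human"},
    "human": {"person"},  # assumed edge, for illustration only
    "cat": {"animal"},
}
INSTANCE_OF = {
    "Stubbs": {"cat", "mayor"},
    "Ada": {"human"},  # hypothetical item
}
# Allowed subject classes for the gender property (P21), per the check described above.
ALLOWED_SUBJECT_CLASSES = {"person", "fictional character"}


def superclasses(cls, subclass_of):
    """Return cls plus every class reachable from it via subclass-of edges."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def satisfies_subject_type(item, allowed, instance_of, subclass_of, follow_subclass):
    """True if the item counts as belonging to one of the allowed classes."""
    direct = set(instance_of.get(item, ()))
    if not follow_subclass:
        # Reading 1: only direct P31 values count.
        return bool(direct & allowed)
    # Reading 2: P31 values plus everything above them via P279.
    return any(superclasses(cls, subclass_of) & allowed for cls in direct)


for name in sorted(INSTANCE_OF):
    strict = satisfies_subject_type(name, ALLOWED_SUBJECT_CLASSES, INSTANCE_OF, SUBCLASS_OF, False)
    closed = satisfies_subject_type(name, ALLOWED_SUBJECT_CLASSES, INSTANCE_OF, SUBCLASS_OF, True)
    print(f"{name}: direct-only={strict}, with subclass={closed}")
```

The direct-only reading rejects even an ordinary human. The closure reading accepts the cat, because the questionable politician edge silently satisfies the constraint. Neither outcome can be called right or wrong without a theory of what P31 and P279 mean.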
In fact, there is even difficulty in determining simple truth in Wikidata. Two sources can conflict, and Wikidata is not in a position to arbitrate such conflicts, certainly not in general. To make the situation even more complex, Wikidata has a temporal aspect as well and needs to admit exceptions to general statements.
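A small sketch shows why neither conflicts nor time can be ignored. In it, every statement carries a rank and optional validity dates. The rank names (preferred, normal, deprecated) and the start time (P580) / end time (P582) qualifiers exist in Wikidata. The selection rule below, however, is an assumption for illustration, not a documented Wikidata algorithm, and the values "X", "Y", "Z" are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    value: str
    rank: str = "normal"            # "preferred", "normal" or "deprecated"
    start: Optional[date] = None    # plays the role of a start time (P580) qualifier
    end: Optional[date] = None      # plays the role of an end time (P582) qualifier
    source: Optional[str] = None    # where the claim comes from


def holds_at(st: Statement, when: date) -> bool:
    """In force at `when`; a missing bound is open-ended, and the end date is exclusive."""
    return (st.start is None or st.start <= when) and (st.end is None or when < st.end)


def best_statements(statements: List[Statement], when: date) -> List[Statement]:
    """Assumed rule: drop deprecated and out-of-period statements, then prefer preferred rank."""
    live = [s for s in statements if s.rank != "deprecated" and holds_at(s, when)]
    preferred = [s for s in live if s.rank == "preferred"]
    return preferred or live


# Hypothetical conflicting claims from two sources, plus a later preferred one.
claims = [
    Statement("X", source="source 1"),
    Statement("Y", source="source 2"),
    Statement("Z", rank="preferred", start=date(2010, 1, 1)),
]
print([s.value for s in best_statements(claims, date(2005, 6, 1))])  # ['X', 'Y']
print([s.value for s in best_statements(claims, date(2015, 6, 1))])  # ['Z']
```

When two normal-rank statements from different sources disagree, this rule returns both. It records the conflict rather than arbitrating it, which matches the position described above.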
So what can be done? Any solution is going to be tricky. That is not to say that some solutions cannot be found by looking at systems and standards that are already being used for storing large amounts of complex information. However, any solution is going to have to be carefully tailored to meet the requirements of Wikidata and Wikidatans. (Is there an official term for the people who are putting Wikidata and Wikidata information together?)
There is also a big chicken-and-egg problem here - a good solution to reliable machine-interpretation of Wikidata information requires, for example, consistent use of instance of, subclass, and subproperty; but what counts as a consistent use of these fundamental properties depends on a formal theory of what they mean.
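As an illustration of what such a formal theory could minimally say, here are four inference rules adapted from the RDFS entailment rules (rdfs9, rdfs11, rdfs7 and rdfs5). They read instance of (P31) as rdf:type and subclass of (P279) as rdfs:subClassOf, and they write a subproperty-of relation as sub. This is one candidate reading, not an adopted semantics for Wikidata.

```latex
\begin{aligned}
\mathrm{P31}(x, C) \land \mathrm{P279}(C, D) &\;\Rightarrow\; \mathrm{P31}(x, D) \\
\mathrm{P279}(C, D) \land \mathrm{P279}(D, E) &\;\Rightarrow\; \mathrm{P279}(C, E) \\
p(x, y) \land \mathrm{sub}(p, q) &\;\Rightarrow\; q(x, y) \\
\mathrm{sub}(p, q) \land \mathrm{sub}(q, r) &\;\Rightarrow\; \mathrm{sub}(p, r)
\end{aligned}
```

Under these rules, a single mistaken P279 edge such as "politician subclass of human" propagates to every instance of every class below it. The rules also say nothing about exceptions or validity in time, so they could only be a starting point.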
I, for one, would find even just the attempt to solve this problem vastly interesting, and I have been doing some exploration as to what might be needed. My company is interested in using Wikidata as a source of background information, but finds that the lack of a good theory of Wikidata information is problematic, so I have some cover for spending time on this problem.
Anyway, if there is interest in machine interpretation of Wikidata information, if only to detect potential anomalies, I, and probably others, would be motivated to spend more time on trying to come up with potential solutions, hopefully in a collaborative effort that includes not just theoreticians but also Wikidatans.
In the case of the hierarchy Stubbs is associated with, the maintainers have either assumed that all mayors are, without exception, humans, or somehow thought that if there were exceptions to this, machines could detect and apply them in each case. Both of those approaches are, I think we agree, wrong, and we should find out why it's happening.
Is there a tool where one can put in a Wikidata item and it extracts declarations based on "higher" properties like subclass or instance of? Like if I were to input the item for Stubbs, it would travel the hierarchy and tell me what would be assumed about Stubbs based on the declarations further up in the tree.
Yes, it is called a reasoner. The design of a reasoner would very likely be one result of the sort of work described above, but without such work it is very hard to figure out just what is supposed to be done in any but the simplest cases.
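A toy version of the simple case can be sketched in a few lines. Starting from an item, it follows instance of (P31) once and then subclass of (P279) repeatedly. For each class reached, it records one chain of edges that produced it, so a surprising conclusion can be traced back to the edge responsible. The data is hypothetical. The sketch deliberately ignores everything that makes the general case hard: exceptions, qualifiers, time, and conflicting sources.

```python
from collections import deque

# Hypothetical toy data; labels stand in for Wikidata items.
SUBCLASS_OF = {
    "mayor": {"politician"},
    "politician": {"human"},
    "cat": {"animal"},
}
INSTANCE_OF = {"Stubbs": {"cat", "mayor"}}


def explain_types(item, instance_of, subclass_of):
    """Map each class the item is inferred to belong to onto one shortest chain that yields it."""
    chains = {}
    queue = deque()
    for cls in sorted(instance_of.get(item, ())):  # exactly one P31 step
        chains[cls] = [item, cls]
        queue.append(cls)
    while queue:  # then any number of P279 steps, breadth-first
        current = queue.popleft()
        for parent in sorted(subclass_of.get(current, ())):
            if parent not in chains:
                chains[parent] = chains[current] + [parent]
                queue.append(parent)
    return chains


for cls, chain in sorted(explain_types("Stubbs", INSTANCE_OF, SUBCLASS_OF).items()):
    print(f"{cls}: {' -> '.join(chain)}")
```

In this toy, the line for "human" reads "Stubbs -> mayor -> politician -> human", which points straight at the politician-to-human edge James flagged.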
- Svavar Kjarrval
Peter F. Patel-Schneider
Nuance Communications