Hi Peter,
The community-defined meaning of subclass of (P279) is that of rdfs:subClassOf [1]. Similarly, the community-defined meaning of instance of (P31) is that of rdf:type [2, 3].
That's encouraging. I do note that there was quite a bit of discussion on these two properties. I am assuming that this has all died down and that the meaning of these two properties is now stable.
The meaning of these two properties is not defined in http://www.w3.org/TR/rdf-schema but instead in RDF 1.1 Semantics [http://www.w3.org/TR/rdf-semantics]. There is a full formal definition there of the RDFS meaning of instance of and subclass of.
This definition states that objects that are instances of a class are instances of superclasses of the class. Do Wikidata tools show these implied instance relationships? If not, then there is something decidedly lacking.
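The inference in question (entailment rule rdfs9 in RDF 1.1 Semantics, combined with the transitivity of subclass of) can be sketched in a few lines of Python. This is only an illustration: the function names and the tiny hand-written statement set are assumptions made for the example, not Wikidata data and not a description of how any Wikidata tool works.

```python
from collections import defaultdict


def superclasses(cls, subclass_of):
    """Return every class reachable from cls by following subclass-of edges."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def implied_instance_of(instance_of, subclass_of):
    """Add the instance-of statements implied by the subclass-of hierarchy."""
    implied = defaultdict(set)
    for item, classes in instance_of.items():
        for cls in classes:
            implied[item].add(cls)                           # stated directly
            implied[item] |= superclasses(cls, subclass_of)  # implied by rdfs9
    return dict(implied)


# Hypothetical statements, written by hand for this example.
subclass_of = {"throat cancer": {"cancer"}, "cancer": {"disease"}}
instance_of = {"a particular case of throat cancer": {"throat cancer"}}

result = implied_instance_of(instance_of, subclass_of)
```

A tool that shows implied instance relationships would display all three classes for the case item, not only the directly stated one.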
There are some open problems with how to handle qualifiers on instance of and subclass of in RDF/OWL exports of P31 as rdf:type and P279 as rdfs:subClassOf, but that does not negate the community's decision to tie its two most basic membership properties to those W3C standard properties. In the current RDF/OWL exports that follow the community interpretation of P31 and P279, e.g. wikidata-taxonomy.nt.gz and wikidata-instances.nt.gz in [4], statements that have qualifiers on either of those properties are simply omitted.
The treatment of qualified subclass of and instance of is not addressed in RDF 1.1 Semantics, and is independent of how to export Wikidata information as RDF or OWL. Is there a theory of how qualifiers (and other aspects of Wikidata) are supposed to interact with these two properties in Wikidata? If not, how can there be a true community understanding of these two properties?
The community's definition of disease is less established. However, there is consensus that diseases like cancer (Q12078) and malaria (Q12156) are classes. An instance of disease would be a particular case of a disease, i.e. a particular case of an abnormal condition in a particular organism. For example, it would be the particular case of throat cancer that caused U.S. President Ulysses S. Grant to die, as reflected in the Wikidata statement "Ulysses S. Grant cause of death throat cancer" [5].
That's fine, and this modelling methodology does have its advantages. Having this written down somewhere that is easy to find would be helpful.
Wikidata has no items on actual instances of disease to my knowledge -- although it does have at least one item about an instance of a symptom [6]. That of course does not mean that such instances of disease do not exist or that they could not theoretically be modeled in some local Wikibase installation (e.g. in a physician's office or a hospital) that uses Wikidata vocabulary to track actual instances of disease, e.g. a particular case of pancreatic cancer in a patient.
If you have questions or concerns regarding how diseases are modeled, I would recommend contacting Wikidata editor and disease ontologist Elvira Mitraka (Emitraka) [7], as well as WikiProject Medicine [8] or WikiProject Molecular Biology [9].
Thanks.
Regarding how outsiders can become aware of modeling methodology, I recommend reading https://www.wikidata.org/wiki/Help:Basic_membership_properties and engaging with particular domain modeling groups on Wikidata, e.g. the wikiprojects mentioned above. This mailing list and Wikidata Project Chat [10] are also good places to ask questions.
I have indeed asked the question on this mailing list and received quite a few useful responses. Is there a way of getting from the appropriate Wikidata pages to the domain modelling groups without having to ask? It seems to me that this would be helpful.
Finally, regarding your question "Is Wikidata uniform in applying this methodology?", the answer is no. Wikidata's use of subclass of and instance of varies among (and sometimes within) different domains of knowledge like human occupations, creative work genres, cuisines, and sports. The basic difference in usage among those domains is using instance of where others would use subclass of.
I was unable to find this information for diseases without using my expertise in this area. Is there some easier way of determining which approach is being used in which domain?
For example, pizza (https://www.wikidata.org/wiki/Q177) is currently modeled as an instance of food and (transitively) a subclass of food. Problematic indeed!
Well, not really problematic per se, but a modelling methodology that can easily lead to incorrect determinations. In essence, food is the union of those things that we might eat (the pizza I ate in Bethlehem last week) and categories of these things (bad pizza from a hole-in-the-wall restaurant).
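One way to find items modelled the way pizza is would be to look for items that are an instance of some class and also reach that same class through subclass of. The sketch below does this over plain dictionaries; the function names are made up for the example, and the intermediate class "dish" is a placeholder rather than the actual chain of statements on the pizza item.

```python
def reachable(start, edges):
    """Return all nodes reachable from start via one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def instance_and_subclass_conflicts(instance_of, subclass_of):
    """Map each item to the classes it is both an instance and a subclass of."""
    conflicts = {}
    for item, classes in instance_of.items():
        overlap = set(classes) & reachable(item, subclass_of)
        if overlap:
            conflicts[item] = overlap
    return conflicts


# Hypothetical statements; "dish" is a placeholder intermediate class.
instance_of = {"pizza": {"food"}}
subclass_of = {
    "pizza": {"dish"},
    "dish": {"food"},
    "malaria": {"disease"},
}

conflicts = instance_and_subclass_conflicts(instance_of, subclass_of)
```

Here only pizza is reported, because malaria has no instance of statement pointing at one of its superclasses.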
Disease modeling achieves the same goal of easy queryability by making statements like "malaria subclass of disease" and "malaria subclass of parasitic protozoa infectious disease" [11], where the latter value transitively resolves to disease. This is not only rather redundant, but also makes the subclass of hierarchy cyclic and thus not a directed acyclic graph (DAG) due to the situation you note in the item about disease itself. But at least it avoids the more severe problem of being ontologically incorrect as seen in the item on pizza -- and all chemical elements, e.g. hydrogen (Q556) [12].
This modelling methodology does not fall prey to the above problem, but is not itself problem-free. Does the redundant information have any impact? If not, then why include it? If so, then everyone needs to know about it.
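To make the redundancy concrete: a direct subclass of statement is redundant, in the sense discussed here, when its value is already reachable through another direct superclass of the same item. A minimal sketch, assuming the two malaria statements quoted above plus an assumed statement linking the intermediate class to disease (the function names are illustrative):

```python
def reachable(start, edges):
    """Return all nodes reachable from start via one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def redundant_parents(item, subclass_of):
    """Direct superclasses of item that another direct superclass already implies."""
    parents = subclass_of.get(item, set())
    return {
        p
        for p in parents
        if any(p in reachable(q, subclass_of) for q in parents if q != p)
    }


# The two malaria statements come from the discussion above; the third
# statement is assumed so that the intermediate class resolves to disease.
subclass_of = {
    "malaria": {"disease", "parasitic protozoa infectious disease"},
    "parasitic protozoa infectious disease": {"disease"},
}

redundant = redundant_parents("malaria", subclass_of)
```

Removing the statements flagged this way would leave the set of implied superclasses unchanged, which is why the question of whether they have any other impact matters.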
I don't see any problems with cyclic hierarchies per se, but stating cyclic hierarchies is often a signal of a modelling error. In the disease domain it appears that the only cyclicity is from disease to itself. This appears to be a part of the modelling methodology, but is this stated anywhere?
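Checking which classes sit on a cycle is straightforward: a class is on a cycle exactly when it can reach itself through one or more subclass of statements. The sketch below assumes the situation described in this thread, namely that disease is stated to be a subclass of itself; the data is hand-written for the example and the function names are illustrative.

```python
def reachable(start, edges):
    """Return all nodes reachable from start via one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def classes_on_cycles(subclass_of):
    """Return every class that can reach itself via subclass-of statements."""
    nodes = set(subclass_of)
    for parents in subclass_of.values():
        nodes |= set(parents)
    return {n for n in nodes if n in reachable(n, subclass_of)}


# Hand-written statements; the self-loop on disease is taken from the
# discussion above, not checked against the live item.
subclass_of = {
    "malaria": {"disease", "parasitic protozoa infectious disease"},
    "parasitic protozoa infectious disease": {"disease"},
    "disease": {"disease"},
}

cyclic = classes_on_cycles(subclass_of)
```

With this data only disease is reported, which matches the observation that the self-reference is the sole source of cyclicity in this part of the hierarchy.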
As has been stated elsewhere, it would be better to have a higher-order class or some other signal that this modelling methodology is in use for disease and its subclasses.
Regards, Eric https://www.wikidata.org/wiki/User:Emw
[1] http://www.w3.org/TR/rdf-schema/#ch_subclassof
[2] http://www.w3.org/TR/rdf-schema/#ch_type
[3] is a -> instance of. https://www.wikidata.org/w/index.php?title=Property_talk:P31&oldid=25407...
[4] http://tools.wmflabs.org/wikidata-exports/rdf/index.php?content=dump_downloa...
[5] Ulysses S. Grant: cause of death. https://www.wikidata.org/wiki/Q34836#P509
[6] George H. W. Bush vomiting incident. https://www.wikidata.org/wiki/Q5540112
[7] https://www.wikidata.org/wiki/User:Emitraka
[8] WikiProject Medicine on Wikidata. https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Medicine
[9] WikiProject Molecular Biology on Wikidata. https://www.wikidata.org/wiki/Wikidata:WikiProject_Molecular_biology
[10] https://www.wikidata.org/wiki/Wikidata:Project_chat
[11] Malaria: subclass of. https://www.wikidata.org/w/index.php?title=Q12156&oldid=259072228#P279