Hi Peter,
The community-defined meaning of subclass of (P279) is that of
rdfs:subClassOf [1]. Similarly, the community-defined meaning of
instance of (P31) is that of rdf:type [2, 3].
That's encouraging. I do note that there was quite a bit of
discussion on these two properties. I am assuming that this has all
died down and that the meaning of these two properties is now
stable.
The meaning of these two properties is not defined in
http://www.w3.org/TR/rdf-schema but instead in RDF 1.1 Semantics
[http://www.w3.org/TR/rdf-semantics]. There is a full formal
definition there of the RDFS meaning of instance of and subclass of.
This definition states that objects that are instances of a class
are instances of superclasses of the class. Do Wikidata tools show
these implied instance relationships? If not, then there is
something decidedly lacking.
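
As a rough illustration of the implied instance relationships in
question, here is a minimal sketch of the two relevant RDFS entailment
patterns (rdfs9 and rdfs11 in RDF 1.1 Semantics) over plain Python
pairs. The item names are made-up toy data, and "grant_case" is a
hypothetical instance, not a Wikidata item. This is not how any
Wikidata tool is known to work.

```python
from collections import defaultdict


def implied_types(instance_of, subclass_of):
    """Return every (item, class) pair entailed by the given statements.

    rdfs11: subclass of is transitive.
    rdfs9:  an instance of a class is an instance of its superclasses.
    """
    parents = defaultdict(set)
    for sub, sup in subclass_of:
        parents[sub].add(sup)

    def superclasses(cls):
        # Depth-first walk up the hierarchy; `seen` also guards
        # against cycles in the subclass-of statements.
        seen, stack = set(), [cls]
        while stack:
            for sup in parents[stack.pop()]:
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    result = set()
    for item, cls in instance_of:
        result.add((item, cls))
        for sup in superclasses(cls):
            result.add((item, sup))
    return result


# Hypothetical toy data, for illustration only.
instance_of = {("grant_case", "throat cancer")}
subclass_of = {("throat cancer", "cancer"), ("cancer", "disease")}

inferred = implied_types(instance_of, subclass_of)
for pair in sorted(inferred):
    print(pair)
```

A tool that showed implied instance relationships would list
"grant_case" under cancer and disease as well as under throat cancer.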
There are some open problems with how to handle qualifiers on
instance of and subclass of in RDF/OWL exports of P31 as rdf:type
and P279 as rdfs:subClassOf, but that does not negate the
community's decision to tie its two most basic membership properties
to those W3C standard properties. In the current RDF/OWL exports
that follow the community interpretation of P31 and P279,
e.g. wikidata-taxonomy.nt.gz and wikidata-instances.nt.gz in [4],
statements that have qualifiers on either of those properties are
simply omitted.
The treatment of qualified subclass of and instance of is not
addressed in RDF 1.1 Semantics, and is independent of how to export
Wikidata information as RDF or OWL. Is there a theory of how
qualifiers (and other aspects of Wikidata) are supposed to interact
with these two properties in Wikidata? If not, how can there be a
true community understanding of these two properties?
The community's definition of disease is less established. However,
there is consensus that diseases like cancer (Q12078) and malaria
(Q12156) are classes. An instance of disease would be a particular
case of a disease, i.e. a particular case of an abnormal condition
in a particular organism. For example, it would be the particular
case of throat cancer that caused U.S. President Ulysses S. Grant to
die, as reflected in the Wikidata statement "Ulysses S. Grant cause
of death throat cancer" [5].
That's fine and this modelling methodology does have its advantages.
Having this written down somewhere that is easy to find would be
helpful.
Wikidata has no items on actual instances of disease to my knowledge
-- although it does have at least one item about an instance of a
symptom [6]. That of course does not mean that such instances of
disease do not exist or that they could not theoretically be modeled
in some local Wikibase installation (e.g. in a physician's office or
a hospital) that uses Wikidata vocabulary to track actual instances
of disease, e.g. a particular case of pancreatic cancer in a
patient.
If you have questions or concerns regarding how diseases are
modeled, I would recommend contacting Wikidata editor and disease
ontologist Elvira Mitraka (Emitraka) [7], as well as WikiProject
Medicine [8] or WikiProject Molecular Biology [9].
Thanks.
Regarding how outsiders can become aware of modeling methodology, I
recommend reading
https://www.wikidata.org/wiki/Help:Basic_membership_properties and
engaging with particular domain modeling groups on Wikidata,
e.g. the wikiprojects mentioned above. This mailing list and
Wikidata Project Chat [10] are also good places to ask questions.
I have indeed asked the question on this mailing list and received
some quite useful responses. Is there a way of getting from the
appropriate Wikidata pages to the domain modelling groups without
having to ask? It seems to me that this would be helpful.
Finally, regarding your question "Is Wikidata uniform in applying
this methodology?", the answer is no. Wikidata's use of subclass of
and instance of varies among (and sometimes within) different
domains of knowledge like human occupations, creative work genres,
cuisines, and sports. The basic difference in usage among those
domains is using instance of where others would use subclass of.
I was unable to find this information for diseases without using my
expertise in this area. Is there some easier way of determining
which approach is being used in which domain?
For example, pizza (https://www.wikidata.org/wiki/Q177) is currently
modeled as an instance of food and (transitively) a subclass of
food. Problematic indeed!
Well, not really problematic per se, but a modelling methodology
that can easily lead to incorrect determinations. In essence, food
is the union of those things that we might eat (the pizza I ate in
Bethlehem last week) and categories of these things (bad pizza from
a hole-in-the-wall restaurant).
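
One way to spot this pattern mechanically is to look for items that are
stated to be an instance of a class and are also a (transitive)
subclass of that same class. The following is only a sketch over toy
pairs. The intermediate class "dish" and the item "malaria_case" are
invented for the example and do not reflect the actual statements on
Wikidata.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via one or more subclass-of links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def mixed_level_items(instance_of, subclass_of):
    """Pairs (item, cls) where item is both an instance of cls and a
    (transitive) subclass of cls."""
    return {
        (item, cls)
        for item, cls in instance_of
        if cls in superclasses(item, subclass_of)
    }


# Toy data; "dish" and "malaria_case" are assumptions for illustration.
instance_of = {("pizza", "food"), ("malaria_case", "malaria")}
subclass_of = {("pizza", "dish"), ("dish", "food"), ("malaria", "disease")}

flagged = mixed_level_items(instance_of, subclass_of)
print(flagged)
```

Only the pizza pair is flagged here; the disease-style modelling, where
instances and subclasses are kept apart, passes the check.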
Disease modeling achieves the same goal of easy queryability by
making statements like "malaria subclass of
disease" and "malaria subclass of parasitic protozoa infectious
disease" [11], where the latter value transitively resolves to
disease. This is not only rather redundant, but also makes the
subclass of hierarchy cyclic and thus not a directed acyclic graph
(DAG) due to the situation you note in the item about disease
itself. But at least it avoids the more severe problem of being
ontologically incorrect as seen in the item on pizza -- and all
chemical elements, e.g. hydrogen (Q556) [12].
This modelling methodology does not fall prey to the above problem,
but is not itself problem-free. Does the redundant information have
any impact? If not, then why include it? If so, then everyone
needs to know about it.
I don't see any problems with cyclic hierarchies per se, but stating
cyclic hierarchies is often a signal of a modelling error. In the
disease domain it appears that the only cyclicity is from disease to
itself. This appears to be a part of the modelling methodology, but
is this stated anywhere?
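
For what it is worth, checking a subclass of hierarchy for this kind of
cyclicity is straightforward. The sketch below reports every class that
can reach itself through one or more subclass of statements. The data
is a toy simplification: the self-loop on disease is my reading of
"from disease to itself", and the direct link from the parasitic
disease class to disease stands in for a longer chain.

```python
def classes_on_cycles(subclass_of):
    """Return the classes that are (transitive) subclasses of themselves."""
    parents = {}
    for sub, sup in subclass_of:
        parents.setdefault(sub, set()).add(sup)

    def reachable(start):
        # Everything reachable from `start` in one or more steps.
        seen, stack = set(), [start]
        while stack:
            for sup in parents.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    return {cls for cls in parents if cls in reachable(cls)}


# Toy data, simplified from the statements discussed above.
subclass_of = {
    ("malaria", "disease"),
    ("malaria", "parasitic protozoa infectious disease"),
    ("parasitic protozoa infectious disease", "disease"),
    ("disease", "disease"),  # assumed self-loop, see note above
}

cyclic = classes_on_cycles(subclass_of)
print(cyclic)
```

The redundant direct statement "malaria subclass of disease" adds no
cycle by itself; only the statement on disease does.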
As has been stated elsewhere, it would be better to have a
higher-order class or some other signal that this modelling
methodology is in use for disease and its subclasses.
https://www.wikidata.org/w/index.php?title=Property_talk:P31&oldid=2540…
4. http://tools.wmflabs.org/wikidata-exports/rdf/index.php?content=dump_downlo…
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Medicine
9. Wikiproject Molecular Biology on Wikidata.
https://www.wikidata.org/wiki/Wikidata:WikiProject_Molecular_biology
https://www.wikidata.org/w/index.php?title=Q12156&oldid=259072228#P279