Hi Peter,
The community-defined meaning of *subclass of* (P279) is that of
rdfs:subClassOf [1]. Similarly, the community-defined meaning of *instance
of* (P31) is that of rdf:type [2, 3].
There are some open problems with how to handle qualifiers on *instance of*
and *subclass of* in RDF/OWL exports of P31 as rdf:type and P279 as
rdfs:subClassOf, but that does not negate the community's decision to tie
its two most basic membership properties to those W3C standard properties.
In the current RDF/OWL exports that follow the community interpretation of
P31 and P279, e.g. wikidata-taxonomy.nt.gz and wikidata-instances.nt.gz in
[4], statements that have qualifiers on either of those properties are
simply omitted.
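To illustrate that omission rule, here is a minimal Python sketch. It is not the actual export code: the tuple layout for statements, the function name, and the placeholder identifiers "P_qualifier" and "P_other" are assumptions made up for this example.

```python
# Sketch only: the statement layout (subject, property, value, qualifiers)
# is an assumption for illustration, not the real dump or export format.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
ENTITY = "http://www.wikidata.org/entity/"

# P31 (instance of) -> rdf:type, P279 (subclass of) -> rdfs:subClassOf
PREDICATE_FOR = {"P31": RDF_TYPE, "P279": RDFS_SUBCLASS_OF}


def export_triples(statements):
    """Return N-Triples lines for unqualified P31/P279 statements only."""
    lines = []
    for subject, prop, value, qualifiers in statements:
        predicate = PREDICATE_FOR.get(prop)
        # Skip other properties, and skip any statement carrying qualifiers.
        if predicate is None or qualifiers:
            continue
        lines.append(f"<{ENTITY}{subject}> <{predicate}> <{ENTITY}{value}> .")
    return lines


statements = [
    # malaria (Q12156) subclass of disease (Q12136), no qualifiers: kept
    ("Q12156", "P279", "Q12136", {}),
    # cancer (Q12078) with a made-up qualifier: omitted
    ("Q12078", "P279", "Q12136", {"P_qualifier": "made-up value"}),
    # made-up property that is neither P31 nor P279: omitted
    ("Q12156", "P_other", "Q12136", {}),
]

for line in export_triples(statements):
    print(line)
```

Only the unqualified malaria statement survives. Dropping qualified statements is the simplest choice, at the cost of losing whatever those statements say.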
The community's definition of disease is less established. However, there
is consensus that diseases like cancer (Q12078) and malaria (Q12156) are
classes. An instance of disease would be a particular case of a disease,
i.e. a particular case of an abnormal condition in a particular organism.
For example, it would be the particular case of throat cancer that caused
U.S. President Ulysses S. Grant to die, as reflected in the Wikidata
statement "Ulysses S. Grant *cause of death* throat cancer" [5].
Wikidata has no items on actual instances of disease to my knowledge --
although it does have at least one item about an instance of a symptom
[6]. That of course does not mean that such instances of disease do not
exist or that they could not theoretically be modeled in some local
Wikibase installation (e.g. in a physician's office or a hospital) that
uses Wikidata vocabulary to track actual instances of disease, e.g. a
particular case of pancreatic cancer in a patient.
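As a sketch of what that would buy under the rdfs:subClassOf reading, the following Python snippet infers every class a case belongs to from its direct *instance of* value. The subclass chain is a simplified assumption (the real hierarchy has more intermediate classes), and the function name is made up for this example.

```python
def inferred_types(direct_types, subclass_of):
    """Return the direct types plus all their superclasses.

    This mirrors the RDFS reading: an instance of a class is also an
    instance of every superclass of that class.
    """
    result = set()
    stack = list(direct_types)
    while stack:
        cls = stack.pop()
        if cls not in result:  # the visited check also guards against cycles
            result.add(cls)
            stack.extend(subclass_of.get(cls, ()))
    return result


# Assumed, simplified subclass-of chain for illustration.
subclass_of = {
    "pancreatic cancer": ["cancer"],
    "cancer": ["disease"],
}

# A hypothetical case item in a local Wikibase, typed only as pancreatic cancer.
case_types = inferred_types(["pancreatic cancer"], subclass_of)
print(sorted(case_types))
```

The case is recorded once as an instance of pancreatic cancer, and its membership in cancer and disease follows from the class hierarchy.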
If you have questions or concerns regarding how diseases are modeled, I
would recommend contacting Wikidata editor and disease ontologist Elvira
Mitraka (Emitraka) [7], as well as WikiProject Medicine [8] or WikiProject
Molecular Biology [9].
Regarding how outsiders can become aware of modeling methodology, I
recommend reading
and engaging
with particular domain modeling groups on Wikidata, e.g. the wikiprojects
mentioned above. This mailing list and Wikidata Project Chat [10] are also
good places to ask questions.
Finally, regarding your question "Is Wikidata uniform in applying this
methodology?", the answer is no. Wikidata's use of *subclass of* and *instance
of* varies among (and sometimes within) different domains of knowledge like
human occupations, creative work genres, cuisines, and sports. The basic
difference in usage among those domains is using *instance of* where others
would use *subclass of*.
For example, pizza (
) is currently
modeled as an instance of food and (transitively) a subclass of food.
Problematic indeed! Disease modeling achieves the same goal of easy
queryability by making statements like "malaria *subclass of* disease" and
"malaria *subclass of* parasitic protozoa infectious disease" [11], where
the latter value transitively resolves to disease. This is not only rather
redundant, but also makes the *subclass of* hierarchy cyclic and thus not a
directed acyclic graph (DAG) due to the situation you note in the item
about disease itself. But at least it avoids the more severe problem of
being ontologically incorrect as seen in the item on pizza -- and all
chemical elements, e.g. hydrogen (Q556) [12].
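To make the redundancy and the cycle concrete, here is a small Python sketch over a toy version of that hierarchy. The edges are assumptions: I collapse the chain from parasitic protozoa infectious disease up to disease into a single hop, and I stand in for the loop at the top with a self-edge on disease. The function names are made up for this example.

```python
def superclasses(graph, start):
    """All classes reachable from start via one or more subclass-of edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def redundant_edges(graph):
    """Direct edges whose target is already implied by another direct parent."""
    result = []
    for child, parents in graph.items():
        for parent in parents:
            others = [p for p in parents if p != parent]
            if any(parent in superclasses(graph, other) for other in others):
                result.append((child, parent))
    return result


def cyclic_classes(graph):
    """Classes that are their own (transitive) superclass."""
    return sorted(c for c in graph if c in superclasses(graph, c))


# Toy subclass-of graph; see the assumptions stated above.
graph = {
    "malaria": ["disease", "parasitic protozoa infectious disease"],
    "parasitic protozoa infectious disease": ["disease"],
    "disease": ["disease"],
}

print(redundant_edges(graph))
print(cyclic_classes(graph))
```

On this toy graph the direct "malaria *subclass of* disease" edge is reported as redundant, and disease is reported as sitting on a cycle, which is why the hierarchy is not a DAG.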
Regards,
Eric
1.
2.
3. is a -> instance of.
4.
5. Ulysses S. Grant: cause of death.
6. George H. W. Bush vomiting incident.
7.
8. WikiProject Medicine on Wikidata.
9. WikiProject Molecular Biology on Wikidata.
10.
11. Malaria: subclass of.
12. Hydrogen.
On Sat, Oct 17, 2015 at 10:25 AM, Peter F. Patel-Schneider <
pfpschneider(a)gmail.com> wrote:
On 10/17/2015 12:55 AM, Thomas Douillard wrote:
I was a bit surprised to see class reasoning used on diseases.
I was not aware of that, do you have links ?
See slide 38 of
http://www.slideshare.net/_Emw/an-ambitious-wikidata-tutorial
I was a bit surprised to see class reasoning used on diseases.
This depends on a particular modelling methodology.
It's not surprising, as the meaning of properties is community defined (or
sub-community defined), so any community can use whatever reasoning
technology it wants, as long as it is consistent with the intended meaning
of the properties. As Wikidata only stores statements, anyone can apply
community-accepted reasoning technologies on top of it. The drawback of
this approach was discussed on another thread some days ago: it could
become tricky for a simple user to understand the path that led to a
statement being added, and we have to be careful to always provide
information on which bot added inferred statements, with which reasoning
technology or rule, and from which data.
What is the community-defined meaning of subclass of and diseases then?
Here is what I see in Wikidata.
https://www.wikidata.org/wiki/Q128581 breast cancer
has a
https://www.wikidata.org/wiki/Property:P279 subclass of
link to both
https://www.wikidata.org/wiki/Q12136 disease
and
https://www.wikidata.org/wiki/Q18556617 thoracic cancer
https://www.wikidata.org/wiki/Property:P279 subclass of
is linked via
https://www.wikidata.org/wiki/Property:P1628 equivalent property
to
http://www.w3.org/2000/01/rdf-schema#subClassOf
which is the subclass relationship between classes.
https://www.wikidata.org/wiki/Property:P279 subclass of
has English description
all of these items are instances of those items; this item is a class of
that item. Not to be confused with Property:P31 (instance of).
which is rather confusing, but appears to be a gloss of the RDFS meaning of
http://www.w3.org/2000/01/rdf-schema#subClassOf
Someone looking at all this is thus led to believe that
https://www.wikidata.org/wiki/Property:P279 subclass of
is the same as the RDFS meaning of
http://www.w3.org/2000/01/rdf-schema#subClassOf
So diseases are classes. They then have instances. They can be reasoned
with using techniques borrowed from RDFS.
This is a particular modelling methodology. It has its benefits. It
requires a certain view of disease and diseases. The particular
instantiation of this modelling methodology, where there is a redundant
link to the top of the disease hierarchy and that top loops back to itself,
has its own benefits and drawbacks.
A bigger problem than the one you state, I think, is how outsiders can
determine that this modelling methodology is in place and understand it
adequately to effectively use the information or to contribute more
information. There is nothing on the discussion pages for the various
diseases that I looked at.
The modelling methodology used here is useful in many other places,
including human occupations, creative work genres, cuisines, and sports.
Is Wikidata uniform in applying this methodology? If this is not the case,
then how is the use of this methodology signalled?
However, I noticed in recent heated debates that some users on frwiki were
receptive to the argument that Wikidata only stores statements. These users
feared that Wikidata would induce an alignment of the semantics of words
and items with the enwiki semantics. They believe in the linguistic
hypothesis that words in a language carry some kind of language-dependent
meaning of their own, and feared some kind of "cultural contagion" through
a mechanism whereby the specific meaning of an English word would
contaminate the French word. It has of course been said many times that
Wikidata is focused not on words and linguistics but mainly on definitions,
that one definition equals one item, and that Wikidata is the sum of all
knowledge, but the argument that finally seemed to be effective was that
Wikidata only stores statements and does not enforce constraints. It seems
to be effective in convincing them that Wikidata is indeed POV agnostic.
In my discussion above, I tried to stay away from using the human-language
descriptions, preferring an external formal definition. Unfortunately,
Wikidata does not have an internal formal definition beyond the simple
description of the data structures. This lack, I think, is what makes the
human-language descriptions so important in Wikidata. My view is that a
stronger formal basis for Wikidata would help to reduce the possibility
that descriptions in dominant human languages do indeed push out the other
descriptions.
2015-10-16 19:14 GMT+02:00 Peter F. Patel-Schneider <
pfpschneider(a)gmail.com>:
It's very pleasant to hear from someone else who thinks of Wikidata as a
knowledge base (or at least hopes that Wikidata can be considered as a
knowledge base). Did you get any pushback on this or on your stated
Wikidata goal of structuring the sum of all human knowledge?
Did you get any pushback on your section on classification in Wikidata? It
seems to me that some of that is rather controversial in the Wikidata
community. I was a bit surprised to see class reasoning used on diseases.
This depends on a particular modelling methodology.
peter
On 10/12/2015 11:47 AM, Emw wrote:
> Hi all,
> On Saturday, I facilitated a workshop at the U.S. National Archives
> entitled "An Ambitious Wikidata Tutorial" as part of WikiConference USA
> 2015.
https://commons.wikimedia.org/wiki/File:An_Ambitious_Wikidata_Tutorial.pdf
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata