Hi Eric,
I have been following this issue for a long time, and I haven't found a satisfactory answer from anyone or from any source. Thoroughly disappointed, I decided to scrap everything and start again from the beginning, working on a foundation that would eventually allow emergent systems and natural language to be represented. Pulling on that thread, I got some interesting insights that I hope you find useful: https://docs.google.com/document/d/1dc99zyxdPX8t6Ept_JjSxNalij2PNdYt-A3TG2e4...
If you don't have the time or patience to read the whole 27 pages (which is just an introduction that needs refinement and expansion):
- in real life there is no difference between calling something "metaclass", "identifier" or "information"; they all refer to an observer system encoding an observed signal together with a pattern recognition model
- in real life instantiation is an observer system putting a frame on an observed system. This frame might be more or less justified considering the intrinsic qualities of what is placed inside, but in the end it is entirely observer-defined, so quite useless other than to point to an "agreed or arbitrary bottom concept chosen by the observer"
(I want to emphasize "in real life" because sometimes I notice a focus on devising "mathematically sound" approaches and not so much on approaches based on reality itself.)
In my opinion the solution to atoms is to separate identifiers from the entity with a new property ("has/identifier of" or "has/manifestation of") and use instance_of only when really necessary.
I hope we don't have to resort to Captain Metaphysics :) http://existentialcomics.com/comic/47
Cheers, Micru
On Fri, Sep 26, 2014 at 2:59 PM, Emw emw.wiki@gmail.com wrote:
The statement "ethanol *instance of* chemical compound" is ontologically incorrect. Importantly, it is also incompatible with ChEBI, the most widely-used chemistry ontology.
The matter of how to apply *instance of* (P31, rdf:type) and *subclass of* (P279, rdfs:subClassOf) on Wikidata in relation to chemical entities has been, as Thomas puts it, a long discussion [1-5]. Hopefully with a wider audience and experts like Markus Krötzsch and Denny Vrandečić now interested, we can come to a resolution at least in the particular domain of chemical compounds. Since it concerns interoperability with another large Semantic Web project, I have copied Janna Hastings and Alan Ruttenberg on this discussion. Janna coordinates ChEBI. Alan coordinates BFO, the upper ontology used by ChEBI and many other major ontologies in the natural sciences, like Gene Ontology and Disease Ontology.
Denny indicates how the statement "Porsche 356 *instance of* car" would be incorrect in Wikidata even though "Porsche 356 *is a* car" is acceptable in everyday speech. Similarly, "ethanol *instance of* chemical compound" is incorrect in Wikidata even though "ethanol *is a* chemical compound" is acceptable in less formal contexts.
A key difference between talk about cars and talk about chemicals is that, with cars, we have familiar terms like "car model" that distinguish concrete instances (that *particular* car you see on the street) from abstract "instances" (i.e. metaclasses, classes that are also instances, the *kind* of car that you see on the street). We do not have a well-known term like "chemical model" or "chemical compound type" to distinguish classes (types) of chemicals and instances (tokens) of chemicals. When one speaks of the properties of ethanol or hydrogen, it is understood that the subject is *all concrete, particular, spatiotemporal tokens, i.e. instances* of ethanol and hydrogen -- not just a specific ethanol molecule floating in that container before you on a Saturday with friends, but all molecules that we label "ethanol" everywhere.
Thus, in order to formally classify ethanol itself as opposed to some particular ethanol molecule, we must say for an item like http://www.wikidata.org/wiki/Q153: "ethanol *subclass of* chemical compound" and not "ethanol *instance of* chemical compound". (On Wikidata, the statement is more precisely "ethanol *subclass of* alcohol", but it is entailed from the statements "alcohol *subclass of* organic compound" and "organic compound *subclass of* chemical compound" that "ethanol *subclass of* chemical compound".)
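The entailment in that parenthetical is just the transitivity of *subclass of*. A minimal sketch in Python, assuming a toy dictionary of statements rather than Wikidata's actual data model or any real API (the names `SUBCLASS_OF` and `superclasses` are hypothetical, chosen only for illustration):

```python
# Toy "subclass of" (P279) statements, copied from the paragraph above.
# This is an illustrative stand-in, not Wikidata's real storage format.
SUBCLASS_OF = {
    "ethanol": {"alcohol"},
    "alcohol": {"organic compound"},
    "organic compound": {"chemical compound"},
}


def superclasses(item, graph=SUBCLASS_OF):
    """Return every class reachable from `item` by following subclass-of links."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# "ethanol subclass of chemical compound" is never stated directly,
# but it follows from the three statements above.
print("chemical compound" in superclasses("ethanol"))
```

Only "ethanol *subclass of* alcohol" needs to be stored; the broader classification falls out of the chain.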
A common defense of statements like "ethanol *instance of* chemical compound" is that Wikidata will never have items about any concrete molecules of ethanol, so, since ethanol is a "leaf node" in our concept taxonomy, it makes sense to state that ethanol is an instance. That interpretation of "instance" is short-sighted. It precludes us from ever talking about particular tokens of ethanol, or particular aggregates of such objects, without overhauling our chemistry ontology. Excluding consideration of metaclasses like "chemical compound type", the fact that an entity is a leaf node in a concept hierarchy is a necessary but not sufficient condition for using *instance of*.
Another common suggestion is that we should state something like "ethanol *instance of* chemical compound type" and "ethanol *subclass of* chemical compound".
To see where that gets us, try wrapping your head around this: https://commons.wikimedia.org/wiki/File:Atom_classes.svg. Really, take a look. If we want Wikidata's concept hierarchy to be seen as dauntingly complex, pervasively applying that kind of three-layer classification scheme will do.
The kind of explicit metamodeling seen when punning things like cars and car models, ships and ship classes, biological taxa and organisms, etc. works reasonably well in certain domains. But, while we hold that hammer in one hand, we should be careful not to see everything as a nail. Outside domains that have established vocabulary for metaclasses, imposing explicit metamodeling with statements like "ethanol *instance of* chemical compound type" or "hydrogen *instance of* atom type" will strike users as unduly complex.
Without such metamodeling, though, querying for a list of chemical compounds becomes murkier. Surely we would want to return "ethanol" and not "organic compound" in such a list. How about "alcohol"? Relatedly, if we don't state "oxygen *instance of* chemical element", then how can we easily query for all the elements in the Periodic Table of Elements without including in the results any potential subclasses of oxygen (e.g., isotopes of oxygen like oxygen-16, oxygen-17, etc.)?
There are ways to achieve that in SPARQL using rdfs:subClassOf / P279 / *subclass of*, but they require adhering to certain conventions. When faced with requiring many potential query users to learn some Wikidata MetaObject Protocol, though, I'm inclined to make some sacrifices for simplicity, ontological correctness, and consistency with major existing ontologies.
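One such convention can be sketched as follows. This is an assumption made for illustration, not something settled in this thread: suppose every element is stated as a *direct* subclass of "chemical element", and isotopes are stated only as subclasses of their element. A query that follows a single *subclass of* hop then returns the elements alone, while a transitive query also pulls in the isotopes. The toy edge list and the function names below are hypothetical.

```python
# Toy "subclass of" (P279) edges as (child, parent) pairs.
# Assumed convention: elements sit exactly one hop below "chemical element".
SUBCLASS_OF = [
    ("hydrogen", "chemical element"),
    ("oxygen", "chemical element"),
    ("oxygen-16", "oxygen"),
    ("oxygen-17", "oxygen"),
]


def direct_subclasses(cls, edges=SUBCLASS_OF):
    """Classes stated as an immediate subclass of `cls` (a single hop)."""
    return {child for child, parent in edges if parent == cls}


def all_subclasses(cls, edges=SUBCLASS_OF):
    """Classes reachable from `cls` through any number of subclass-of hops."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for child in direct_subclasses(current, edges):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


print(sorted(direct_subclasses("chemical element")))  # elements only
print(sorted(all_subclasses("chemical element")))     # elements plus isotopes
```

In SPARQL terms, the first function corresponds to a plain one-hop triple pattern and the second to a property path such as rdfs:subClassOf+. The catch is the one noted above: the one-hop query is only right if every editor follows the convention.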
In summary, this ball has been punted for over a year now. Because of the impasse in how to classify chemical entities, we now have showcase items that have obvious problems like entailing that something is both a class and an instance of chemical compound. We need input from a wider group of people knowledgeable about ontology or chemistry, ideally both. Hopefully with a Wikimedian in Residence at the Royal Society of Chemistry [6] we'll get some more focused resources on this. All major scientific ontologies use *subclass of* (rdfs:subClassOf), not *instance of* (rdf:type), to classify such things. In my opinion, Wikidata should maintain technical and philosophical compatibility with ontologies like ChEBI and remove statements like "ethanol *instance of* chemical compound". This would improve interoperability between Wikidata and the rest of the Semantic Web.
Thanks, Eric
https://www.wikidata.org/wiki/User:Emw
1. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/07#Forth_an...
2. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/05#chemical...
3. https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Chemistry#Germanium_...
4. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/07#Subclass...
5. https://www.wikidata.org/wiki/Help_talk:Basic_membership_properties#Proposit...
6. http://pigsonthewing.org.uk/wikimedian-residence-royal-society-chemistry/
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l