Re: [Wikidata-l] Subclass of/instance of

15 May 2014

On 14/05/14 19:33, Joe Filceolaire wrote:
...
 Except that there are lots of people who have appeared in one movie who
 don't consider themselves actors and should not have the
 'occupation=>actor/actress'. There are good reasons for some constraints
 to be gadgets that can be overridden rather than hard coded semantic limits.
Sure, we completely agree here. It was just an example. But it shows why 
we need any such feature to be controlled by the community ;-)

...

 I do think we should be able to have hard coded reverse properties and
 symmetric properties.
By "hard coded" do you mean "stored explicitly" (as opposed to: 
"inferred in some way")? It will always be possible to store anything 
explicitly in this sense (but I guess you know this; maybe I 
misunderstood what you said; feel free to clarify).

In general, what I mentioned about inferencing is not supposed to alter 
the way in which the site works. It would be more like a layer on top 
that could be useful for asking queries. For example, imagine you want 
to query for the grandmother of a person: we don't have this property in 
Wikidata but we have enough information to answer the query. So you 
would have to research how to get this information by combining existing 
properties. The idea is that one could have a place to keep this 
information (= the definition of "grandmother" in terms of Wikidata 
properties). We would then have a "community approved" way of finding 
grandmothers in Wikidata, and you would be much faster with your query. 
At the same time, you could look up the definition to find out how 
Wikidata really stores this information. None of this would change 
how the underlying data works, but it could help with some data 
modelling problems because it gives you an option to "support" a 
property without the added maintenance cost on the data management level.
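
As a minimal sketch of the "grandmother" idea, the derived notion can be written once as a combination of stored parent statements and then reused in queries. The property names "mother" and "father", the toy statements, and the helper names below are assumptions for illustration only, not actual Wikidata data or an existing API.

```python
from collections import defaultdict

# Hypothetical stored statements: (child, property, parent).
# "mother" and "father" stand in for whatever parent properties the data uses.
STATEMENTS = [
    ("alice", "mother", "beth"),
    ("alice", "father", "carl"),
    ("beth", "mother", "dora"),
    ("carl", "mother", "edna"),
    ("carl", "father", "fred"),
]


def index_by_property(statements):
    """Build property -> subject -> set of objects for quick lookup."""
    idx = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in statements:
        idx[prop][subject].add(obj)
    return idx


def grandmothers(person, statements):
    """Derived notion: the mother of any parent (mother or father) of person."""
    idx = index_by_property(statements)
    parents = idx["mother"].get(person, set()) | idx["father"].get(person, set())
    result = set()
    for parent in parents:
        result |= idx["mother"].get(parent, set())
    return result


print(sorted(grandmothers("alice", STATEMENTS)))
```

Nothing new is stored here: the definition lives next to the data and is evaluated only when somebody asks.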

Cheers,

Markus

...
 On Wed, May 14, 2014 at 2:33 PM, Markus Krötzsch
 <markus(a)semantic-mediawiki.org> wrote:
...
     Hi Eric,

     Thanks for all the information. This was very helpful. I only get to
     answer now since we have been quite busy building RDF exports for
     Wikidata (and writing a paper about it). I will soon announce this
     here (we still need to fix a few details).

     You were asking about using these properties like rdfs:subClassOf
     and rdf:type. I think that's entirely possible, since the modelling
     is very reasonable and would probably yield good results. Our
     reasoner ELK could easily handle the class hierarchy in terms of
     size, but you don't really need such a highly optimized tool for
     this as long as you only have subClassOf. In fact, the page you
     linked to shows that it is perfectly possible to compute the class
     hierarchy with Wikidata Query and to display all of it on one page.
     ELK's main task is to compute class hierarchies for more complicated
     ontologies, which we do not have yet. OTOH, query answering and data
     access are different tasks that ELK is not really intended for
     (although it could do some of this as well).
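
To illustrate why a plain subClassOf hierarchy does not need a heavily optimized reasoner, here is a minimal sketch that computes all superclasses of a class by graph traversal. Only the piano/keyboard instrument pair comes from this thread; the other edges and the function name are made up for the example.

```python
from collections import defaultdict, deque

# (subclass, superclass) pairs; only the first pair is taken from the thread,
# the rest are hypothetical.
SUBCLASS_OF = [
    ("piano", "keyboard instrument"),
    ("keyboard instrument", "musical instrument"),
    ("grand piano", "piano"),
]


def superclasses(cls, edges):
    """Return every class reachable from cls via subclass-of edges."""
    parents = defaultdict(set)
    for sub, sup in edges:
        parents[sub].add(sup)

    seen = set()
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for sup in parents.get(current, ()):
            if sup not in seen:  # also protects against accidental cycles
                seen.add(sup)
                queue.append(sup)
    return seen


print(sorted(superclasses("grand piano", SUBCLASS_OF)))
```

The traversal visits each edge at most once, so tens of thousands of subclass of statements are not a problem for this kind of computation.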

     Regarding future perspectives: one thing that we have also done is
     to extract OWL axioms from property constraint templates on Wikidata
     talk pages (we will publish the result soon, when announcing the
     rest). This gives you only some specific types of OWL axioms, but it
     is making things a bit more interesting already. In particular,
     there are some constraints that tell you that an item should have a
     certain class, so this is something you could reason with. However,
     the current property constraint system does not work too well for
     stating axioms that are not related to a particular property (such
     as: "Every [instance of] person who appears as an actor in some film
     should be [instance of] in the class 'actor'" -- which property or
     item page should this be stated on?). But the constraints show that
     it makes sense to express such information somehow.

     In the end, however, the real use of OWL (and similar ontology
     languages) is to remove the need for making everything explicit.
     That is, instead of "constraints" (which say: "if your data looks
     like X, then your data should also include Y") you have "axioms"
     (which say: "if your data looks like X, then Y follows
     automatically"). So this allows you to remove redundancy rather than
     to detect omissions. This would make more sense with "derived"
     notions that one does not want to store in the database, but which
     make sense for queries (like "grandmother").
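
The difference between the two readings can be sketched on the actor example from above. This is only an illustration under assumed inputs: the two sets and both function names are hypothetical, and the rule itself is the contested one discussed in this thread.

```python
def check_constraint(appears_in_film, actors):
    """Constraint reading: report people whose explicit 'actor' data is missing."""
    return {person for person in appears_in_film if person not in actors}


def apply_axiom(appears_in_film, actors):
    """Axiom reading: membership follows automatically, nothing extra is stored."""
    return set(actors) | set(appears_in_film)


# Hypothetical data: who appears as an actor in some film, and who is
# explicitly stored as being in the class 'actor'.
appears_in_film = {"ann", "bob"}
stored_actors = {"ann"}

print(sorted(check_constraint(appears_in_film, stored_actors)))  # omissions to fix
print(sorted(apply_axiom(appears_in_film, stored_actors)))       # derived members
```

With the constraint, somebody has to edit the data until the report is empty; with the axiom, the stored data stays as it is and the extra memberships only exist at query time.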

     One would need a bit more infrastructure for this; in particular,
     one would need to define "grandmother" (with labels in many
     languages) even if one does not want to use it as a property but
     only in queries. Maybe one could have a separate Wikibase
     installation for defining such derived notions without needing to
     change Wikidata? There are no statements on properties yet, but one
     could also use item pages to define derived properties when using
     another site ...

     Best regards,

     Markus

     P.S. Thanks for all the work on the "semantic" modelling aspects of
     Wikidata. I have seen that you have done a lot in the discussions to
     clarify things there.

     On 06/05/14 04:53, emw wrote:

         Hi Markus,

         You asked "who is creating all these [subclass of] statements
         and how is this done?"

         The class hierarchy in
         http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q35120&rp=279&lang=en
         shows a few relatively large subclass trees for specialist domains,
         including molecular biology and mineralogy.  The several thousand
         'gene' and 'protein' subclass of claims were created by members
         of WikiProject Molecular biology (WD:MB), based on discussions in [1]
         and [2].  The decision to use P279 instead of P31 there was based on
         the fact that the "is-a" relation in Gene Ontology maps to
         rdfs:subClassOf, which P279 is based on.  The claims were added by a
         bot [3], with input from WD:MB members.  The data ultimately comes
         from external biological databases.

         A glance at the mineralogy class hierarchy indicates it has been
         constructed by WikiProject Mineralogy [4] members through non-bot
         edits.  I imagine most of the other subclass of claims are done
         manually or semi-automatically outside specific WikiProject efforts.
         In other words, I think most of the other P279 claims are added by
         Wikidata users going into the UI and building usually-reasonable
         concept hierarchies on domains they're interested in.  I've worked on
         constructing class hierarchies for health problems (e.g. diseases and
         injuries) [5] and medical procedures [6] based on classifications
         like ICD-10 [7] and assertions and templates on Wikipedia (e.g. [8]).

         It's not incredibly surprising to me that Wikidata has about 36,000
         subclass of (P279) claims [9].  The property has been around for
         over a year and is a regular topic of discussion [10] along with
         instance of (P31), which has over 6,600,000 claims.

         You noted a dubious subclass of claim for 'House of Staufen'
         (Q130875).  I agree that instance of would probably be the better
         membership property to use there.  Such questionable usage of P279
         is probably uncommon, but definitely not singular.  The dynasty class
         hierarchy shows 13 dubious cases at the moment [11].  I would guess
         less than 5% of subclass of claims have that kind of issue, where
         instance of would make more sense.  I think there are probably vastly
         more cases of the converse: instance of being used where subclass of
         would make more sense.

         As you probably know, P31 and P279 are intended to have the
         semantics of rdf:type and rdfs:subClassOf per community decision.
         A while ago I read a bit about the ELK reasoner you were involved
         with [12], which makes use of the seemingly class-centric OWL EL
         profile.  Do you have any plans to integrate features of ELK with
         the Wikidata Toolkit [13]?  How do you see reasoning engines using
         P31 and P279 in the future, if at all?

         Thanks,
         Eric

         https://www.wikidata.org/wiki/User:Emw

         [1] https://www.wikidata.org/wiki/WT:MB#Distinguishing_between_genes_and_proteins
         [2] https://www.wikidata.org/wiki/WT:MB#Human.2Fmouse.2F..._ID
         [3] https://www.wikidata.org/wiki/User:ProteinBoxBot
             Chinmay Nalk (https://www.wikidata.org/wiki/User:Chinmay26) did
             all the work on this, with input from WD:MB.
         [4] https://www.wikidata.org/wiki/Wikidata:WikiProject_Mineralogy
         [5] http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q15281399&rp=279&lang=en
         [6] http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q796194&rp=279&lang=en
         [7] http://apps.who.int/classifications/icd10/browse/2010/en
         [8] https://en.wikipedia.org/wiki/Template:Surgeries
         [9] https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Popular_properties&oldid=125595374
         [10] Examples include
         - https://www.wikidata.org/wiki/Wikidata:Project_chat#chemical_element
         - https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2013/12#Top_of_the_subclass_tree
         - https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/01#Question_about_classes.2C_and_.27instance_of.27_vs_.27subclass.27
         [11] http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q164950&rp=279&lang=en
         [12] http://korrekt.org/page/The_Incredible_ELK
         [13] https://www.mediawiki.org/wiki/Wikidata_Toolkit

         On Mon, May 5, 2014 at 12:46 PM, Markus Kroetzsch
         <markus.kroetzsch(a)tu-dresden.de> wrote:

              Hi,

              I got interested in subclass of (P279) and instance of (P31)
              statements recently. I was surprised by two things:

              (1) There are quite a lot of subclass of statements: tens of
              thousands.
              (2) Many of them make a lot of sense, and (in particular) are
              not (obvious) copies of Wikipedia categories.

              My big question is: who is creating all these statements and
              how is this done? It seems too much data to be created manually,
              but I don't see obvious automated approaches either (and there
              are usually no references given).

              I also found some rare issues. "A subclass of B" should be read
              as "Every A is also a B". For example, we have "Every piano
              (Q5994) is also a keyboard instrument (Q52954)". Overall, the
              great majority of cases I looked at had remarkably sane
              modelling (which reinforces my big question).

              But there are still cases where "subclass of" is mixed up with
              "instance of". For example, Wikidata also says "Every 'House of
              Staufen' (Q130875) is also a dynasty (Q164950)". This is dubious --
              how many instances of 'House of Staufen' are there? I guess we
              really want to say that "The House of Staufen is a(n instance of)
              dynasty." Is this a singular error or a systematic issue? I guess
              there is already a group of people who deal with such issues
              -- or it would be a miracle that things are in such good shape
              already :-) I have read the talk page for subclass of, but that
              does not seem to explain the origin of all the data we have
              already. Pointers?

              Cheers,

              Markus

              _______________________________________________
              Wikidata-l mailing list
              Wikidata-l(a)lists.wikimedia.org
              https://lists.wikimedia.org/mailman/listinfo/wikidata-l

