Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

20 Oct 2018

Hello,

It is interesting to note that what Cparle wants are "is a" relationships
based on common sense. For most people, ants are insects, not instances of
taxon. A clarinet is a woodwind instrument, and woodwind instruments are
musical instruments, not an instance of "first order metaclass".

One of the best sources of "common sense" hypernymy is probably the first
sentence of a Wikipedia page. Whether in English, French or Italian, a woman
is always "a female *human* being."

For "poodle", this would look like (following the links in the English
version of Wikipedia):

- The poodle is a group of formal *dog breeds*

- Dog breeds are *dogs* that...

- The domestic dog (...) is a member of the genus *Canis* (canines)

- Canis is a genus of the *Canidae*

- The biological family Canidae (...) is a lineage of *carnivorans*

- Carnivora (...) is a diverse *scrotiferan* order

- Scrotifera is a clade of *placental mammals*

- Placentalia ("Placentals") is one of the three extant subdivisions of the
class of animals *Mammalia*...

- Mammals are the *vertebrates* within the class Mammalia...

...
From my point of view, this classification looks much better than the
current relationships in Wikidata's ontology.
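
The chain above can be walked mechanically once each page's first-sentence
hypernym is known. A minimal sketch, assuming the pairs are hand-copied from
the list above with names normalised to singular page titles (the table and
both function names are made up for illustration; nothing here is scraped
from Wikipedia):

```python
# First-sentence hypernyms, hand-copied from the poodle example above.
# The normalised names are an assumption made for this illustration.
FIRST_SENTENCE_HYPERNYM = {
    "poodle": "dog breed",
    "dog breed": "dog",
    "dog": "Canis",
    "Canis": "Canidae",
    "Canidae": "Carnivora",
    "Carnivora": "Scrotifera",
    "Scrotifera": "Placentalia",
    "Placentalia": "Mammalia",
    "Mammalia": "vertebrate",
}


def hypernym_chain(term, table, max_depth=50):
    """Follow first-sentence hypernyms upward, stopping on a cycle or a dead end."""
    chain = [term]
    seen = {term}
    while chain[-1] in table and len(chain) <= max_depth:
        parent = table[chain[-1]]
        if parent in seen:  # guard against pages that define each other
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def is_a(term, ancestor, table):
    """True if `ancestor` appears strictly above `term` in the chain."""
    return ancestor in hypernym_chain(term, table)[1:]
```

With this table, is_a("poodle", "dog", FIRST_SENTENCE_HYPERNYM) holds after
two steps, which is the kind of "common sense" answer discussed above.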

The automatic extraction of hypernymic relationships from English texts
(especially Wikipedia) has been studied for a long time and gives good
results, even with simple methods based on hand-crafted rules. In the case
of Wikipedia, the hypernym often has a page itself (and therefore a link to
Wikidata), which could simplify the NLP extraction and the mapping with
Wikidata items.
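
To give an idea of what a hand-crafted rule could look like, here is a rough
sketch that takes the first wiki link after the copula ("is"/"are") in the
first sentence of a page's wikitext. The function name, the regular
expressions and the sample sentences are my own assumptions for illustration;
published extraction methods are more careful than this.

```python
import re

# Copula that introduces the definition in "X is a Y" / "Xs are Ys".
COPULA = re.compile(r"\b(?:is|are)\b")
# [[Target]], [[Target|label]] or [[Target#section|label]]; captures Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def first_sentence_hypernym(wikitext):
    """Return the target of the first wiki link after the copula, or None."""
    first_sentence = wikitext.split(". ")[0]  # naive sentence split
    copula = COPULA.search(first_sentence)
    if copula is None:
        return None
    link = WIKILINK.search(first_sentence, copula.end())
    return link.group(1).strip() if link else None


# Simplified, hand-written wikitext modelled on the sentences quoted above.
samples = [
    "The '''poodle''' is a group of formal [[dog breed]]s.",
    "'''Zoroastrianism''' is one of the world's oldest [[religion]]s "
    "that remains active.",
]
hypernyms = [first_sentence_hypernym(s) for s in samples]
```

The obvious weakness is that the first link is not always the head noun (a
linked adjective or place name before it would be picked up instead), so a
real rule set would need some filtering on top of this.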

Of course, the extracted relationships will not always be "subclass of" or
"instance of". But if someone proposed a new property called "Wikipedia
Hypernyms" (and its inverse property "Wikipedia Hyponyms"), I would use
it more willingly and with more confidence than the current system. This
would also better respect the logic of Wikidata's descriptions.

I mean, if the description of Zoroastrianism (Q9601) says this is an
"Ancient Iranian *religion* founded by Zoroaster", one would expect the
class "religion" to appear much earlier in the hierarchy of superclasses of
this item. If there were such a "Wikipedia Hypernyms" property, we could
state it on the same page, since Wikipedia describes Zoroastrianism as
"one of the world's oldest *religions* that remains active." And a SPARQL
query looking for 'all items that have "religion" as their "Wikipedia
Hypernyms" property' would be much faster.
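
To make the comparison concrete, here is a sketch of the two query shapes,
built as strings in Python. P31 ("instance of") and P279 ("subclass of") are
existing Wikidata properties; the hypernym property does not exist, so its ID
below is a placeholder, and Q9174 is assumed to be the item for "religion"
(please check before use). The function names are made up as well.

```python
# Placeholder only: no "Wikipedia Hypernyms" property exists in Wikidata.
HYPOTHETICAL_HYPERNYM_PID = "P_WIKIPEDIA_HYPERNYMS"


def transitive_query(class_qid):
    """Current approach: instance of, then any number of subclass-of steps."""
    return f"SELECT ?item WHERE {{ ?item wdt:P31/wdt:P279* wd:{class_qid} . }}"


def direct_query(class_qid, hypernym_pid=HYPOTHETICAL_HYPERNYM_PID):
    """Proposed approach: a single triple pattern on the hypernym property."""
    return f"SELECT ?item WHERE {{ ?item wdt:{hypernym_pid} wd:{class_qid} . }}"


# Q9174 is assumed to be "religion"; verify the ID before running anything.
current = transitive_query("Q9174")
proposed = direct_query("Q9174")
```

The first query asks the engine to walk a property path of unbounded length,
while the second is a single lookup, which is why I would expect it to be
cheaper. I have not measured this.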

Note: sorry if this reflection is naive or if it has already been
discussed/tested.

Cheers,

Ettore

On Thu, 27 Sep 2018 at 23:35, James Heald <jpm.heald(a)gmail.com> wrote:

...
 This recent announcement by the Structured Data team perhaps ought to be
 quite a heads-up for us:

https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_C…

 Essentially the team has given up on the hope of using Wikidata
 hierarchies to suggest generalised "depicts" values to store for images
 on Commons, to match against terms in incoming search requests.

 i.e.  if an image is of a German Shepherd dog, and identified as such,
 the team has given up on trying to infer in general from Wikidata that
 'dog' is also a search term that such an image should score positively
 with.

 Apparently the Wikidata hierarchies were simply too complicated, too
 unpredictable, and too arbitrary and inconsistent in their design across
 different subject areas to be readily assimilated (before one even
 starts on the density of bugs and glitches that then undermine them).

 Instead, if that image ought to be considered in a search for 'dog', it
 looks as though an explicit 'depicts:dog' statement will need to be
 present, in addition to 'depicts:German Shepherd'.

 Some of the background behind this assessment can be read in
     https://phabricator.wikimedia.org/T199119
 in particular the first substantive comment on that ticket, by Cparle on
 10 July, giving his quick initial read of some of the issues using
 Wikidata would face.

 SDC was considered a flagship end-application for Wikidata.  If the data
 in Wikidata is not usable enough to supply the dogfood that project was
 expected to rely on, that should be a serious wake-up call, a red flag
 we should not ignore.

 If the way data is organised across different subjects is currently too
 inconsistent and confusing to be usable by our own SDC project, are
 there actions we can take to address that?  Are there design principles
 to be chosen that then need to be applied consistently?  Is this
 something the community can do, or is some more active direction going
 to need to be applied?

 Wikidata's 'ontology' has grown haphazardly, with little oversight, like
 an untended bank of weeds.  Is some more active gardening now required?

    -- James.

