It is interesting to note that what Cparle wants are "is a" relationships
based on common sense. For most people, ants are insects, not instances of
"taxon". A clarinet is a woodwind instrument, and woodwind instruments are
musical instruments, not instances of "first order metaclass".
One of the best sources of "common sense" hypernymy is probably the first
sentence of a Wikipedia page. Whether in English, French or Italian, a woman
is always "a female *human* being."
For "poodle", this would look like (following the links in the English
version of Wikipedia):
- The poodle is a group of formal *dog breeds*
- Dog breeds are *dogs* that...
- The domestic dog (...) is a member of the genus *Canis* (canines)
- Canis is a genus of the *Canidae*
- The biological family Canidae (...) is a lineage of *carnivorans*
- Carnivora (...) is a diverse *scrotiferan* order
- Scrotifera is a clade of *placental mammals*
- Placentalia ("Placentals") is one of the three extant subdivisions of the
  class of animals *Mammalia*...
- Mammals are the *vertebrates* within the class Mammalia...
From my point of view, this classification looks much better than the
current relationships in Wikidata's ontology.
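To make the poodle chain above concrete, here is a minimal Python sketch that follows first-sentence hypernym links from page to page. The FIRST_SENTENCE_LINKS table is hypothetical toy data transcribed by hand from the list above (not fetched from Wikipedia), and hypernym_chain is an illustrative helper, not an existing tool.

```python
from typing import Dict, List


def hypernym_chain(start: str, links: Dict[str, str]) -> List[str]:
    """Follow first-sentence hypernym links from `start` until no link remains."""
    chain = [start]
    current = start
    while current in links:
        current = links[current]
        if current in chain:  # stop on a cycle rather than looping forever
            break
        chain.append(current)
    return chain


# Hypothetical toy data, transcribed by hand from the list above:
# page title -> page linked as the hypernym in its first sentence.
FIRST_SENTENCE_LINKS = {
    "Poodle": "Dog breed",
    "Dog breed": "Dog",
    "Dog": "Canis",
    "Canis": "Canidae",
    "Canidae": "Carnivora",
    "Carnivora": "Scrotifera",
    "Scrotifera": "Placentalia",
    "Placentalia": "Mammal",
    "Mammal": "Vertebrate",
}

print(" > ".join(hypernym_chain("Poodle", FIRST_SENTENCE_LINKS)))
# Poodle > Dog breed > Dog > Canis > Canidae > Carnivora > Scrotifera > Placentalia > Mammal > Vertebrate
```

Each step is a single "is a" judgement a reader can check against the page itself, which is what makes the chain feel like common sense.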
The automatic extraction of hypernymic relationships from English texts
(especially Wikipedia) has been studied for a long time and gives good
results, even with simple methods based on hand-crafted rules. In the case
of Wikipedia, the hypernym often has a page itself (and therefore a link to
Wikidata), which could simplify both the NLP extraction and the mapping to
Wikidata items.
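As a rough illustration of how simple such a rule can be when the hypernym is linked, here is a minimal Python sketch of one hand-crafted rule: take the first wikilink after the first copula ("is", "are", "was", "were") in a page's first sentence. The sample sentences are simplified, hypothetical wikitext, and the SKIP set of grouping words is an assumption added so that "a genus of the Canidae" yields Canidae; a real extractor would need many more rules.

```python
import re
from typing import Optional

# Copula that usually introduces the definition in a first sentence.
COPULA = re.compile(r"\b(?:is|are|was|were)\b")
# Wikitext links [[Target]], [[Target#Section]] or [[Target|label]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# Hypothetical skip-list: links naming a grouping rather than the hypernym itself.
SKIP = {"genus", "group", "member", "clade"}


def first_sentence_hypernym(wikitext: str) -> Optional[str]:
    """Return the target of the first non-skipped link after the copula, if any."""
    copula = COPULA.search(wikitext)
    if copula is None:
        return None
    for link in WIKILINK.finditer(wikitext, copula.end()):
        target = link.group(1).strip()
        if target.lower() not in SKIP:
            return target
    return None


print(first_sentence_hypernym("The '''poodle''' is a group of formal [[dog breed]]s"))
# dog breed
print(first_sentence_hypernym("'''Canis''' is a [[genus]] of the [[Canidae|canines]]"))
# Canidae
```

Because the rule returns a link target rather than free text, the result already points to a page, and hence to a Wikidata item, with no separate entity-matching step.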
Of course, the extracted relationships will not always be "subclass of" or
"instance of". But if someone proposed a new property called "Wikipedia
Hypernyms" (and its inverse property "Wikipedia Hyponyms"), I would use
it more willingly and with more confidence than the current system. This
would also better respect the logic of Wikidata's descriptions.
I mean, if the description of Zoroastrianism (Q9601) says it is an
"Ancient Iranian *religion* founded by Zoroaster", one would expect the
class "religion" to appear much earlier in the hierarchy of superclasses of
this item. If this "Wikipedia Hypernyms" property existed, we could state
it on the same page, since Wikipedia describes Zoroastrianism as "one of
the world's oldest *religions* that remains active." And a SPARQL query
looking for all items that have "religion" as a "Wikipedia Hypernyms"
value would be much faster.
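To illustrate the speed argument, here is a toy Python comparison under hypothetical data. Today such a query has to follow "instance of" (P31) and then any number of "subclass of" (P279) edges, i.e. the SPARQL property path wdt:P31/wdt:P279*, whereas a direct property would be a single index lookup. The class ids C1 to C4, the item X1 and the BY_HYPERNYM index are placeholders, not Wikidata's real hierarchy, and the sketch says nothing about actual query-service performance.

```python
from collections import defaultdict, deque

# Hypothetical toy data: placeholder classes, NOT Wikidata's real hierarchy.
INSTANCE_OF = {"Q9601": ["C1"], "X1": ["C3"]}  # plays the role of P31
SUBCLASS_OF = {"C1": ["C2"], "C2": ["C4"], "C3": ["C2"], "C4": ["religion"]}  # P279
# Hypothetical "Wikipedia Hypernyms" statements, indexed by value.
BY_HYPERNYM = {"religion": {"Q9601", "X1"}}


def items_via_hierarchy(target):
    """Mimic `?item wdt:P31/wdt:P279* target`: collect every subclass, then instances."""
    children = defaultdict(list)  # reverse P279 edges
    for cls, parents in SUBCLASS_OF.items():
        for parent in parents:
            children[parent].append(cls)
    classes = {target}
    queue = deque([target])
    while queue:  # breadth-first walk down the subclass tree
        for sub in children[queue.popleft()]:
            if sub not in classes:
                classes.add(sub)
                queue.append(sub)
    items = {item for item, types in INSTANCE_OF.items() if classes.intersection(types)}
    return items, len(classes)  # matching items, number of classes traversed


def items_with_hypernym(target):
    """Direct property: one index lookup, no traversal."""
    return BY_HYPERNYM.get(target, set())


print(items_via_hierarchy("religion")[1])  # 5 classes traversed
print(sorted(items_with_hypernym("religion")))  # ['Q9601', 'X1']
```

Both functions return the same items here; the difference is that the hierarchy version must explore every subclass path first, and that work grows with the depth and breadth of the class tree.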
Note: sorry if this reflection is naive or if it has already been
discussed.
On Thu, 27 Sep 2018 at 23:35, James Heald <jpm.heald(a)gmail.com> wrote:
This recent announcement by the Structured Data team perhaps ought to be
quite a heads-up for us:
Essentially the team has given up on the hope of using Wikidata
hierarchies to suggest generalised "depicts" values to store for images
on Commons, to match against terms in incoming search requests.
I.e., if an image is of a German Shepherd dog, and identified as such,
the team has given up on trying to infer in general from Wikidata that
'dog' is also a search term that such an image should score positively
for.
Apparently the Wikidata hierarchies were simply too complicated, too
unpredictable, and too arbitrary and inconsistent in their design across
different subject areas to be readily assimilated (before one even
starts on the density of bugs and glitches that then undermine them).
Instead, if that image ought to be considered in a search for 'dog', it
looks as though an explicit 'depicts:dog' statement may need to be
present, in addition to 'depicts:German Shepherd dog'.
Some of the background behind this assessment can be read in the related
ticket, in particular the first substantive comment on it, by Cparle on
10 July, giving his quick initial read of some of the issues that using
Wikidata would face.
SDC was considered a flagship end-application for Wikidata. If the data
in Wikidata is not usable enough to supply the dogfood that the project
was expected to rely on, that should be a serious wake-up
call, a red flag we should not ignore.
If the way data is organised across different subjects is currently too
inconsistent and confusing to be usable by our own SDC project, are
there actions we can take to address that? Are there design principles
to be chosen that then need to be applied consistently? Is this
something the community can do, or is some more active direction going
to need to be applied?
Wikidata's 'ontology' has grown haphazardly, with little oversight, like
an untended bank of weeds. Is some more active gardening now required?
Wikidata mailing list