Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

19 Oct 2018


Hi Stas,
Thanks for elaborating. I think we could always start with traversing 
only "subclass of". In spite of its limits, it does work in many areas 
(e.g. buildings, astronomical objects, vehicles, organisations, etc.), 
even if by far not in all. Where it doesn't work, one would simply not 
get enough results, but the alternative (not using "subclass of" at all) 
would just make this problem worse. Any approach to fixing the latter 
will also help the former.
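To make the "subclass of" traversal concrete, here is a minimal sketch in Python. The class names and edges are toy data invented for illustration (not actual Wikidata content), and the function names are hypothetical; a real search engine would index this rather than walk it per query.

```python
from collections import deque

# Toy "subclass of" (P279) edges, child -> list of parents.
# Labels are illustrative only, not taken from Wikidata.
SUBCLASS_OF = {
    "cathedral": ["church"],
    "church": ["religious building"],
    "religious building": ["building"],
    "building": [],
}


def superclasses(cls, edges):
    """Return cls plus every class reachable via subclass-of links."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, []):
            if parent not in seen:  # guards against cycles in the data
                seen.add(parent)
                queue.append(parent)
    return seen


def matches(depicted_class, query_class, edges):
    """True if a file tagged with depicted_class should match query_class."""
    return query_class in superclasses(depicted_class, edges)
```

With this, a search for "building" would also find a file tagged "cathedral", while a search for "cathedral" would not return everything tagged "building". Where the "subclass of" links are missing, the search simply returns fewer results.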
Now regarding issues such as dog, woman, and many other things, it seems 
clear that what one would need are inference rules. It should be 
possible to say somewhere that "if a human is female, then it is also a 
woman" without having to add the unwanted statement "instance of woman" 
everywhere. Or "if someone has profession 'programmer' then he/she/they 
is/are a programmer" -- at least for the purpose of media search. The 
case of dogs would be complicated (referring to quantifiers) but still 
doable.
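As a rough illustration of the simplest kind of rule, the following Python sketch derives the search tag "woman" (Q467) from "instance of human" (P31 = Q5) plus "sex or gender female" (P21 = Q6581072), without storing "instance of woman" anywhere. The rule encoding and the function name are assumptions made up for this example; this is not how SQID or any existing tool represents rules, and the item data is a hand-written excerpt.

```python
# Hand-written excerpt: property -> set of values for one item.
ITEMS = {
    # Donna Strickland: instance of human, sex or gender female
    "Q56855591": {"P31": {"Q5"}, "P21": {"Q6581072"}},
}

# Hypothetical rule encoding: (required property/value pairs, inferred tag).
RULES = [
    ({"P31": "Q5", "P21": "Q6581072"}, "Q467"),  # female human -> woman
]


def inferred_tags(statements, rules):
    """Return the tags whose rule conditions all hold for this item."""
    tags = set()
    for conditions, tag in rules:
        if all(value in statements.get(prop, set())
               for prop, value in conditions.items()):
            tags.add(tag)
    return tags
```

The inferred tags live only in the application (here, media search), so the underlying statements are never touched, and another application could load a different rule list.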
Obvious questions arise:
* Would we prefer to maintain such rules somewhere rather than adding 
the relations they might infer manually? (Probably yes, since one would 
need much fewer rules than manual statements, which would always add 
redundancy and cause conflicts -- cf. taxonomy modelling discussion -- 
that are not necessary when applications can select which inference 
rules to use without touching the underlying data.)
* How would the rules look to human editors? (We have made some first 
proposals for this; see the rules supported by SQID [1]; but one can 
come up with other options)
* Where would such rules be managed? (Preferably on Wikidata, but the 
encoding in statements would be a challenge; another challenge is how to 
associate rules with entities -- usually they make connections between 
several entities)
* How would the rules be applied on the live data, especially if there 
are many updates? (Doable using known algorithms and based on existing 
tools, but still needs some implementation work; I think for a start one 
could just reduce the update speed on these "inferred tags" and still 
get a big improvement over the case where nothing of this type is done 
at all).
So would this be a mid-term goal to overcome this issue? I would think 
so, also because there are enough degrees of freedom here to gradually 
grow this from simple (only allow rules that effectively add some more 
traversal hints) to powerful (have rules that can use qualifiers, as 
needed to get from dog to mammal). The main challenge is to find a good 
approach for community-editing this part without restricting upfront to 
a few special cases (as for the case of the constraints).
Inference rules come up as potential solutions in many similar tasks 
where you want users to access/query the data. Imagine someone would 
look for the brothers of a person (let's assume we'd built an 
intelligent search for such things) -- again, Wikidata has no concept of 
"brother" and we would not have any idea how to answer this, unless 
somewhere we'd have a rule that defines how you can find 
brother-relationships from the data that we actually have. This happens 
a lot when you want users who are not familiar with how we organise data 
to find things, but the solution cannot be to add every possible 
view/inferred statement to Wikidata explicitly.
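A "brother" rule of this kind could be sketched as follows in Python. The people and field names are invented toy data, and the definition used (a different male person sharing at least one parent) is only one possible choice; it deliberately counts half-brothers.

```python
# Toy family data, invented for illustration.
PEOPLE = {
    "alice": {"father": "carl", "mother": "dana", "gender": "female"},
    "bob": {"father": "carl", "mother": "dana", "gender": "male"},
    "eve": {"father": "frank", "mother": "gina", "gender": "female"},
    "hal": {"father": "carl", "mother": "ivy", "gender": "male"},
}


def parents_of(record):
    """Set of known parents for one person record."""
    return {record.get("father"), record.get("mother")} - {None}


def brothers(person, people):
    """Male persons other than `person` who share at least one parent."""
    my_parents = parents_of(people[person])
    return {
        name
        for name, record in people.items()
        if name != person
        and record.get("gender") == "male"
        and my_parents & parents_of(record)
    }
```

Here `brothers("alice", PEOPLE)` yields bob and hal, although no "brother" statement is stored anywhere in the data.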
Obviously, the rule approach is not something we could deploy anytime 
soon, but it could be something to work towards ...
Cheers,
Markus
[1] Example rule with explanation of how it was applied to find a 
grandfather of Ada Lovelace: https://tinyurl.com/y7rgmk7o
The qualifier sets (X, Y, Z) are unused here and could be hidden 
entirely, but this is just a prototype.
On 20/10/2018 00:28, Stas Malyshev wrote:
...
Hi!
...
possibility to find more results by letting the search engine traverse
the "more-general-than" links stored in Wikidata. People have discovered
cases where some of these links are not correct (surprise! it's a wiki
;-), and the suggestion was that such glitches would be fixed with
higher priority if there would be an application relying on it. But even
The main problem I see here is not that some links are incorrect - which
may have bad effects, but it's not the most important issue. The most
important one, IMHO, is that there's no way to figure out in any scalable
and scriptable way what "more-general-than" means for any particular case.
It's different for each type of objects and often inconsistent within
the same class (e.g. see confusion between whether "dog" is an animal, a
name of the animal, name of the taxon, etc.) It's not that navigating
the hierarchy would lead us astray - we're not even there yet to have
this problem, because we don't even have a good way to navigate it.
Using instance-of/subclass-of only seems to not be that useful, because
a lot of interesting things are not represented in this way - e.g.
finding out that Donna Strickland (Q56855591) is a woman (Q467) is
impossible using only this hierarchy. We could special-case a bunch of
those but given how diverse Wikidata is, I don't think this will ever
cover any significant part of the hierarchy unless we find a non-ad-hoc
method of doing this.
This also makes it particularly hard to do something like "let's start
using it and fix the issues as we discover them", because the main issue
here is that we don't have a way to start with anything useful beyond a
tiny subset of classes that we can special-case manually. We can't
launch a rocket and figure how to build the engine later - having a
working engine is a prerequisite to launching the rocket!
There are also significant technical challenges in this - indexing
dynamically changing hierarchy is very problematic, and with our
approach to ontology anything can be a class, so we'd have to constantly
update the hierarchy. But this is more of a technical challenge, which
will come after we have some solution for the above.
