Re: [Wikidata] Properties for family relationships in Wikidata

19 Aug 2015

Hi all,
There have been some discussions here already on what to do with the 
inferences (add them to Wikidata, just display them, add them only to 
the query service, etc.). That's great, but this is already the second 
step from where we are now.
Right now, we don't have any way yet for people to write down what 
should be inferred. If we could describe this, we could easily add 
information on what to do with the inference (add, display, make 
queryable, use for quality control, etc.). This could be discussed on a 
case-by-case basis (similar to bot requests).
Even the (very simple) case of symmetry shows that we are not there yet: 
we have no information anywhere on Wikidata that tells us that start and 
end qualifiers for spouse should also be symmetric. It is not 
automatically the case that all qualifiers of a symmetric property are 
symmetric! For example, "diplomatic relation" (P530) is symmetric and 
uses qualifier "diplomatic mission sent" (P531) that points to the 
embassy of the subject country in the value country. Clearly this 
qualifier should not be copied when inferring symmetric statements. 
Symmetry is only the simplest case; already "inverse of" requires more 
information ...
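To make the qualifier point concrete, here is a minimal Python sketch of symmetric inference with a per-property list of qualifiers that may be copied. The table SYMMETRIC_QUALIFIERS, the function infer_symmetric, and the placeholder item names are assumptions made up for this illustration; nothing like this exists on Wikidata today. P580 (start time) and P582 (end time) stand for the start and end qualifiers.

```python
# Hypothetical policy table: for each symmetric property, the qualifiers
# that are safe to copy to the mirrored statement. Illustrative only.
SYMMETRIC_QUALIFIERS = {
    "P26": {"P580", "P582"},  # spouse: start and end hold for both partners
    "P530": set(),            # diplomatic relation: P531 is direction-specific
}


def infer_symmetric(subject, prop, value, qualifiers):
    """Return the mirrored statement, or None if prop has no symmetry policy."""
    if prop not in SYMMETRIC_QUALIFIERS:
        return None
    allowed = SYMMETRIC_QUALIFIERS[prop]
    # Copy only the qualifiers whose meaning does not depend on direction.
    kept = {q: v for q, v in qualifiers.items() if q in allowed}
    return (value, prop, subject, kept)


# Spouse: both qualifiers are copied.
print(infer_symmetric("PersonA", "P26", "PersonB", {"P580": "1990", "P582": "2000"}))
# Diplomatic relation: the embassy qualifier is dropped.
print(infer_symmetric("CountryA", "P530", "CountryB", {"P531": "EmbassyOfAInB"}))
```

The point of the table is that the copy decision is made per property and per qualifier, not once for "symmetric properties" as a whole.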
We therefore first need to come up with a good way of describing the 
intended inferences in the wiki. Then we can think about how to best act 
on this information, step by step. The current constraints such as "this 
property is symmetric" are obviously too limited for really describing 
what should be inferred. On the other hand, one needs to take care that 
descriptions are not too general, to make sure that they can still be 
implemented and that they remain meaningful when considering many of 
them together (just consider what happens when an inferred relation 
triggers another inference ...). Luckily, there is a lot of experience 
in this area today, so it's not rocket science to come up with a 
workable description language that is not a collection of special cases 
and still is not too general or too complicated.
So what's the best way to move forward? I have some ideas on how to do 
this, but I would like to also have user feedback to make sure that the 
result is easy to use and covers many important use cases. The basic 
idea would be to come up with a template-based format for describing 
rules of inference of the form "If there is a statement that looks like 
X, then infer a statement that looks like Y". However, there must also 
be a way to say how the qualifiers should be formed for Y. I have some 
ideas on how to do this in a (hopefully) sane way.
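Purely to illustrate the "If X, then infer Y" shape, a rule could be modelled roughly as below. This is a sketch under my own assumptions, not the proposed format: the Rule class, its field names, and apply_rule are hypothetical, and a real format would live in wiki templates, not in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """If a statement uses if_property, infer the reversed statement
    with then_property, copying only the listed qualifiers."""
    if_property: str
    then_property: str
    copy_qualifiers: frozenset = frozenset()


def apply_rule(rule, statement):
    """Return the inferred statement, or None if the rule does not match."""
    subject, prop, value, qualifiers = statement
    if prop != rule.if_property:
        return None
    kept = {q: v for q, v in qualifiers.items() if q in rule.copy_qualifiers}
    return (value, rule.then_property, subject, kept)


# Example rules (illustrative): symmetry is the special case where both
# properties coincide; father/mother imply a child statement on the parent.
RULES = [
    Rule("P26", "P26", frozenset({"P580", "P582"})),
    Rule("P22", "P40"),
    Rule("P25", "P40"),
]

statement = ("Child", "P22", "Father", {})
inferred = [r for r in (apply_rule(rule, statement) for rule in RULES) if r]
print(inferred)
```

Each rule here produces at most one statement per input and is never applied to its own output, which sidesteps the question of chained inferences; a real design would have to answer that explicitly.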
If other people are interested, we could form some kind of interest 
group to work this out together. Alternatively, I can start by making a 
proposal on the wiki.
Markus
On 17.08.2015 14:47, Markus Kroetzsch wrote:
...
Hi Andrew,
I am very interested in this, especially in the second aspect (how to
handle symmetry). There are many cases where we have two or more ways to
say the same thing on Wikidata (symmetric properties are only one case).
It would be useful to draw these inferences so that they can be used for
queries and maybe also in the UI.
This can also help to solve some of the other problems you mention: for
those who would like to have properties "son" and "daughter", one could
infer their values automatically from other statements, without editors
having to maintain this data at all.
A possible way to maintain these statements on wiki would be to use a
special reference to encode that they have been inferred (and from
what). This would make it possible to maintain them automatically
without the problem of human editors ending up wrestling with bots ;-)
Moreover, it would not require any change in the software on which
Wikidata is running.
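A rough sketch of how such a marker could keep bots and editors from fighting, under assumptions of my own: each statement carries either None (entered by an editor) or the statement it was inferred from. The names SYMMETRIC and sync_inferred and the dict layout are hypothetical, and only symmetric properties are handled.

```python
SYMMETRIC = {"P26"}  # illustrative: spouse only


def sync_inferred(statements):
    """statements maps (subject, prop, value) -> source.
    source is None for editor-entered statements, otherwise the
    (subject, prop, value) triple the statement was inferred from."""
    # 1. Drop inferred statements whose source statement no longer exists.
    kept = {
        triple: source
        for triple, source in statements.items()
        if source is None or source in statements
    }
    # 2. Add missing mirrors of editor-entered symmetric statements,
    #    recording where each one came from.
    for (subject, prop, value), source in list(kept.items()):
        mirror = (value, prop, subject)
        if source is None and prop in SYMMETRIC and mirror not in kept:
            kept[mirror] = (subject, prop, value)
    return kept


data = sync_inferred({("A", "P26", "B"): None})
print(data)

# An editor removes the original statement; the inferred one follows
# on the next run and is not re-added.
del data[("A", "P26", "B")]
print(sync_inferred(data))
```

Because the inferred statement records its origin, deleting the original is enough to make the mirror disappear on the next run.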
For the cases you mentioned, I don't think that there is a problem with
too many inferred statements. There are surely cases where it would not
be practical (in the current system) to store inferred data, but family
relationships are usually not problematic. In fact, they are very useful
to human readers.
Of course, the community needs to fully control what is inferred, and
this has to be done in-wiki. We already have symmetry information in
constraints, but for useful inference we might have to be stricter. The
current constraints also cover some not-so-strict cases where exceptions
are likely (e.g., most people have only one gender, but this is not a
strong rule; on the other hand, one is always the child of one's mother
by definition).
One also has to be careful with qualifiers etc. For example, the start
and end of a "spouse" statement should be copied to its symmetric
version, but there might also be qualifiers that should not be copied
like this. I would like to work on a proposal for how to specify such
things. It would be good to coordinate there.
A first step (even before adding any statement to Wikidata) could be to
add inferred information to the query services and RDF exports. This
will make it easier to solve part of the problem first without having
too many discussions in parallel.
Best regards,
Markus
On 17.08.2015 13:29, Andrew Gray wrote:
...
Hi all,
I've recently been thinking about how we handle family/genealogical
relationships in Wikidata - this is, potentially, a really valuable
source of information for researchers to have available in a
structured form, especially now we're bringing together so many
biographical databases.
We currently have the following properties to link people together:

spouses (P26) and cohabitants (P451) - not gendered
parents (P22/P25) and step-parents (P43/P44) - gendered
siblings (P7/P9) - gendered
children (P40) - not gendered (and oddly no step-children?)
a generic "related to" (P1038) for more distant relationships

There are two big things that jump out here.
** First, gender. Parents are split by gender while children are not
(we have mother/father not son/daughter). Siblings are likewise
gendered, and spouses are not. These are all very early properties -
does anyone remember how we got this way?
This makes for some odd results. For example, if we want to use our
data to identify all the male-line *descendants* of a person, we have
to do some complicated inference from [P40 + target is male]. However,
to identify all the male-line *ancestors*, we can just run back up the
P22 chain. It feels quite strange to have this difference, and I
wonder if we should standardise one way or the other - split P40 or
merge the others.
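The asymmetry can be seen in a small Python sketch over toy data. The dictionaries, the placeholder item names, and the plain strings "male"/"female" are assumptions for illustration only; real data would use items and P21 values.

```python
# Toy data with placeholder names.
FATHER = {"C": "B", "D": "B", "B": "A"}          # P22: person -> father
CHILDREN = {"A": ["B"], "B": ["C", "D"]}         # P40: person -> children
GENDER = {"A": "male", "B": "male", "C": "male", "D": "female"}  # P21


def male_line_ancestors(person):
    """Walk straight up the P22 chain; no gender lookup needed."""
    chain = []
    while person in FATHER:
        person = FATHER[person]
        chain.append(person)
    return chain


def male_line_descendants(person):
    """Follow P40 and keep only targets whose P21 is male."""
    found = []
    for child in CHILDREN.get(person, []):
        if GENDER.get(child) == "male":
            found.append(child)
            found.extend(male_line_descendants(child))
    return found


print(male_line_ancestors("C"))    # ['B', 'A']
print(male_line_descendants("A"))  # ['B', 'C']
```

Going up needs one property; going down needs a second lookup for every child.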
In some ways, merging seems more elegant. We do have fairly good
gender metadata (and getting better all the time!), so we can still do
gender-specific relationship searches where needed. It also avoids
having to force a binary gender approach - we are in the odd position
of being able to give a nuanced entry in P21 but can only say if
someone is a "sister" or "brother".
** Secondly, symmetry. Siblings, spouses, and parent-child pairs are
by definition symmetric. If A has P26:B, then B should also have
P26:A. The gendered cases are a little more complicated, as if A has
P40:B, then B has P22:A or P25:A, but there is still a degree of
symmetry - one of those must be true.
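As a sketch of the gendered case, assuming the parent's gender is available as a simple lookup (the function name, the gender_of mapping, and the "male"/"female" strings are made up for this example):

```python
def infer_parent_statement(parent, child, gender_of):
    """Given 'parent has P40: child', return the matching statement
    on the child, or None if the parent's gender does not decide it."""
    gender = gender_of.get(parent)
    if gender == "male":
        return (child, "P22", parent)   # father
    if gender == "female":
        return (child, "P25", parent)   # mother
    # Unknown or non-binary P21: neither gendered property can be chosen.
    return None


gender_of = {"A": "male", "B": "female"}
print(infer_parent_statement("A", "C", gender_of))  # ('C', 'P22', 'A')
print(infer_parent_statement("B", "C", gender_of))  # ('C', 'P25', 'B')
print(infer_parent_statement("X", "C", gender_of))  # None
```

The None branch is where the gendered split gets in the way: the relationship is known, but there is no property to record it with.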
However, Wikidata doesn't really help us make use of this symmetry. If
I list A as spouse of B, I need to add (separately) that B is spouse
of A. If they have four children C, D, E, and F, this gets very
complicated - we have six articles with *30* links between them, all
of which need to be made manually. It feels like automatically making
symmetric links for these properties would save a lot of work, and
produce a much more reliable dataset.
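The count of 30 can be checked by enumerating the links for this family. The relation labels below are informal stand-ins for P26, P40, P22/P25 and P7/P9.

```python
parents = ["A", "B"]
children = ["C", "D", "E", "F"]

links = []
# Spouse links in both directions: 2
links += [(p, "spouse", q) for p in parents for q in parents if p != q]
# Each parent lists each child: 2 * 4 = 8
links += [(p, "child", c) for p in parents for c in children]
# Each child lists each parent: 4 * 2 = 8
links += [(c, "parent", p) for c in children for p in parents]
# Each child lists each sibling: 4 * 3 = 12
links += [(c, "sibling", d) for c in children for d in children if c != d]

print(len(links))  # 30
```

Put differently, each of the six items links to the other five, so 6 * 5 = 30.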
I believe we decided early on not to do symmetric links because it
would swamp commonly linked articles (imagine what Q5 would look like
by now!). On the other hand, these are properties with a very narrowly
defined scope, and we actively *want* them to be comprehensively
symmetric - every parent article should list all their children on
Wikidata, and every child article should list their parent and all
their siblings.
Perhaps it's worth reconsidering whether to allow symmetry for a
specifically defined class of properties - would an automatically
symmetric P26 really swamp the system? It would be great if the system
could match up relationships and fill in missing parent/child,
sibling, and spouse links. I can't be the only one who regularly adds
one half of the relationship and forgets to include the other!
A bot looking at all of these and filling in the gaps might be a
useful approach... but it would break down if someone tries to remove
one of the symmetric entries without also removing the other, as the
bot would probably (eventually) fill it back in. Ultimately, an
automatic symmetry would seem best.
Thoughts on either of these? If there is interest I will write up a
formal proposal on-wiki.
