Hi all,
Thanks again for your comments. It looks like:
a) there's interest in simplifying this;
b) creating automatic inferences is possibly desirable but will need a
lot of work and thought.
I'll put together an RFC on-wiki about merging the "gendered"
relationship properties, which will address the first part of the
issue, and we can continue to think about how best to approach the
second.
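
As a rough illustration of the first part, here is a minimal Python sketch
of how gender-specific queries could still work after a merge. It runs on
made-up toy records, not real items: "parent" is a hypothetical merged
property standing in for P22/P25, and the function names are my own
invention. Gender comes from P21, as suggested in the original message.

```python
# Toy records, not real Wikidata items. "parent" is a hypothetical merged
# property standing in for P22/P25; "P21" holds the recorded gender.
PEOPLE = {
    "A": {"P21": "male", "parent": []},
    "B": {"P21": "female", "parent": []},
    "C": {"P21": "male", "parent": ["A", "B"]},
    "D": {"P21": "female", "parent": ["A", "B"]},
    "E": {"P21": "male", "parent": ["C"]},
    "F": {"P21": "male", "parent": ["D"]},
}


def is_male(person):
    """Check the P21 value of a toy record."""
    return PEOPLE[person]["P21"] == "male"


def male_line_ancestors(person):
    """Follow the ungendered parent links upward, keeping only male parents."""
    line = []
    fathers = [p for p in PEOPLE[person]["parent"] if is_male(p)]
    while fathers:
        line.append(fathers[0])
        fathers = [p for p in PEOPLE[fathers[0]]["parent"] if is_male(p)]
    return line


def male_line_descendants(person):
    """Invert the parent links to find children, keeping only male ones."""
    found = []
    for name, record in PEOPLE.items():
        if person in record["parent"] and is_male(name):
            found.append(name)
            found.extend(male_line_descendants(name))
    return found
```

With this toy data, male_line_ancestors("E") gives ["C", "A"] and
male_line_descendants("A") gives ["C", "E"]; F is left out because the
link runs through D. Both directions use the same "link + P21" filter,
which is the consistency a merge would be aiming for.
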
Andrew.
On 17 August 2015 at 12:29, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
> Hi all,
>
> I've recently been thinking about how we handle family/genealogical
> relationships in Wikidata - this is, potentially, a really valuable
> source of information for researchers to have available in a
> structured form, especially now we're bringing together so many
> biographical databases.
>
> We currently have the following properties to link people together:
>
> * spouses (P26) and cohabitants (P451) - not gendered
> * parents (P22/P25) and step-parents (P43/P44) - gendered
> * siblings (P7/P9) - gendered
> * children (P40) - not gendered (and oddly no step-children?)
> * a generic "related to" (P1038) for more distant relationships
>
> There are two big things that jump out here.
>
> ** First, gender. Parents are split by gender while children are not
> (we have mother/father not son/daughter). Siblings are likewise
> gendered, and spouses are not. These are all very early properties -
> does anyone remember how we got this way?
>
> This makes for some odd results. For example, if we want to use our
> data to identify all the male-line *descendants* of a person, we have
> to do some complicated inference from [P40 + target is male]. However,
> to identify all the male-line *ancestors*, we can just run back up the
> P22 chain. It feels quite strange to have this difference, and I
> wonder if we should standardise one way or the other - split P40 or
> merge the others.
>
> In some ways, merging seems more elegant. We do have fairly good
> gender metadata (and getting better all the time!), so we can still do
> gender-specific relationship searches where needed. It also avoids
> having to force a binary gender approach - we are in the odd position
> of being able to give a nuanced entry in P21 but can only say if
> someone is a "sister" or "brother".
>
> ** Secondly, symmetry. Siblings, spouses, and parent-child pairs are
> by definition symmetric. If A has P26:B, then B should also have
> P26:A. The gendered cases are a little more complicated, as if A has
> P40:B, then B has P22:A or P25:A, but there is still a degree of
> symmetry - one of those must be true.
>
> However, Wikidata doesn't really help us make use of this symmetry. If
> I list A as spouse of B, I need to add (separately) that B is spouse
> of A. If they have four children C, D, E, and F, this gets very
> complicated - we have six articles with *30* links between them, all
> of which need to be made manually. It feels like automatically making
> symmetric links for these properties would save a lot of work, and
> produce a much more reliable dataset.
>
> I believe we decided early on not to do symmetric links because it
> would swamp commonly linked articles (imagine what Q5 would look like
> by now!). On the other hand, these are properties with a very narrowly
> defined scope, and we actively *want* them to be comprehensively
> symmetric - every parent article should list all their children on
> Wikidata, and every child article should list their parent and all
> their siblings.
>
> Perhaps it's worth reconsidering whether to allow symmetry for a
> specifically defined class of properties - would an automatically
> symmetric P26 really swamp the system? It would be great if the system
> could match up relationships and fill in missing parent/child,
> sibling, and spouse links. I can't be the only one who regularly adds
> one half of the relationship and forgets to include the other!
>
> A bot looking at all of these and filling in the gaps might be a
> useful approach... but it would break down if someone tries to remove
> one of the symmetric entries without also removing the other, as the
> bot would probably (eventually) fill it back in. Ultimately, an
> automatic symmetry would seem best.
>
> Thoughts on either of these? If there is interest I will write up a
> formal proposal on-wiki.
>
> --
> - Andrew Gray
> andrew.gray(a)dunelm.org.uk
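
To make the six-person example in the quoted message concrete, here is a
small Python sketch that enumerates the statements a fully linked family
would need and reports which ones are missing. It is only an illustration
under assumptions: A is arbitrarily treated as the father (P22) and B as
the mother (P25), "sibling" is a hypothetical placeholder for the sibling
properties, and the function names are invented for this example. It is
not a proposal for how a bot or the software should actually do this.

```python
from itertools import permutations

# Hypothetical family from the example: spouses A and B, children C to F.
# Treating A as father (P22) and B as mother (P25) is an assumption.
PARENTS = {"A": "P22", "B": "P25"}
CHILDREN = ["C", "D", "E", "F"]


def expected_statements():
    """Build every (subject, property, value) link for a fully linked family."""
    statements = {("A", "P26", "B"), ("B", "P26", "A")}  # spouses, both ways
    for parent, parent_prop in PARENTS.items():
        for child in CHILDREN:
            statements.add((parent, "P40", child))        # parent lists child
            statements.add((child, parent_prop, parent))  # child lists parent
    for one, other in permutations(CHILDREN, 2):
        # "sibling" is a placeholder name, not a real property ID.
        statements.add((one, "sibling", other))
    return statements


def missing_statements(existing):
    """Return the expected links that are absent from the existing set."""
    return expected_statements() - set(existing)
```

len(expected_statements()) comes to 30 (2 spouse links, 8 parent-to-child,
8 child-to-parent, and 12 sibling links), matching the count above. If
only ("A", "P26", "B") has been entered, missing_statements() returns the
other 29, including the reverse spouse link.
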
--
- Andrew Gray
andrew.gray(a)dunelm.org.uk
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata