Re: [Wikimedia-genealogy] Existing software - Wikimedia-genealogy

15 May 2017

@Robert Shaw:

There are actually quite a large number of data interchange standards in
addition to GEDCOM, which has been effectively abandoned at version 5.5
since 1996[1]. (FamilySearch's GEDCOM X[2] may fairly be assumed to be
the LDS's current model, retaining the nuclear family/individual
approach.) GeneWeb's GW format[3] has some support as an interchange
format; for example, Gramps[4] can both import and export .gw. Gramps
itself has a more widely supported format, Gramps XML[5]. There is an
almost uncountable number of GEDCOM extensions, and similarly there is a
continuum of personal data standards which are suitable or extensible
for the purpose, e.g. vCard[6] and iCalendar[7].

Most standards commonly used for genealogy focus on nuclear family
units. A second approach centres on relationships, and a third is
event-driven, though every standard offers at least some support for all
three. Here are a few open questions for genealogy data, though:
* Locations change name (and nationality) over time - should a genealogy
standard take this into account?
* Parentage is changing as we learn more about genetics; a child was
born recently with genetic material from three individuals.[8] How
should this be represented?
* In a similar vein, between 1 and 2 percent of humans are neither male
nor female by traditional measures[9]. (That is roughly 3x the number of
people who have AB- blood type, to give you a comparison. It is also
more than the number of genetically redheaded people.) How should this
be represented?
* Legal issues are also becoming more complex: there are now many
children in the USA, Canada, and elsewhere who list multiple parents on
their birth certificates[10], and sometimes non-traditional rôles such
as birth mother or donor father. Again, how should these be recorded?
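
To make the questions above concrete, here is a minimal sketch of a data
model that could hold dated place names, any number of parents with
free-text rôles, and a sex field that is not limited to two values. All
class and field names are hypothetical; nothing here is taken from
GEDCOM, GEDCOM X, or Gramps XML.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PlaceName:
    """One name (and country) a place carried during a span of time."""
    name: str
    country: str
    valid_from: Optional[date] = None  # None means "no known start"
    valid_to: Optional[date] = None    # None means "still current"


@dataclass
class Place:
    names: List[PlaceName] = field(default_factory=list)

    def name_on(self, when: date) -> Optional[PlaceName]:
        """Return the first name whose validity span contains `when`."""
        for n in self.names:
            starts_ok = n.valid_from is None or n.valid_from <= when
            ends_ok = n.valid_to is None or when < n.valid_to
            if starts_ok and ends_ok:
                return n
        return None


@dataclass
class ParentLink:
    parent_id: str
    role: str  # free text, e.g. "birth mother", "donor father"


@dataclass
class Person:
    person_id: str
    sex: Optional[str] = None  # free text rather than a fixed M/F choice
    parents: List[ParentLink] = field(default_factory=list)
```

The design choice is to record rôles and sex as open values and let the
presentation layer decide how to group them, rather than baking a
two-parent, two-sex assumption into the storage format.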

...
(What's the data product of Wikipedia, for instance?)

Well, taking Wikipedia as an example, there are four primary data
products:
* The human/machine-readable HTML interface
* The two machine-oriented API interfaces
* The various formats of database dumps
* The definable collection PDF output

In addition there is a range of mostly statistical products which are
more meta than content. But mostly it is a library of hyperlinked
articles about everything.

Find a Grave is a library of databases of headstones, with optional
hyperlinking, text articles, and photographs. Ancestry.com is a set of
indexes to repositories of documents and GEDCOMs. And so on. Genealogy
is a multi-disciplinary effort, and I wonder which areas of it we would
be interested in creating and curating: crowd-sourcing transcriptions of
census documents may be one idea, but so is developing a public gene
heritage database hosted on Wikidata with DIY mail-in swab kits.

@Sam Wilson: Actually, your arguments about the UNIX way are one of my
arguments against MediaWiki for genealogy - MW handles text reasonably
well, but not structured data. For that matter, InstantCommons is an
accretion to allow something which MW does not do well - share objects
across instances.

The primary benefit of InstantCommons is offloading the cost of
curating the media files, but in exchange it gives a different project
(with different goals and missions) control over what media files you
may have. This may work well enough for a genealogy project, but it also
shows a point at which a genealogy project cannot 'do one thing and do
it well.' The concept of involving Wikisource for transcriptions,
Commons for source documents, Wikidata for many elements, and then a
separate MW instance for the presentation layer sounds great, but I
expect it would be pretty brittle in practice.

@Michael Smith: there are reasons to have hidden/private individuals:
living persons and the recently deceased being two of the most obvious.
That of course does not defend (nor condemn) private family trees. One
thing done by Ancestry.com is to use private trees as a source of
'hints', and to improve the accuracy of hints for both public and
private trees - e.g. if a private individual is linked to a given social
security number, it is of greater likelihood that an individual in a
public tree with multiple close correlations has the same social
security number.
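
As a rough illustration of that kind of hint (not Ancestry.com's actual
method, which I do not know), the sketch below suggests an identifier
for a public record only when a private record agrees with it on several
other fields. The field names, the "identifier" key, and the threshold
of three are all assumptions made for the example.

```python
from typing import Iterable, Optional

# Assumed comparison fields; a real system would use many more.
MATCH_FIELDS = ("name", "birth_year", "birth_place")


def shared_fields(a: dict, b: dict) -> int:
    """Count comparison fields that are present in both records and equal."""
    return sum(
        1 for f in MATCH_FIELDS
        if a.get(f) is not None and a.get(f) == b.get(f)
    )


def hint_identifier(public: dict, private_records: Iterable[dict],
                    min_shared: int = 3) -> Optional[str]:
    """Return the identifier of the best-matching private record, or None.

    A private record counts only if it carries an identifier and agrees
    with the public record on at least `min_shared` fields.
    """
    best_score, best_id = 0, None
    for rec in private_records:
        score = shared_fields(public, rec)
        if rec.get("identifier") and score >= min_shared and score > best_score:
            best_score, best_id = score, rec["identifier"]
    return best_id
```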

Personally I would like to build a series of APIs which can be knit
together - one for people data, one for graves/cemeteries, one for media
assets, one for places, one for events... and likely others. These could
be MW extensions, or maybe Slim[11], but the point is to allow maximum
flexibility for data CRUD, and allow many options for presentation -
including but not limited to MediaWiki.
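
A toy sketch of what "knit together" could mean, assuming each API
exposes the same four CRUD operations. It is written in Python with an
in-memory dictionary purely for illustration; MemoryStore and
describe_person are hypothetical names, and a real service would sit
behind HTTP with persistent storage.

```python
import itertools
from typing import Dict, Optional


class MemoryStore:
    """Stand-in for one API (people, places, events, ...)."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._items: Dict[str, dict] = {}
        self._counter = itertools.count(1)

    def create(self, data: dict) -> str:
        item_id = f"{self._prefix}{next(self._counter)}"
        self._items[item_id] = dict(data)
        return item_id

    def read(self, item_id: str) -> Optional[dict]:
        item = self._items.get(item_id)
        return dict(item) if item is not None else None

    def update(self, item_id: str, changes: dict) -> bool:
        if item_id not in self._items:
            return False
        self._items[item_id].update(changes)
        return True

    def delete(self, item_id: str) -> bool:
        return self._items.pop(item_id, None) is not None


def describe_person(person_id: str, people: MemoryStore,
                    places: MemoryStore) -> str:
    """One possible presentation layer: join a person to a place by id."""
    person = people.read(person_id)
    if person is None:
        return "unknown person"
    place = places.read(person.get("birth_place", ""))
    where = place["name"] if place else "unknown place"
    return f"{person['name']}, born in {where}"
```

The stores know nothing about each other; only the presentation function
joins them by identifier, so any one of them could be swapped out
without touching the rest.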

[1] https://familysearch.org/developers/docs/gedcom/gedcom55.pdf
[2] https://github.com/FamilySearch/gedcomx
[3] https://geneweb.tuxfamily.org/wiki/GWformat
[4] https://github.com/gramps-project/gramps
[5] https://gramps-project.org/xml/
[6] https://tools.ietf.org/html/rfc6350
[7] https://tools.ietf.org/html/rfc5545
[8] https://www.nytimes.com/2016/09/28/health/birth-of-3-parent-baby-a-success-…
[9] http://onlinelibrary.wiley.com/doi/10.1111/nin.12184/full
[10] http://www.huffingtonpost.com/2014/02/11/baby-with-3-parents-birth-certific…
[11] https://www.slimframework.com/ (https://github.com/slimphp/Slim)