On Nov 21, 2015, at 9:44 AM, Gerard Meijssen
<gerard.meijssen(a)gmail.com> wrote:
Hoi,
Yes, you can add an item for the missing brother. When you do, you should link it to his
brother's item, which makes it explicit that they are not the same person. Both items can
have the same alias. It helps to add pertinent data such as a date of birth or death. I
take it they are not twins.
Thanks,
GerardM
Hi Gerard, I am actually interested in the general problem, not this specific pair. In
other words: should Mix’n’match automatically perform the two actions I listed in my
original message? More generally, how can we clearly signal *in Wikidata* that the output
of costly human labor should not be undone by machines or lazy humans in the future?
On 21 November 2015 at 18:34, Dario Taraborelli
<dtaraborelli(a)wikimedia.org> wrote:
I finally found the time to play extensively with Mix’n’match and it’s by far one of the
most promising models I’ve come across for Wikidata growth. A short conversation with
Magnus on Twitter got me thinking on how to best preserve the output of costly human
curation.[1]
I spent most of my time manually auditing automatically matched entries from the
Dizionario Biografico degli Italiani [2]. These entries are long, unstructured
biographical entries and it takes quite a lot of effort to understand if the two
individuals referenced by Wikidata and DBI actually are the same person. This is a great
example of a task that’s still pretty hard for a machine to perform, no matter how
sophisticated the algorithm.
My favorite example? Mix’n’match suggested a match between Giulio Baldigara (Q1010811
<https://www.wikidata.org/wiki/Q1010811>) and Giulio Baldigara (DBI
<http://www.treccani.it/enciclopedia/giulio-baldigara_(Dizionario_Biografico)/>)
which looked totally legitimate: these two individuals are both Italian architects from
the 16th century with the same name, they were both born around the same years in the same
city, they were both active in Hungary at the same time: strong indication that they are
the same person, right? It turns out they are brothers, and the full name of the person
referenced in Wikidata is Giulio Cesare Baldigara (the least known in a family of
architects). I unmatched the suggestion and flagged the DBI entry as not existing in
Wikidata.
My question at the moment is: the output of a labor-intensive review of a potential match
is currently stored as a volatile flag in a tool hosted on labs, but is invisible in
Wikidata. Should something happen to Mix’n’match (god forbid) the result of my work would
get lost. Which got me thinking:
- shouldn’t a manually unmatched item be created directly on Wikidata (after all, DBI is
all about notable individuals who would easily pass Wikidata’s notability threshold for
biographies)?
- shouldn’t the relation between Giulio (Cesare) Baldigara (Q1010811
<https://www.wikidata.org/wiki/Q1010811>) and the newly created item for Giulio
Baldigara be explicitly represented via a "not the same as" property, to prevent future
humans or machines from accidentally re-merging the two items based on some kind of
heuristic?
Thoughts welcome,
Dario
[1] https://twitter.com/ReaderMeter/status/667214565621432320
[2] https://tools.wmflabs.org/mix-n-match/?mode=catalog&catalog=55&offset=0&show_noq=0&show_autoq=1&show_userq=0&show_na=0
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata
<https://lists.wikimedia.org/mailman/listinfo/wikidata>
Dario Taraborelli, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter