Hoi,
What you are talking about is a workflow that is much more involved than
anything we currently automate.
The first thing is that you define a project. You assume that everyone
mentioned in the Dizionario Biografico degli Italiani is notable enough to
have a Wikidata item. The second thing you do is mix and match the people
in this book against the people in Wikidata. For the ones you cannot match
you want to create new items. The third thing you do is add statements
for all the people that did not yet exist in Wikidata. As a consequence you will
add the link between the two brothers. You will make sure that both are
known as architects.
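
The three steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not how Mix'n'match or Wikidata actually work: the `Item` class, the helper functions, the `NEW…` placeholder ids, the plain-text property labels and the toy data are all assumptions made for the example, and matching is done by exact label only.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a Wikidata item."""
    label: str
    statements: dict = field(default_factory=dict)  # property label -> set of values


def reconcile(catalog_names, items):
    """Step 2: split catalog names into matched (name -> id) and unmatched."""
    by_label = {item.label: item_id for item_id, item in items.items()}
    matched = {n: by_label[n] for n in catalog_names if n in by_label}
    unmatched = [n for n in catalog_names if n not in by_label]
    return matched, unmatched


def create_items(names, items):
    """Create a placeholder item for every unmatched name; return name -> id."""
    new_ids = {}
    for name in names:
        new_id = f"NEW{len(items) + 1}"  # placeholder id, not a real Q-number
        items[new_id] = Item(label=name)
        new_ids[name] = new_id
    return new_ids


def add_statement(items, subject, prop, value):
    """Step 3: attach a (property, value) statement to an item."""
    items[subject].statements.setdefault(prop, set()).add(value)


# Toy data only; labels and catalog contents are assumed for illustration.
items = {"Q1010811": Item("Giulio Cesare Baldigara")}
catalog = ["Giulio Baldigara", "Giulio Cesare Baldigara"]

matched, unmatched = reconcile(catalog, items)
new_ids = create_items(unmatched, items)
brother = new_ids["Giulio Baldigara"]

for item_id in ("Q1010811", brother):
    add_statement(items, item_id, "occupation", "architect")
add_statement(items, "Q1010811", "sibling", brother)
add_statement(items, brother, "sibling", "Q1010811")
```

Exact-label matching is deliberately naive here; the hard part described in this thread is precisely the cases where names are close and a human has to decide.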
The creation of new items is done on the basis of those people in the book
that do not have a Wikidata item yet. You may find after some time that you
missed people that did have a Wikidata item after all; the duplicates are then
merged. Ideally there is a tool that allows easy addition of sources to
statements that can be sourced to the book.
In general, much of this can be done already. Much of this will need to be
done by hand. Much of this needs more documentation if it is to be a workflow
that can be carried out by more than just a few people.
Thanks,
GerardM
On 21 November 2015 at 18:50, Dario Taraborelli <dtaraborelli(a)wikimedia.org>
wrote:
On Nov 21, 2015, at 9:44 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com>
wrote:
Hoi,
Yes you can add an item for the missing brother. When you do, you should
link it to his brother and thereby they are explicitly not the same. They
can both have the same alias. It helps when you add pertinent data like a
date of birth/death. I take it they are not twins.
Thanks,
GerardM
Hi Gerard, I am actually interested in the general problem, not this
specific pair. In other words: should Mix’n’match automatically perform the
two actions I listed above? More generally, how can we clearly signal *in
Wikidata* that the output of costly human labor should not be undone by
machines or lazy humans in the future?
On 21 November 2015 at 18:34, Dario Taraborelli <
dtaraborelli(a)wikimedia.org> wrote:
I finally found the time to play extensively with Mix’n’match and it’s by far
one of the most promising models I’ve come across for Wikidata growth.
A short conversation with Magnus on Twitter got me thinking on how to best
preserve the output of costly human curation.[1]
I spent most of my time manually auditing automatically matched entries
from the Dizionario Biografico degli Italiani [2]. These entries are long,
unstructured biographical entries and it takes quite a lot of effort to
understand if the two individuals referenced by Wikidata and DBI actually
are the same person. This is a great example of a task that’s still pretty
hard for a machine to perform, no matter how sophisticated the algorithm.
My favorite example? Mix’n’match suggested a match between *Giulio
Baldigara *(Q1010811 <https://www.wikidata.org/wiki/Q1010811>) and *Giulio
Baldigara* (DBI
<http://www.treccani.it/enciclopedia/giulio-baldigara_(Dizionario_Biografico)/>)
which looked totally legitimate: these two individuals are both Italian
architects from the 16th century with the same name, they were both born
around the same years in the same city, they were both active in Hungary at
the same time: strong indication that they are the same person, right? It
turns out they are brothers and the full name of the person referenced in
Wikidata is *Giulio Cesare Baldigara* (the least known in a family of
architects). I unmatched the suggestion and flagged the DBI entry as not
existing in Wikidata.
My question at the moment is: the output of a labor-intensive review of a
potential match is currently stored as a volatile flag in a tool hosted on
labs, but is invisible in Wikidata. Should something happen to Mix’n’match
(god forbid) the result of my work would get lost. Which got me thinking:
- shouldn’t a manually unmatched item be created directly on Wikidata
(after all DBI is all about notable individuals who would easily pass
Wikidata’s notability threshold for biographies)?
- shouldn’t the relation between *Giulio (Cesare) Baldigara* (Q1010811
<https://www.wikidata.org/wiki/Q1010811>) and the newly created item for *Giulio
Baldigara* be explicitly represented via a *not the same as* property,
to prevent future humans or machines from accidentally remerging the two
items based on some kind of heuristics?
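
To make the second point concrete, here is a minimal sketch of how a recorded "not the same as" relation could act as a guard before any automated merge. Everything in it is hypothetical: the `mark_different` and `may_merge` helpers, the in-memory dictionary, and the `NEW1`/`NEW2` placeholder ids are assumptions for illustration and do not describe an existing Wikidata or Mix’n’match feature.

```python
def mark_different(different_from, a, b):
    """Record, in both directions, that a human judged a and b to be distinct."""
    different_from.setdefault(a, set()).add(b)
    different_from.setdefault(b, set()).add(a)


def may_merge(different_from, a, b):
    """Return False when the pair has been explicitly marked as distinct."""
    return b not in different_from.get(a, set())


# Hypothetical store of human decisions: item id -> ids it must not be merged with.
different_from = {}
mark_different(different_from, "Q1010811", "NEW1")  # the two brothers

# A bot or tool would consult the guard before proposing a merge.
blocked = not may_merge(different_from, "Q1010811", "NEW1")
allowed = may_merge(different_from, "Q1010811", "NEW2")
```

Storing the relation symmetrically means the check gives the same answer whichever of the two items a tool starts from.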
Thoughts welcome,
Dario
[1] https://twitter.com/ReaderMeter/status/667214565621432320
[2] https://tools.wmflabs.org/mix-n-match/?mode=catalog&catalog=55&offs…
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
*Dario Taraborelli*, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter <http://twitter.com/readermeter>