Hi,

What you are talking about is a workflow that is much more involved than anything we currently automate.

The first thing you do is define a project. You assume that everyone mentioned in the Dizionario Biografico degli Italiani is notable enough to have a Wikidata item. The second thing you do is match the people in this book against the people in Wikidata; for the ones you cannot match, you create new items. The third thing you do is add statements for these people, including the ones that did not exist in Wikidata before. As part of this you add the link between the two brothers and make sure that both are known as architects.

New items are created for those people in the book who do not have a Wikidata item yet. After some time you may find that you missed people who did have a Wikidata item after all; those items are then merged. Ideally, a tool would make it easy to add the book as a source to every statement it supports.
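To make the three steps above concrete, here is a minimal sketch in Python of how reviewed catalogue entries could be turned into planned edits. It is purely illustrative and assumes a hypothetical in-memory model: `Status`, `Entry` and `plan_actions` are invented names, not part of Mix'n'match or any Wikidata API, and nothing is actually written to Wikidata.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Status(Enum):
    """Review state of one catalogue entry (hypothetical names)."""
    MATCHED = "matched"          # a human confirmed it is an existing item
    NOT_IN_WIKIDATA = "absent"   # a human confirmed there is no item yet
    UNREVIEWED = "unreviewed"    # only an automatic suggestion, or nothing


@dataclass
class Entry:
    catalog_id: str              # identifier of the entry in the book
    name: str
    status: Status
    qid: Optional[str] = None    # Wikidata item, if matched


def plan_actions(entries: List[Entry]) -> List[Tuple[str, Entry]]:
    """Turn reviewed entries into (action, entry) pairs; nothing is written."""
    plan = []
    for e in entries:
        if e.status is Status.MATCHED:
            # Step 3: add statements to the existing item, sourced to the book.
            plan.append(("add_sourced_statements", e))
        elif e.status is Status.NOT_IN_WIKIDATA:
            # Step 2: everyone in the book is assumed notable,
            # so a confirmed non-match becomes a new item.
            plan.append(("create_item", e))
        # UNREVIEWED entries wait until a human has looked at them.
    return plan
```

Keeping unreviewed entries out of the plan reflects the point of this thread: only a human decision should lead to a new item or a sourced statement.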

In general, much of this can already be done, although much of it will need to be done by hand. It also needs more documentation if it is to become a workflow that more than just a few people can follow.

Thanks,

GerardM

On 21 November 2015 at 18:50, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:

On Nov 21, 2015, at 9:44 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:

Hi,
Yes, you can add an item for the missing brother. When you do, you should link it to his brother, which makes it explicit that they are not the same person. They can both have the same alias. It helps to add pertinent data such as dates of birth and death. I take it they are not twins.
Thanks,
GerardM

Hi Gerard, I am actually interested in the general problem, not this specific pair. Should Mix’n’match automatically perform the two actions I listed above? More generally, how can we clearly signal *in Wikidata* that the output of costly human labor should not be undone by machines or lazy humans in the future?

On 21 November 2015 at 18:34, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
I finally found the time to play extensively with Mix’n’match, and it’s by far one of the most promising models I’ve come across for Wikidata growth. A short conversation with Magnus on Twitter got me thinking about how best to preserve the output of costly human curation.[1]

I spent most of my time manually auditing automatically matched entries from the Dizionario Biografico degli Italiani (DBI) [2]. These are long, unstructured biographical entries, and it takes quite a lot of effort to determine whether the two individuals referenced by Wikidata and the DBI are actually the same person. This is a great example of a task that is still pretty hard for a machine to perform, no matter how sophisticated the algorithm.

My favorite example? Mix’n’match suggested a match between Giulio Baldigara (Q1010811) and Giulio Baldigara (DBI), which looked totally legitimate: these two individuals are both 16th-century Italian architects with the same name, they were both born around the same year in the same city, and they were both active in Hungary at the same time. A strong indication that they are the same person, right? It turns out they are brothers, and the full name of the person referenced in Wikidata is Giulio Cesare Baldigara (the least known member of a family of architects). I unmatched the suggestion and flagged the DBI entry as not existing in Wikidata.

My concern is this: the output of a labor-intensive review of a potential match is currently stored as a volatile flag in a tool hosted on Labs, but it is invisible in Wikidata. Should something happen to Mix’n’match (god forbid), the result of my work would be lost. Which got me thinking:

- Shouldn’t a manually unmatched entry be created directly as an item on Wikidata? After all, the DBI is all about notable individuals who would easily pass Wikidata’s notability threshold for biographies.
- Shouldn’t the relation between Giulio (Cesare) Baldigara (Q1010811) and the newly created item for Giulio Baldigara be explicitly represented via a “not the same as” property, to prevent future humans or machines from accidentally re-merging the two items based on some kind of heuristic?
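The second suggestion amounts to a merge guard: an explicit human statement that two items differ should override any heuristic that proposes merging them. Here is a minimal Python sketch of that idea, assuming a hypothetical in-memory `Item` with a `different_from` set standing in for a “not the same as” statement; `may_merge`, `mark_different` and the placeholder id `Q_NEW` are invented for illustration and are not how Wikidata or Mix’n’match actually store or check this.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Item:
    qid: str
    label: str
    # Hypothetical stand-in for a "not the same as" statement:
    # ids of items a human reviewer confirmed to be different entities.
    different_from: Set[str] = field(default_factory=set)


def may_merge(a: Item, b: Item) -> bool:
    """Return False if either item records that the other is different.

    A heuristic (same name, same dates, same city) may propose a merge,
    but an explicit statement made by a human reviewer always wins.
    """
    return b.qid not in a.different_from and a.qid not in b.different_from


def mark_different(a: Item, b: Item) -> None:
    """Record the reviewer's decision on both items."""
    a.different_from.add(b.qid)
    b.different_from.add(a.qid)


cesare = Item("Q1010811", "Giulio Cesare Baldigara")
giulio = Item("Q_NEW", "Giulio Baldigara")  # placeholder id for the new item
mark_different(cesare, giulio)
print(may_merge(cesare, giulio))  # False: the human decision blocks the merge
```

Recording the statement on both items is a design choice: the decision survives even if one side is later edited, and a checker only has to look at either item.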

Thoughts welcome,

Dario

[1] https://twitter.com/ReaderMeter/status/667214565621432320
[2] https://tools.wmflabs.org/mix-n-match/?mode=catalog&catalog=55&offset=0&show_noq=0&show_autoq=1&show_userq=0&show_na=0

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

Dario Taraborelli
Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
