Gerard,

Sure, working with linked data is great. But sometimes data is not linked at all and has no identifiers...

That's where the work Antonin is doing with OpenRefine helps: reconciling even when there are no identifiers other than a name. Many datasets only have Strings as Things. In fact, I'd say it's quite useful not only to add statements about Things we already have, but also to add more Things in the world that have yet to be included in a database like Wikidata, where no identifier has been created for that Thing yet.

And I'm just as upset as you are about the goldmine of data still locked up in Freebase.  But don't worry, baby steps, and eventually that data will make its way into Wikidata.  Getting the Primary Sources tool up to par is a big step towards that, but certainly not the end of the line.
 
-Thad
+ThadGuidry  

On Tue, Aug 8, 2017 at 6:57 AM Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi,
Given that Wikidata has identifiers for many external sources, reconciliation is often less of a challenge for crowds, and less of a challenge than it needs to be. A few examples: the OCLC maintains two distinct identifiers, VIAF and ISNI. Both are actively maintained. When we include VIAF numbers in Wikidata, there will be instances where the identifiers become redirects. The same is true for ISNI. When we have the latest VIAF numbers, the ISNI numbers are highly likely to be correct (better than 95%, the minimum requirement for imports at ISNI).

When we share our identifiers regularly, we will learn about redirects and gain the direct links. We shared our identifiers and VIAF identifiers with the Open Library. They now include them, and in return we received a file that helped us deduplicate our Open Library identifiers and replace the redirects. What is infuriating is that there are Open Library identifiers hidden in the Freebase data. They cannot be exported, so we cannot send them to OL for processing and import them into Wikidata. We do a subpar job as a consequence.

Another project where we will gain information from multiple sources is the Biodiversity Heritage Library. We may gain links through their collaboration with the Internet Archive and the OCLC. Because of the shared identifiers, this will reduce the chance of introducing duplicates at our end. It will also reduce the number of people we have to process before they are included in Wikidata. It will allow OCLC, BHL and IA to learn of identifiers as we have them, allowing for subsequent improvement in quality for all of us.

So in my opinion we should aggressively share identifiers, collaborate, seek out the redirects and replace them, and become more and more a focal point for links between resources.
Thanks,
     GerardM

On 8 August 2017 at 11:13, Marco Fossati <fossati@spaziodati.eu> wrote:
Hi Antonin,

On 8/7/17 20:36, Antonin Delpeuch (lists) wrote:
Does anybody know an alternative to CrowdFlower that can be used for
free with volunteer workers?
There you go: https://crowdcrafting.org/
Hope this helps you keep up your great work on OpenRefine.

I believe entity reconciliation is one of the most challenging tasks, and one that keeps third-party data providers away from imports to Wikidata.
Cheers,

Marco


_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
