Gerard,
Sure, working with linked data is great. But sometimes data is not linked
at all and has no identifiers...
That's where the work Antonin is doing with OpenRefine helps with
reconciling even when there are no identifiers other than a name. Many
datasets only have Strings as Things. In fact, I'd say it's quite useful
not only to *add additional statements about existing Things* we already
have, but *also to add more Things* in the world that have yet to be
included in a database like Wikidata, where no identifiers have been
created yet for that Thing.
And I'm just as upset as you are about the goldmine of data still locked up
in Freebase. But don't worry, baby steps, and eventually that data will
make its way into Wikidata. Getting the Primary Sources tool up to par is
a big step towards that, but certainly not the end of the line.
-Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>
On Tue, Aug 8, 2017 at 6:57 AM Gerard Meijssen <gerard.meijssen(a)gmail.com>
wrote:
Hoi,
Given that Wikidata has identifiers for many external sources, reconciliation
is often less of a challenge for crowds, and less of a challenge than it is
made out to be. A few examples: the OCLC maintains two distinct identifiers,
VIAF and ISNI. They are both actively maintained.
When we include VIAF numbers in Wikidata, there will be instances where the
identifiers become redirects. The same is true for ISNI. When we have the
latest VIAF numbers, the ISNI numbers are highly likely to be correct
(better than 95%, the minimum requirement for imports at ISNI).
When we share our identifiers regularly, we will learn about redirects and
gain the direct links. We shared our identifiers and VIAF identifiers with
the Open Library. They now include them, and in return we received a file
that helped us deduplicate our Open Library identifiers and replace the
redirects. What is infuriating is that there are Open Library identifiers
hidden in the Freebase data. They cannot be exported, so we cannot send
them to OL for processing and import them into Wikidata. We do a subpar
job as a consequence.
Another project where we will gain information from multiple sources is
the Biodiversity Heritage Library. We may gain links through their
collaboration with the Internet Archive and the OCLC. This will reduce the
chances of introducing duplicates at our end because of shared
identifiers. It will also reduce the number of people we have to process
before they are included in Wikidata. It will allow OCLC, BHL and
IA to learn of identifiers as we have them, allowing for subsequent
improvement in quality in the future for all of us.
So in my opinion we should aggressively share identifiers, collaborate,
seek out the redirects and replace them, and become more and more a focal
point for links between resources.
Thanks,
GerardM
On 8 August 2017 at 11:13, Marco Fossati <fossati(a)spaziodati.eu> wrote:
Hi Antonin,
On 8/7/17 20:36, Antonin Delpeuch (lists) wrote:
Does anybody know an alternative to CrowdFlower that can be used for
free with volunteer workers?
There you go:
https://crowdcrafting.org/
Hope this helps you keep up your great work on OpenRefine.
I believe entity reconciliation is one of the most challenging tasks that
keep third-party data providers away from imports to Wikidata.
Cheers,
Marco
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata